Preface


K. I. Kellermann¹


National Radio Astronomy Observatory 
520 Edgemont Rd., Charlottesville, VA 22901, USA 


kkellerm@nrao.edu


Starting with the discovery of cosmic radio emission by Karl Jansky (1933),¹ radio
astronomers have made a series of remarkable discoveries that have fundamen- 
tally changed our understanding of the Universe and its constituents. Radio galax- 
ies, quasars, pulsars, gravitational lensing, interstellar molecules and masers, radio 
recombination lines, the cosmic microwave background, the first extra solar plan- 
ets, solar radio bursts, and violent electrical storms on Jupiter were all unknown 
before they were discovered as a result of radio observations, mostly by accident or 
serendipitously.²,³ Moreover, radio and radar astronomy provided the first evidence
for gravitational radiation,⁴ the rotation of Mercury,⁵ the hothouse effect on Venus,⁶
superluminal motion,⁷ and cosmic evolution;⁸ gave the most precise tests of general
relativistic “light” bending;⁹ and led to the fourth test of general relativity.¹⁰

These accomplishments were made possible mostly as a result of the tremendous 
progress in electronic and radio technology as well as the dramatic developments 
in computer technology, during the second half of the 20th century. During this 
period, the sensitivity, resolution, imaging quality, frequency coverage, spectroscopic 
capability, and time resolution of radio telescopes all improved by many orders of 
magnitude and the discoveries followed. For example, the NRAO Very Large Array 
is able to detect radio sources about $10^{12}$ times weaker than Karl Jansky’s 1933
antenna, corresponding to an improvement of about a factor of 30 per decade of 
time. 


¹The National Radio Astronomy Observatory is operated by Associated Universities Inc. under
Cooperative Agreement with the National Science Foundation.

As described by Salter in his chapter on Single-Dish Radio Telescopes, the
improvements in sensitivity were achieved in part by building larger and more
versatile filled-aperture collecting areas. In Chapter 4, Cortes-Medellin discusses the
improvements that have come from the development of more sophisticated antenna 
feeds and feed systems with larger bandwidths and illumination optimized for better 
resolution and minimum sidelobes, as well as focal plane arrays (FPA) to enhance 
the limited field of view of single pixel feeds. 

Complementing the progress in antenna technology are the dramatic advances 
in receiver technology and in digital signal processing outlined in Chapter 5 by 
Fisher and Morgan and in Chapters 7 and 8 on Backends by Price and Lorimer, 
respectively. The performance of low noise amplifiers has reached the point that 
throughout the radio spectrum, sensitivity is no longer limited by amplifier noise, 
but by tropospheric noise at millimeter wavelengths; the cosmic microwave back- 
ground and ground spillover at centimeter wavelengths; and by galactic noise at 
meter wavelengths and longer. The dramatic reductions in receiver noise have been 
complemented by, arguably, even more impressive advances in backend digital signal 
processing. The first spectral line in radio astronomy was found in 1951 by Ewen
and Purcell¹¹ using a simple horn and a tunable, single narrow-band radiometer.
Later, spectrometers with as many as 100 analog filterbanks were
developed, but they were soon replaced by digital autocorrelation spectrometers
pioneered by Sandy Weinreb as part of his MIT Ph.D. thesis.¹² Today, as described
in Chapter 7 by Price, thousands of atomic and molecular lines are being studied
with radio telescopes, such as the JVLA and ALMA, that support more than 10,000
separate frequency channels using sophisticated digital spectrometers. The first
pulsars were found by inspecting analog chart recordings and noting periodic pulses.
Lorimer (Chapter 8) discusses modern digital techniques that are able to average
over many pulses by searching over a range of pulse periods and dispersions to
greatly enhance the sensitivity of pulsar searches and precision measurements of
their pulsation timing.

But nowhere has the progress in radio astronomy instrumentation been more 
dramatic than in angular resolution and imaging. For many years it was widely 
assumed that as a result of the long wavelengths involved, the angular resolution 
of radio telescopes would always be limited compared to that of optical telescopes. 
Since radio wavelengths are longer than optical wavelengths by a factor of about 
$10^5$, to obtain equivalent resolution to a 10-m optical telescope, a radio telescope
needs to have dimensions about 1000 km in extent. Interestingly, starting with the 
early development of interferometry and earth rotation synthesis in the 1950s and 
1960s, the resolution of radio telescopes has improved by a factor of about $10^7$ to
routinely reach values better than one milliarcsecond. 

There are three reasons why radio telescopes have achieved such a remarkable 
resolution advantage over optical telescopes. First, because of the long wavelengths 
involved, radio telescopes do not need the same precision as optical telescopes to
achieve diffraction-limited performance. Second, radio signals, unlike light waves,
can be amplified and split without loss of sensitivity. As described by Fomalont 
and Wright in Chapter 2, this allows the use of multi-element interferometers and 
arrays to achieve high angular resolution. Finally, in practice, optical telescopes 
have been limited, not by diffraction, but by seeing due to turbulence in the earth’s 
troposphere. Typically, on a good mountain site under good conditions, seeing-limited
images are rarely better than about 0.5 arcseconds, although the use of “guide stars”
or laser-driven artificial targets allows adaptive optics systems working in the near
infrared to achieve nearly diffraction-limited performance over a very limited field of
view. And, of course, telescopes in space can give diffraction-limited images. Radio
astronomers, by contrast, are able to implement adaptive optics through off-line 
image processing, exploiting amplitude and phase closure relations, which, when 
combined with CLEAN or maximum likelihood deconvolution techniques, achieve 
diffraction-limited radio images with resolution superior to that of any optical or 
infrared telescope, in space or on the ground. 

Karl Jansky’s pioneering radio telescope could barely distinguish one part of the 
sky from another. Modern large filled-aperture radio telescopes operating at their 
shortest wavelengths are able to obtain angular resolutions of about one arcminute, 
about that of the unaided human eye. Large connected-element ground-based arrays 
such as the Very Large Array or ALMA have resolutions a bit better than an arc- 
second, about equal to that of the best optical telescopes on a good mountain site. 
Radio-linked arrays such as the Multi-Telescope Radio-Linked Interferometer (MTRLI)
in the UK¹³ achieve another factor of ten in resolution, while very long baseline
interferometry (VLBI) reaches milliarcsecond resolutions. The Russian space VLBI
mission, RadioAstron,¹⁴ extends interferometer baselines to beyond the Moon, with
corresponding resolution of only 10 microarcseconds at the shortest operating wave- 
length of 1.3 cm. In Chapter 2, Fomalont and Wright review the sophisticated 
techniques that radio astronomers use to achieve this extraordinary resolution and 
image quality. 

Low frequencies (wavelengths greater than about 1 meter) present special prob- 
lems as well as new opportunities. Although radio astronomy started at long wave- 
lengths, the limited resolution thwarted further progress until, as described by 
Ellingson and Taylor in Chapter 3, the recent advances in digital signal processing 
made it possible to build long wavelength arrays with large fields of view and with 
arcsecond angular resolution. 

Cosmic radio emission is polarized, from less than one percent in radio galaxies 
and quasars to many tens of percent from pulsars and radio bursts from the Sun 
and Jupiter. Observations of both linear and circular radio polarization not only 
give insight to the organization of magnetic fields that generate nonthermal radio 
emission, but also help interpret the thermal radio emission from the surfaces of the 
Moon and planets. Robishaw and Heiles review the sometimes inconsistent definitions
used to describe radio polarization measurements in Chapter 6. They discuss
the many subtle sources of instrumental polarization which need to be mitigated
in order to extract the sometimes extraordinarily small true polarization signal,
such as that imprinted on the cosmic microwave background by gravitational waves
generated during the brief period of cosmic inflation.


Chapter 1


Single-Dish Radio Telescopes 


Christopher J. Salter 


National Astronomy and Ionosphere Center, Arecibo Observatory 
HC 3 Box 53995, Arecibo, Puerto Rico 00612, USA 


csalter@naic.edu 


Single-dish radio telescopes are presented in their many shapes and sizes. Those 
properties relevant to their observing utility are discussed, as is the relation- 
ship between the “true” distribution of celestial brightness and that recorded 
as the output from a single dish. The appendix to this chapter presents a com- 
pendium of information on the majority of the single-dish radio telescopes of the 
world. 


1. Introduction 


The compound adjective “single-dish” in this chapter’s title usually conjures up 
an image of a huge fully-steerable parabolic reflector, such as those located at 
Jodrell Bank (UK), Effelsberg (Germany), or Green Bank (WV, USA). While these 
are high-profile members of the single-dish family, I will also include under this 
umbrella all “single-element” antennas that respond to Fourier components of the 
celestial brightness distribution down to zero spatial frequency (i.e. 0 cycles/rad). 
This includes such antennas as broadside arrays, horn antennas, “trough” reflectors, 
and dishes that are firmly anchored to the surface of the earth. 

It might be considered odd to begin this volume by considering single-dish 
radio telescopes, when their more costly brethren, the large synthesis arrays, pro- 
duce images of immensely greater angular resolution, not only allowing investigation 
of the finest scale radio-source structure, but also avoiding the problem of source 
confusion that can bedevil some single-dish studies. (Here, “confusion” means sky 
brightness fluctuations due to faint radio sources blending together in the telescope
beam and setting the ultimate limit on a telescope’s ability to detect faint objects.¹)
However, not only do single dishes have a heroic history in radio astronomy, but they
continue to make unique scientific contributions in situations where the use of arrays
would be either “uneconomical”, or quasi-impossible. Nobel prizes in physics have
been awarded for discoveries made all, or in large part, by single-dish telescopes, namely
in 1974 (Ryle and Hewish), 1978 (Penzias and Wilson), 1993 (Taylor and Hulse),
and 2006 (Mather and Smoot). Indeed, single-dish telescopes have played a funda- 
mental role in the story of radio astronomical discovery, from the antennas of Karl 
Jansky and Grote Reber that launched the field, to the huge diameter giants of 
today. They have enabled the discovery of the first planetary system found outside 
of our Solar System² and the cosmic microwave background,³ and are presently
the instruments around which the endeavor of detecting the “red end” of the 
gravitational-wave spectrum is based (e.g. Ref. 4). Single dishes have been responsible
for the discovery of the vast majority of pulsars, for establishing the existence
of fast radio bursts (FRBs), and for the detection of most interstellar molecular species.
They have also been responsible for large-area background/foreground surveys of both
line and continuum emission, high-sensitivity very long baseline interferometry (VLBI),
and solar system radar studies.

Indeed, the “Era of the Single Dish” is far from over. The Five hundred- 
meter Aperture Spherical Telescope (FAST) is currently under construction in 
Guizhou Province, China, and will provide the greatest collecting area yet of any 
“single-aperture” radio telescope. In addition, other large single-dish telescopes have 
recently been commissioned, e.g. in Sardinia, Italy (the SRT), and at Sierra Negra in 
Mexico (the LMT/GTM). Significantly, summer schools organized by the National 
Astronomy and Ionosphere Center (NAIC) and the National Radio Astronomy 
Observatory (NRAO) on “Single-Dish Radio Astronomy: Techniques and Appli- 
cations” have taken place every two years since 2001, and continue to be heavily 
oversubscribed. Indeed, a fine introduction to the practical aspects of the discipline 
is to be found in the published proceedings of the 2001 school.⁵


2. The Single-Dish “Zoo” 


The “single-dish family” contains what, in some ways, is a “zoo of beasts” having
a wide variety of shapes and sizes. Figure 1 presents a cross-section of this “zoo”.
The classic fully-steerable parabolic-reflector telescopes themselves descend from 
the pathfinder instrument constructed by Grote Reber in the late 1930s. Designs 
have evolved greatly over the years, culminating in the 110-m × 100-m Green Bank
Telescope (GBT), which provides a completely unblocked aperture yielding the 
cleanest possible beam pattern. However, large fully-steerable telescopes have an 
upper limit to their sizes due to practical mechanical engineering considerations. 
This size limitation has been overcome by building structures that have their reflec- 
tors fixed to the ground, often exploiting the natural shape of the terrain to support 
them. A number of methods have been used to permit such telescopes to track celes- 
tial bodies, from the azimuth-zenith angle solution of Arecibo’s 305-m Telescope, 
to the ingenious usage of the local geography by the Ooty Radio Telescope, which
can track a celestial body located anywhere within 86% of the total sky area for up
to 9.5 hr. Some telescopes falling within our definition of a “single dish” are not at
all dish-like. Typical examples are the Bell Labs horn antenna used by Penzias and
Wilson,³ the Nançay Radio Telescope, and the RATAN-600.

Fig. 1. A selection of single-dish telescopes of the world. These are (left-to-right, top-to-bottom):
(i) the Effelsberg 100-m diameter telescope (Germany); (ii) the Jodrell Bank 76-m telescope
(UK); (iii) the 110-m × 100-m GBT (WV, USA); (iv) the Arecibo 305-m telescope (Puerto
Rico); (v) the 530-m × 30-m Ooty Radio Telescope (India); (vi) the Nançay telescope (France);
(vii) the Bell Labs horn antenna (NJ, USA); and (viii) the RATAN-600 telescope (Russia). Credit:
(i) Dr. Schorsch; (ii) Mike Peel (Jodrell Bank Centre for Astrophysics, University of Manchester;
Lovell Telescope, Jodrell Bank Observatory); (iii) NRAO/AUI/NSF
(https://public.nrao.edu/gallery/green-bank-telescope/); (iv) Arecibo Observatory, a facility
of the NSF; (v) Ooty Radio Telescope, National Centre for Radio Astrophysics; (vi) Wouter
Hagens, Nançay Observatory; (vii) NASA; (viii) александр с Кавказа.


3. Quantifying the Radio Emission from Celestial Objects 


Intensity: The fundamental observable in radio astronomy is the Intensity, $I$,
also often called the Brightness or Surface Brightness, $B$, of radio waves arriving at
the Earth. This is basically a function of five variables when observing the sky with
a single dipole (linear polarization) or helical feed (circular polarization), i.e.


$$I = I(\alpha, \delta, \nu, t, \psi) = B(\alpha, \delta, \nu, t, \psi), \tag{1}$$


where $\alpha$ and $\delta$ are the celestial coordinates, $\nu$ is the frequency, $t$ is the epoch of
observation, and $\psi$ defines the polarization state to which the feed is sensitive, i.e.
the parallactic angle of the dipole, or the handedness of the helix. We will neglect
the polarization state for now, and assume the incident radiation to be unpolarized.

Considering the energy $dE$ arriving from a small solid angle of the sky, $d\Omega$, in
the direction $(\alpha, \delta)$ and crossing an area $dA$ normal to this direction in a time $dt$,
within a band of frequencies, $d\nu$, centered on $\nu$, then


$$I(\alpha, \delta, \nu, t) = \frac{dE(\alpha, \delta, \nu, t)}{dA\, d\Omega\, d\nu\, dt}. \tag{2}$$


The standard MKS unit of Intensity is W m$^{-2}$ Hz$^{-1}$ sr$^{-1}$.


Brightness Temperature: Often, rather than the above standard unit, it is
more convenient to express Brightness as Brightness Temperature, $T_B$. The physical
significance of this is that if we were able to replace our small patch of sky by a black
body of temperature $T_B$, then at our observing frequency, $\nu$, we would measure the
same brightness as is actually observed. (Note that $T_B$ for our patch of sky is only
independent of frequency if it is truly emitting as a black body. The Moon and
most planets approximate black bodies rather well, but this is not the case for the
majority of astronomical objects.)

$T_B$ and $B$ are related via the Planck formula,


$$B = \frac{2h\nu^3}{c^2\left(e^{h\nu/kT_B} - 1\right)}, \tag{3}$$


where $h$ is Planck’s constant, $k$ is Boltzmann’s constant, and $c$ is the velocity of
light.


Happily, for most radio frequencies (though beware when considering millimeter
wavelengths), $\nu$ is sufficiently small and $T_B$ sufficiently high that $h\nu/kT_B \ll 1$, and the
Rayleigh-Jeans approximation applies, i.e.


$$B = \frac{2kT_B\nu^2}{c^2} = \frac{2kT_B}{\lambda^2}. \tag{4}$$
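
As a rough numerical check on where the Rayleigh-Jeans approximation of Eq. (4) can be trusted, the sketch below compares it with the full Planck expression of Eq. (3). The function names, the choice of $T_B = 10$ K, and the three sample frequencies are illustrative assumptions that do not come from the text; the constants are the exact SI values.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
K = 1.380649e-23     # Boltzmann constant, J/K
C = 299792458.0      # velocity of light, m/s


def planck_brightness(nu_hz, t_b):
    """Brightness B (W m^-2 Hz^-1 sr^-1) of a black body at temperature t_b, Eq. (3)."""
    x = H * nu_hz / (K * t_b)
    # expm1(x) = e^x - 1, evaluated accurately even when x is tiny
    return 2.0 * H * nu_hz**3 / (C**2 * math.expm1(x))


def rayleigh_jeans_brightness(nu_hz, t_b):
    """Rayleigh-Jeans approximation to B, Eq. (4)."""
    return 2.0 * K * t_b * nu_hz**2 / C**2


# Illustrative values only: a 10 K black body at three frequencies
for nu in (1.4e9, 100e9, 300e9):
    ratio = rayleigh_jeans_brightness(nu, 10.0) / planck_brightness(nu, 10.0)
    print(f"{nu / 1e9:6.1f} GHz: Rayleigh-Jeans / Planck = {ratio:.3f}")
```

For this 10 K example the two expressions agree to better than one percent at 1.4 GHz, whereas at 300 GHz the approximation overestimates the brightness by more than a factor of two, which is the reason for the caution about millimeter wavelengths.
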
Flux Density: Using a radio telescope, one can measure the distribution of
brightness over any area, and produce a two-dimensional image of the intensity
within it (either as a contour map, or a radio image, see Sec. 6.1). It will be found
that the radio sky contains a multitude of discrete radio sources, each with its own
well-defined spatial boundaries. A global parameter that characterizes the strength
of the emission from any discrete source at an observing frequency, $\nu$, is the power
received on a unit area per unit bandwidth from the whole source. This quantity
is called the flux density, $S$, and is the spatial integral of the brightness over the
whole source:


$$S(\nu, t) = \iint_{\rm source} B(\alpha, \delta, \nu, t)\, d\alpha\, d\delta = \iint_{\rm source} B(\alpha, \delta, \nu, t)\, d\Omega. \tag{5}$$


The standard MKS unit for flux density (NEVER, never to be called just
“flux”) is W m$^{-2}$ Hz$^{-1}$. In practice, the flux densities of celestial radio sources
are so tiny that a practical unit named the Jansky (Jy) has been adopted, where
1 Jy = $10^{-26}$ W m$^{-2}$ Hz$^{-1}$. With many present-day telescopes, it is often even more
practical to use milliJy (1 mJy = $10^{-3}$ Jy), or even microJy (1 $\mu$Jy = $10^{-6}$ Jy).

For a source situated at a distance $r$ in a Euclidean universe, it is to be noted
that both the energy received from it, and its solid angle, decrease $\propto r^{-2}$. As the
intensity $I = dE/(dA\, d\Omega\, d\nu\, dt)$, $I$ is seen to be independent of source distance. In
other words, a distant source looks smaller than an identical, but nearer, source,
but both have the same surface brightness. In contrast, $S = dE/(dA\, d\nu\, dt)$, which
is $\propto r^{-2}$. Hence the flux density falls as the inverse square of the distance.
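
The distance argument can be illustrated with a few lines of arithmetic. The sketch below is only a toy example: the helper names and the sample source (8 Jy spread over $10^{-6}$ sr) are hypothetical, and a Euclidean universe is assumed, as in the text.

```python
JY = 1e-26  # 1 Jy expressed in W m^-2 Hz^-1


def jy_to_si(s_jy):
    """Convert a flux density from Jy to W m^-2 Hz^-1."""
    return s_jy * JY


def move_source(s_jy, solid_angle_sr, r_old, r_new):
    """Flux density and solid angle of the same source moved from r_old to r_new.

    Both quantities scale as r^-2 in a Euclidean universe.
    """
    scale = (r_old / r_new) ** 2
    return s_jy * scale, solid_angle_sr * scale


def mean_brightness(s_jy, solid_angle_sr):
    """Mean surface brightness (W m^-2 Hz^-1 sr^-1) of a source: S / Omega."""
    return jy_to_si(s_jy) / solid_angle_sr


# Hypothetical source: 8 Jy over 1e-6 sr, then moved twice as far away
s_near, omega_near = 8.0, 1e-6
s_far, omega_far = move_source(s_near, omega_near, r_old=1.0, r_new=2.0)

print("flux density (Jy):", s_near, "->", s_far)
print("mean brightness:", mean_brightness(s_near, omega_near),
      "->", mean_brightness(s_far, omega_far))
```

Doubling the distance cuts both the flux density and the solid angle by a factor of four, so their ratio, the surface brightness, is unchanged.
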


4. Radio Telescopes and their Characterization 


Radio telescopes are devices that selectively collect radio energy from a given direc- 
tion. This implies the concept of a beam that “looks” quasi-exclusively at a limited 
region of the sky. Such a telescope could be a simple Yagi antenna, or an elementary 
broadside array of dipoles, which would be “selective” in terms of both the position 
on the sky where their beams point and the bands of frequencies that they respond 
to. At the other end of the scale are huge reflecting telescopes, such as those at 
Arecibo, Parkes and Yevpatoria. The important similarity between all these instruments
is that they combine the radio waves arriving from a chosen direction “in
phase”, resulting in a beam, or reception pattern, in that direction.


Telescope Mounts: A majority of radio telescopes have the ability to point to
where the user desires via two orthogonal axes about which the instrument can be
rotated. The two varieties most often encountered are the equatorial mount and the
altitude-azimuth mount. (i) In the equatorial mount, one axis is parallel to the
terrestrial rotation axis. Once pointed at a radio source, the
object can be followed from rise to set by just rotating the telescope about the axis
parallel to the Earth’s axis. While convenient, the main axis of the instrument is not
vertical, but at an angle to the horizontal equal to the geographical latitude. As radio
telescopes can weigh well over a thousand tons, supporting the prime axis at this
angle becomes a challenge. Nevertheless, several large equatorially mounted radio
telescopes have been constructed, e.g. the NRAO Green Bank 140-ft telescope, and
the Ooty Radio Telescope in South India. (ii) Today it is more usual to employ an
altitude-azimuth mount. Here one rotation axis is parallel to the local vertical, with
the second axis being horizontal. To track an object with such an arrangement,
the instrument has to be simultaneously driven about both axes. This is readily
achieved via computer control. Examples of telescopes using such mounts are the
GBT, the Effelsberg 100-m telescope, and the Arecibo 305-m telescope.


The Reciprocity Theorem: Some single-dish antennas are used for both trans- 
mitting and receiving radio waves, especially in the pursuit of Solar System radar 
astronomy, an endeavor for which knowledge of the reciprocity theorem is essential. 
This theorem also provides a useful way in which to think about antenna properties, 
some aspects being more easily visualized considering the system in transmitting 
mode, while for others it is convenient to consider it as a collector of radio waves. 
The Reciprocity Theorem states that an antenna can be treated equivalently as 
a device that receives, or transmits, radio energy. In the words of Ref. 6, “if a 
voltage is applied to the terminals of antenna A and the current is measured at 
the terminals of another antenna B, then an equal current (in both amplitude and 
phase) will appear at the terminals of A if the same voltage is applied to B” (Fig. 2). 
This implies that the telescope beam pattern (see below) measured when receiv- 
ing a signal is identical to that measured when transmitting a signal. Reciprocity 
breaks down in the presence of Faraday rotation during propagation through a 
magneto-ionic medium, as there the propagation properties are not independent of 
the propagation direction, i.e. are not reciprocal. 


“Illuminating” a Telescope: Reciprocity implies that if a receiver is replaced
by a transmitter, as it is for planetary radar experiments at Arecibo and Goldstone,
the beam pattern for transmitting is identical to that for receiving. There are two
basic ways of “illuminating” a telescope’s main reflector: (i) from
the prime focus, i.e. the feed system directly illuminates the reflector, and (ii)
via a secondary focus, i.e. the feed illuminates a secondary reflector prior to the
radiation reaching the main reflector. The most common secondary foci are obtained
via a Cassegrain or Gregorian subreflector system. In the former, a hyperbolic subreflector
is situated in front of the focal point of the primary reflector, while the latter
places an elliptical subreflector beyond the primary focal point. The advantages
of using a secondary focus can include a larger depth of focus, purer polarization
characteristics, easier access to the receiver systems, and greater mechanical
support for mounting heavy receiver packages or multi-frequency receiver systems.


Fig. 2. Reciprocity implies that the transmitter voltages, $V$, and receiver currents, $I$, are related
such that $V_B/I_A$ (upper situation) = $V_A/I_B$ (lower).⁶


At Arecibo, the use of specially-shaped secondary and tertiary reflectors in the 
suspended “Gregorian dome” of the telescope (Fig. 3) effectively “parabolizes” the 
305-m diameter spherical primary surface. 


Telescope Beam Pattern: The fundamental characteristic of any single-dish
telescope is its beam pattern or power polar diagram, $P(\theta, \phi, \nu)$. This gives the
response of the instrument at a frequency $\nu$ to power arriving from a direction $(\theta, \phi)$
relative to that from the direction of maximum response $(0, 0)$. Thus, by definition,
$P(0, 0) = 1$. The beam pattern is directly related to the diffraction properties of
the telescope aperture, and has a main beam with subsidiary sidelobes (Fig. 4; the
sidelobes in the rear $2\pi$ steradians often being called the backlobes). Geometric errors
can affect the beam pattern. For example, if the telescope feed is offset laterally
from the main axis of the telescope surface, this introduces coma lobes into the
beam pattern, which are asymmetric, near-in sidelobes. Further, if the feed is not
situated exactly on the focal plane of the reflector system, this will broaden the main
beam, reduce the “sharpness” of the nulls in the sidelobe pattern, and increase the
sidelobe level.® Sidelobe levels are also raised by “blockage” of the illumination
pattern caused by such things as a feed cabin, a subreflector, and their support
legs. Such blockage can be avoided by using an offset parabola, as is the case for the
Ooty Radio Telescope and the GBT. To minimize sidelobe levels, the illumination
pattern is usually “tapered” towards the edge of the aperture. An edge taper of
about 11 dB will also maximize the aperture efficiency (see below).


Fig. 3. The secondary and tertiary reflector system in the “Gregorian dome” of the Arecibo
305-m telescope. Radiation from the spherical primary reflector arrives from the bottom of the 
figure, and via the “shaped” secondary and tertiary mirrors is brought to a point focus, effectively 
correcting the spherical aberration of the primary surface.” 


Fig. 4. A schematic polar plot of a telescope beam pattern illustrating the main beam (mainlobe),
sidelobes and backlobe.


A design objective for any observing facility is to minimize the sidelobe and 
backlobe levels, as these represent unwanted responses accepting power from direc- 
tions in which one would prefer it to be rejected. The lower the sidelobe level, the 
better a telescope can detect weak objects close to a strong radio source, setting 
the dynamic range of an observation. 


Main-Beam Resolving Power: The parameter that is usually employed to
characterize the resolving power of a radio telescope is its Half-Power Beam Width
(HPBW), also known as the Full-Width Half Maximum (FWHM). This is the angular
width of the beam pattern where its power pattern has dropped to one half of
its peak value; i.e. $P(\theta, \phi, \nu) = 0.5$. If the main beam is elliptical, the major and
minor axes are known as the Principal Planes, and the beam is characterized by
the HPBW in each of these planes, plus the position angle of the orientation of its
major axis on the sky (measured from N through E).

From diffraction theory,


$$\mathrm{HPBW} \approx \frac{\lambda}{D}\ \mathrm{rad} = \frac{1}{\text{No. of } \lambda\text{'s across the aperture}}, \tag{6}$$


where $\lambda$ is the observing wavelength, and $D$ is the telescope diameter.

For a typical single-dish telescope, with tapered illumination across the aperture,
a better approximation is


$$\mathrm{HPBW} \approx 1.2 \times \frac{\lambda}{D}\ \mathrm{rad}. \tag{7}$$
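
Equation (7) is easy to evaluate for a given dish. The following is a minimal sketch; the function name, the default factor argument, and the example of a 100-m dish observing at 1.42 GHz are assumptions chosen for illustration.

```python
import math

C = 299792458.0  # velocity of light, m/s


def hpbw_arcmin(diameter_m, freq_hz, factor=1.2):
    """Half-power beam width in arcminutes from HPBW = factor * lambda / D.

    factor=1.2 corresponds to Eq. (7) (tapered illumination);
    factor=1.0 gives the simple diffraction estimate of Eq. (6).
    """
    wavelength_m = C / freq_hz
    hpbw_rad = factor * wavelength_m / diameter_m
    return math.degrees(hpbw_rad) * 60.0


# Illustrative example: a 100-m dish at 1.42 GHz (wavelength about 21 cm)
print(f"HPBW = {hpbw_arcmin(100.0, 1.42e9):.1f} arcmin")
```

With these example numbers the beam width comes out at a little under 9 arcminutes.
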


Effective Area: Here, it is useful to introduce the concept of a point source. This
is a radio source whose angular diameter is $\theta_s \ll$ HPBW. The flux density of such a
compact source is defined to be $S(\nu) = dE/(dA\, d\nu\, dt)$ and the total power collected
by our telescope can be expressed via integration to be $S(\nu) A_{\rm eff}(\nu) \Delta\nu$, where $\Delta\nu$
is the receiver bandwidth, and $A_{\rm eff}(\nu)$ is called the Effective Area of the telescope
at frequency $\nu$.

In practice, any single radio receiver chain can only collect the power from a
single polarization, so the power it collects from an unpolarized compact source is
$\frac{1}{2} S(\nu) A_{\rm eff}(\nu) \Delta\nu$. Note that for a point source (and only a point source) the power
received when pointing directly at it is proportional to its flux density. This has
led to the popular calibration of brightness in terms of equivalent point-source
flux density, often expressed in units of Jy/beam. It should be remembered that
brightness expressed thus for an extended source only has quantitative meaning
in terms of a particular telescope and its detailed observing system. If the effective
area $A_{\rm eff}(\nu)$ is independent of the pointing direction, to measure the relative
flux densities of two point sources it is only necessary to measure the power
received from each source, the ratio of the measurements giving the ratio of the flux
densities.


Antenna Temperature: Suppose that after observing our point source, the telescope
feed were to be replaced by a matched resistor whose temperature can be
adjusted until the noise power received from the resistor equals that previously
received from the point source. Now, the power received over the receiver bandwidth
$\Delta\nu$ from the resistor is $kT_A\Delta\nu$, where $T_A$ is known as the Antenna Temperature.
Thus, $kT_A\Delta\nu = \frac{1}{2} S(\nu) A_{\rm eff}(\nu) \Delta\nu$, and


$$A_{\rm eff}(\nu) = \frac{2kT_A}{S(\nu)}. \tag{8}$$


In other words, if we can measure the antenna temperature for a point source
of known flux density, we can then calculate the effective area of the telescope.


Aperture Efficiency: For many antennas (e.g. parabolic dishes) we can assign
a “physical area”, $A_p$, to an antenna. Thus, we can define an Aperture Efficiency,
$\eta_A$, to be


$$\eta_A = \frac{A_{\rm eff}}{A_p} < 1. \tag{9}$$
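
Equations (8) and (9) together turn a calibrator measurement into an aperture efficiency. The sketch below assumes a circular aperture and uses invented numbers (a 100-m dish measuring $T_A = 2.0$ K on a 1.4 Jy point source); the function names are likewise only illustrative.

```python
import math

K = 1.380649e-23  # Boltzmann constant, J/K
JY = 1e-26        # 1 Jy in W m^-2 Hz^-1


def effective_area(t_a_kelvin, s_jy):
    """Effective area (m^2) from antenna temperature and flux density, Eq. (8)."""
    return 2.0 * K * t_a_kelvin / (s_jy * JY)


def aperture_efficiency(a_eff_m2, diameter_m):
    """Aperture efficiency, Eq. (9), taking A_p as the area of a circular dish."""
    a_phys = math.pi * (diameter_m / 2.0) ** 2
    return a_eff_m2 / a_phys


# Hypothetical calibration: T_A = 2.0 K on a 1.4 Jy point source, 100-m dish
a_eff = effective_area(2.0, 1.4)
eta = aperture_efficiency(a_eff, 100.0)
print(f"A_eff = {a_eff:.0f} m^2, aperture efficiency = {eta:.2f}")
```

For these made-up values the effective area is a little under 4000 m$^2$, i.e. an aperture efficiency of about 50%.
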


Towards the upper end of the frequency range at which a radio telescope can be 
usefully operated, its aperture efficiency drops off. A handy rule-of-thumb is that 
a telescope’s upper frequency limit is roughly that at which the rms deviation of 
its surface from its design shape is 1/16th of the wavelength. A more quantitative 
estimate for the degradation factor, $\delta$, relative to its peak performance is given by
the Ruze formula,


$$\delta = e^{-(4\pi\epsilon/\lambda)^2}, \tag{10}$$

where $\lambda$ is the wavelength and $\epsilon$ is the rms surface error.
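
The 1/16th-wavelength rule of thumb can be checked against the Ruze formula directly. This is a small sketch using the standard form of the formula, $\delta = \exp[-(4\pi\epsilon/\lambda)^2]$; the function name and the sample surface errors are assumptions.

```python
import math


def ruze_factor(rms_error_m, wavelength_m):
    """Degradation factor delta = exp(-(4 pi eps / lambda)^2), Eq. (10)."""
    return math.exp(-((4.0 * math.pi * rms_error_m / wavelength_m) ** 2))


wavelength = 0.01  # 1 cm, an arbitrary example
for fraction in (32, 16, 8):
    eps = wavelength / fraction
    print(f"eps = lambda/{fraction}: delta = {ruze_factor(eps, wavelength):.2f}")
```

At $\epsilon = \lambda/16$ the factor is about 0.54, so the rule of thumb marks the frequency at which roughly half of the peak efficiency has been lost.
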


Beam Solid Angle: If the Power Polar Diagram at a particular frequency is
$P(\theta, \phi)$, then the Beam Solid Angle, $\Omega_A$, at that frequency is defined to be


$$\Omega_A = \iint_{4\pi} P(\theta, \phi)\, d\Omega. \tag{11}$$


It can be shown that the Effective Area, $A_{\rm eff}$, is related to $\Omega_A$ via the so-called
Antenna Theorem:


$$A_{\rm eff}\, \Omega_A = \lambda^2. \tag{12}$$


Defining the Main-Beam Solid Angle to be $\Omega_M = \iint_{\rm main\ beam} P(\theta, \phi)\, d\Omega$, then
the Main-Beam Efficiency, $\eta_M$, is


$$\eta_M = \frac{\Omega_M}{\Omega_A}. \tag{13}$$


If we approximate the telescope main beam by a two-dimensional Gaussian distribution
having maximum and minimum HPBWs, $\theta_{\max}$ and $\theta_{\min}$ (usually a reasonable
approximation), then $\Omega_M \approx 1.133\, \theta_{\max} \theta_{\min}$.
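
The factor 1.133 is $\pi/(4\ln 2)$, the integral of a unit-peak two-dimensional Gaussian expressed in terms of its HPBWs. The sketch below checks this numerically in the small-angle limit and combines Eqs. (12) and (13) into $\eta_M = \Omega_M A_{\rm eff}/\lambda^2$. All names and the example beam (9 by 8 arcminutes) are hypothetical.

```python
import math

GAUSS_FACTOR = math.pi / (4.0 * math.log(2.0))  # approximately 1.133


def main_beam_solid_angle(theta_max, theta_min):
    """Solid angle (sr) of a Gaussian main beam with the given HPBWs (rad)."""
    return GAUSS_FACTOR * theta_max * theta_min


def _axis_integral(hpbw, n=400, extent=4.0):
    """Midpoint-rule integral of exp(-4 ln2 x^2 / hpbw^2) over |x| <= extent * hpbw."""
    a = 4.0 * math.log(2.0)
    dx = 2.0 * extent * hpbw / n
    total = 0.0
    for i in range(n):
        x = -extent * hpbw + (i + 0.5) * dx
        total += math.exp(-a * (x / hpbw) ** 2)
    return total * dx


def main_beam_solid_angle_numeric(theta_max, theta_min):
    """Numerical version; the Gaussian is separable, so integrate each axis once."""
    return _axis_integral(theta_max) * _axis_integral(theta_min)


def main_beam_efficiency(omega_m, a_eff_m2, wavelength_m):
    """eta_M = Omega_M / Omega_A with Omega_A = lambda^2 / A_eff (Eqs. 12 and 13)."""
    return omega_m * a_eff_m2 / wavelength_m**2


# Hypothetical elliptical beam of 9 x 8 arcmin
t_max = math.radians(9.0 / 60.0)
t_min = math.radians(8.0 / 60.0)
print(main_beam_solid_angle(t_max, t_min), main_beam_solid_angle_numeric(t_max, t_min))
```

The analytic and numerical solid angles should agree closely, which is a quick way of confirming that the HPBWs, not the Gaussian standard deviations, are the quantities entering the 1.133 formula.
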


Directivity: The Directivity, D(0, ¢), of a transmitting antenna is defined to be 
the transmitted power per unit solid angle in the direction (@, ¢), relative to that 
of an isotropically radiating antenna (i.e. one that radiates the same input power 
uniformly in all directions). The peak directivity is 

dn 


D(0,0) = 5 (14) 

Far-Field Distance: The Far-Field Distance is the distance beyond which the 
antenna power pattern can be considered to be well formed; i.e. beyond which 
the electric and magnetic field strengths uniformly decrease with the inverse of the 
distance, and the power inversely with the distance squared. This is conventionally 
taken to be the distance beyond which a point on the principal axis of the antenna 
has a path difference between the center and edge of the aperture plane of 1/16th of a 
wavelength. The far-field distance $R_{\rm ff}$ for a telescope of diameter $D$ at a wavelength 
$\lambda$ is 

$$R_{\rm ff} = \frac{2D^2}{\lambda}. \qquad (15)$$

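
To get a feel for the far-field distance 2 D^2 / lambda, the short Python sketch below evaluates it for an assumed 100-m dish observing at 21 cm; both values and the function name are illustrative.

```python
def far_field_distance(diameter, wavelength):
    """Far-field distance R_ff = 2 D^2 / lambda (same length units as the inputs)."""
    return 2.0 * diameter ** 2 / wavelength

# Illustrative values (assumptions): a 100-m dish observing at 21 cm.
print(far_field_distance(100.0, 0.21) / 1e3)   # roughly 95 (km)
```

A test transmitter would thus have to be placed on the order of 100 km away to measure the far-field pattern of such a dish directly.
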
5. The Aperture Illumination and the Antenna Field Pattern 


The beam pattern of a telescope is determined by its diffraction properties. To 
quantify this, we consider the Aperture Plane, a notional plane in free space over 
the telescope on which the electric field is specified, and where the magnetic and 
electric fields are everywhere perpendicular both to each other and to the direction 
of propagation.® We can define (i) orthogonal coordinates in the aperture plane 
in units of the observing wavelength, $x_\lambda = x/\lambda$; $y_\lambda = y/\lambda$, and (ii) $l$ and $m$, the 
direction cosines of the angles measured from the telescope’s principal axis. If the 
electric-field (amplitude and phase) distribution over the antenna aperture plane is 
$F(x_\lambda, y_\lambda)$, then the Voltage Polar Diagram (not “Power Polar Diagram” this time!), 
$E(l,m)$, is given by the Fourier transform, 

$$E(l,m) = \iint F(x_\lambda, y_\lambda)\, e^{-i2\pi(l x_\lambda + m y_\lambda)}\, dx_\lambda\, dy_\lambda, \qquad (16)$$

and hence, 

$$F(x_\lambda, y_\lambda) = \iint E(l,m)\, e^{+i2\pi(l x_\lambda + m y_\lambda)}\, dl\, dm. \qquad (17)$$


A number of things can be noted concerning these relationships: 


(a) For $l^2 + m^2 < 1$, the angles represented by $l$ and $m$ have physical meaning 
as directions and, considering an antenna being used for transmitting, 
$E(l,m)$ represents power radiated in physically meaningful directions. However, 
if $l^2 + m^2 > 1$, then $E(l,m)$ represents power that “ebbs and flows” in 
the neighborhood of the antenna, i.e. “lost power”. This part of the pattern 
results from structure in the aperture distribution smaller than a wavelength 
across, e.g. bolt heads, narrow panel gaps, small holes, etc. 

(b) Unlike power, which is a scalar, both $F(x_\lambda, y_\lambda)$ and $E(l,m)$ are complex 
quantities, as expected for values representing the electric field. 

(c) If we can measure the voltage polar diagram, e.g. by interferometry, then the 
field distribution across the aperture plane can be derived via Eq. (17). The 
distribution of phase over the aperture tells about imperfections in the dish 
surface, with a dent or a bump showing up as a phase deviation in the aperture 
plane. In contrast, the amplitude distribution reveals the feed “illumination 
pattern” over the aperture plane. Such “Radio Holography” can provide the 
information needed to “tweak” the surface in order to improve it. Alternatively, 
a telescope possessing a subreflector can have this designed such as to first-order 
correct for primary-reflector imperfections. 


In single-dish radio astronomy, we are usually interested in the Power Polar Diagram 
of an antenna, rather than the Voltage Polar Diagram. As power is proportional to 
the square modulus of the electric field, the Power Polar Diagram is given by 


$$P(l,m) = E(l,m) \times E^*(l,m). \qquad (18)$$


As an example, for a one-dimensional aperture with uniform illumination (i.e. 
an aperture distribution that is a “top-hat” function), by Eq. (16) its Voltage Polar 
Diagram is $E = \sin(k\theta)/(k\theta)$, where $k = \pi a$ and $a$ is the length of the aperture 
in wavelengths. Hence, its Power Polar Diagram is $(\sin(k\theta)/(k\theta))^2$. 
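
The top-hat example can be verified numerically. The sketch below integrates the one-dimensional Fourier transform of a uniformly illuminated aperture with a simple midpoint rule and compares it with the analytic sin(k l)/(k l) form. It assumes the sign convention exp(-i 2 pi l x) for the transform, works in the direction cosine l (which is close to the angle in radians near the axis), and uses an arbitrary aperture length of 50 wavelengths; all function names are hypothetical.

```python
import cmath
import math

def voltage_pattern(l, a, n=2000):
    """Numerical 1-D Fourier transform of a top-hat aperture.

    a is the aperture length in wavelengths, l the direction cosine.
    Midpoint-rule integral of exp(-i 2 pi l x) over -a/2 <= x <= a/2,
    normalized so that the on-axis response is 1.
    """
    dx = a / n
    total = 0j
    for j in range(n):
        x = -a / 2.0 + (j + 0.5) * dx
        total += cmath.exp(-2j * math.pi * l * x) * dx
    return total / a

def sinc_pattern(l, a):
    """Analytic voltage pattern sin(k l)/(k l) with k = pi * a."""
    k = math.pi * a
    return 1.0 if l == 0 else math.sin(k * l) / (k * l)

a = 50.0                       # aperture length in wavelengths (assumed)
for l in (0.0, 0.01, 0.02, 0.03):
    numeric = voltage_pattern(l, a)
    # Columns: l, numerical E, analytic E, power pattern |E|^2.
    print(l, round(numeric.real, 4), round(sinc_pattern(l, a), 4),
          round(abs(numeric) ** 2, 4))
```

The first null falls at l = 1/a, i.e. at an angle of one wavelength divided by the aperture length, as expected for a uniformly illuminated aperture.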


6. Antenna Smoothing of the Celestial Brightness Distribution 


When we scan the beam of a single-dish radio telescope across the sky, how does 
the recorded output of our receiver relate to the true distribution of sky brightness? 
Consider a “one-dimensional sky” that has a brightness-temperature distribution, 
$T_B(\theta)$. This we scan with our “one-dimensional antenna”, whose power polar 
diagram is $P(\theta)$. At the instant when we are pointing at a position, $\theta'$, the contribution 
to the antenna temperature by the radiation received from any direction $\theta$ is 
attenuated by a factor of $P(\theta' - \theta)$. As the “beam-weighted power” received from the 
whole sky gives us the antenna temperature registered by our receiver, 

$$T_A(\theta') = \frac{\int_\theta T_B(\theta)\, P(\theta' - \theta)\, d\theta}{\int_\theta P(\theta' - \theta)\, d\theta} = \frac{1}{\Omega_A} \int_\theta T_B(\theta)\, P(\theta' - \theta)\, d\theta, \qquad (19)$$

where $\Omega_A$ is the Beam Solid Angle, $\Omega_A = \int_\theta P(\theta' - \theta)\, d\theta$. 

The convolution, $\int_\theta T_B(\theta)\, P(\theta' - \theta)\, d\theta$, implies that when our single-dish 
telescope is scanned across the sky, it acts as a “low-pass filter” of the celestial brightness 
distribution. To understand this, it should be remembered that any distribution can 
be decomposed, i.e. Fourier transformed, into a continuous distribution of (complex) 
sinusoids which, when summed together, reproduce the original distribution. Of 
course, this is true of the distribution that we call the celestial brightness. Now 
the convolution theorem states that the Fourier transform (FT) of a convolution 
equals the product of the FTs of the two functions being convolved. Thus, the FT 
of the observed brightness distribution is the product of the FT of the Power Polar 
Diagram, and the FT of the actual celestial brightness distribution. (We note that 
if the sky brightness distribution is expressed as a function of angle in radians, then 
its FT is a function of spatial frequency in cycles/rad.) 

If the celestial emission is scanned by a single-dish telescope of size W wavelengths 
(i.e. $W = D/\lambda$), then all Fourier components in its brightness distribution 
at spatial frequencies higher than W cycles/rad are totally suppressed in the telescope 
output, while Fourier components below this value are “reweighted” by the FT of the 
Power Polar Diagram. In other words, our telescope acts like a “low-pass filter” of the sky 
brightness distribution. 

This immediately demonstrates that if we scan across a point source (i.e. one 
whose FT is essentially uniform for components < W cycles/rad), then the output 
traces out the telescope’s Power Polar Diagram. Note also that because our telescope 
suppresses all Fourier components (spatial frequencies) > W cycles/rad, there are 
an infinite number of possible sky brightness distributions that are compatible with 
the celestial distribution as recorded by our telescope. The members of this infinite 
set differ only in their Fourier components > W cycles/rad. 
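
The statement that a scan across a point source traces out the beam can be illustrated with a discrete version of the beam-weighted average. The Python sketch below assumes a Gaussian power pattern with a half-power width of 2 (arbitrary angular units) and a regular grid of pointings; the names `gaussian_beam` and `scan` are hypothetical.

```python
import math

def gaussian_beam(theta, hpbw):
    """Gaussian power pattern with unit peak response."""
    return math.exp(-4.0 * math.log(2.0) * (theta / hpbw) ** 2)

def scan(sky, thetas, hpbw):
    """Beam-weighted mean of the sky brightness at each pointing."""
    out = []
    for pointing in thetas:
        weights = [gaussian_beam(pointing - t, hpbw) for t in thetas]
        out.append(sum(w * s for w, s in zip(weights, sky)) / sum(weights))
    return out

step = 0.1                                   # sample spacing (arbitrary units)
thetas = [i * step for i in range(-100, 101)]
sky = [0.0] * len(thetas)
sky[100] = 1.0                               # point source at theta = 0
observed = scan(sky, thetas, hpbw=2.0)

# The scan of a point source has the shape of the beam itself, so its
# full width at half maximum matches the HPBW to within one sample.
peak = max(observed)
half_width = [t for t, v in zip(thetas, observed) if v >= 0.5 * peak]
print(half_width[-1] - half_width[0])
```

Replacing the single spike in `sky` with any narrower-than-beam structure gives an output that is almost indistinguishable, which is the non-uniqueness described above.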


6.1. Mapping with a Single-Dish Telescope 


Often we wish to map the celestial brightness distribution of a region of sky with 
our single dish. The method most frequently employed is to make parallel scans of 
the field, separating adjacent scans by a given interval, and building up a raster 
coverage, just as a TV picture is built up. However, unlike a TV picture, the tele- 
scope is often scanned in opposite directions for adjacent scans. This is known as 
boustrophedonic scanning. As the level of the receiver noise, etc., is likely to drift 
during the coverage, it is customary that at least a few scans (sometimes a com- 
plete coverage) are taken in the orthogonal direction, which can later be used to 
set all scans in the coverage on an internally-consistent base level to produce the 
final image or contour diagram by an analysis technique known as basket weaving 
(e.g. Ref. 9). (In a contour diagram, the contour lines joining positions of equal 
brightness are known as “isophotes”.) Variants of this scanning procedure suitable 
for providing the data needed to minimize systematic errors have also been used for 
large-area sky surveys made with single-dish radio telescopes; e.g. “NODding” 1° 14 
and “WAGging”!? scans. 

The question arises as to how far apart adjacent scans should be spaced in such 
a raster coverage, and how often data need to be sampled along a scan. In order not to 
lose any information available to our telescope (i.e. to faithfully record all spatial 
frequencies below W cycles/rad), the low-pass filtering property of our telescope 
decrees that our “sampling interval” be chosen to be no larger than $\Delta\theta$, where 

$$\Delta\theta = \frac{1}{2W} = \frac{\lambda}{2D}\ {\rm rad}. \qquad (20)$$


Of course, sampling more closely than this critical interval is fine. However, if 
our maps are “undersampled”, we should be very cautious as to how we interpret 
the resulting images. 
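
As a worked example of the critical sampling interval, the Python sketch below applies the Nyquist criterion to a telescope that passes spatial frequencies up to W = D/lambda cycles/rad, giving an interval of 1/(2W) = lambda/(2D) radians. The 100-m dish and 21-cm wavelength are assumed values, and the function name is hypothetical.

```python
import math

def nyquist_interval_arcmin(diameter, wavelength):
    """Largest safe sampling interval, lambda / (2 D), in arcminutes."""
    return math.degrees(wavelength / (2.0 * diameter)) * 60.0

# Illustrative values (assumptions): 100-m dish at 21 cm.
print(round(nyquist_interval_arcmin(100.0, 0.21), 2))   # about 3.6 arcmin
```

For a beam of width roughly 1.2 lambda/D this corresponds to a little over two samples per half-power beam width, both along a scan and between adjacent scans.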

7. Measuring Polarization 


The polarization properties of radiation received by a telescope can be specified 
via the Stokes parameters (I,Q,U,V), which are easily derived from direct mea- 
surements. These characterize the full polarization properties of the radiation via 
four scalar distributions that can be individually manipulated for such operations 
as addition, convolution, interpolation, etc. To define the Stokes parameters we can 
consider either (i) the transverse electric fields in orthogonal directions $E_x$ (horizontal 
component) and $E_y$ (vertical component), and the equivalent right- and left-hand 
circularly polarized fields $E_R$ and $E_L$ or (ii) $I_\theta$, the intensity of the radiation 
at an orientation $\theta$, measured from north through east, with the intensities of the 
equivalent right- and left-hand circularly polarized waves being $I_R$ and $I_L$. Then 

$$I = \langle E_x^* E_x + E_y^* E_y \rangle = \langle E_R^* E_R + E_L E_L^* \rangle = I_0 + I_{90} = I_{45} + I_{135} = I_R + I_L, \qquad (21)$$

$$Q = \langle E_y E_y^* - E_x E_x^* \rangle = \langle E_L^* E_R + E_L E_R^* \rangle = I_0 - I_{90}, \qquad (22)$$

$$U = \langle E_y^* E_x + E_y E_x^* \rangle = \langle E_L^* E_R - E_L E_R^* \rangle = I_{45} - I_{135}, \qquad (23)$$

$$V = \langle E_y^* E_x - E_y E_x^* \rangle = \langle E_R^* E_R - E_L E_L^* \rangle = I_R - I_L. \qquad (24)$$


It is seen that the Stokes parameters can be measured either by correlation of the 
received voltages or by addition/subtraction of measured intensities. 

The Stokes parameter $I$ represents the total intensity of the radiation. The 
linearly polarized intensity $I_p$ can be specified through $I_{\rm max}$ and $I_{\rm min}$, the maximum 
and minimum intensities measured when a dipole feed orthogonal to the wave front 
is rotated through a half-turn, by $I_p = I_{\rm max} - I_{\rm min} = (Q^2 + U^2)^{1/2}$. Thus, the degree 
of linear polarization is given by $m_L = I_p/I = (I_{\rm max} - I_{\rm min})/(I_{\rm max} + I_{\rm min}) = 
(Q^2 + U^2)^{1/2}/I$. The polarization position angle is the orientation of the major axis 
of the polarization ellipse, $\chi = 0.5\tan^{-1}(U/Q)$, defined from the north through the 
east. Similarly, the degree of circular polarization is $m_C = (I_R - I_L)/(I_R + I_L) = 
V/I$. For circular polarization, a right-handed component ($V$ positive) is defined 
as having the position angle of the electric vector of an incoming wave increasing 
with time as measured at a fixed point. The total degree of polarization is given by 
$m = (Q^2 + U^2 + V^2)^{1/2}/I$. 
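
The derived polarization quantities follow directly from the four Stokes parameters. The Python sketch below is one possible implementation; the function name and the sample Stokes values are hypothetical, and `atan2` is used in place of a plain arctangent of U/Q so that the position angle lands in the correct quadrant even when Q is zero or negative.

```python
import math

def polarization_summary(i, q, u, v):
    """Return (m_L, chi_deg, m_C, m) from Stokes parameters I, Q, U, V."""
    m_lin = math.hypot(q, u) / i
    # atan2 keeps the correct quadrant; chi lies in the range (-90, +90] degrees.
    chi_deg = 0.5 * math.degrees(math.atan2(u, q))
    m_circ = v / i
    m_total = math.sqrt(q * q + u * u + v * v) / i
    return m_lin, chi_deg, m_circ, m_total

# Hypothetical measurement, in arbitrary intensity units.
print(polarization_summary(10.0, 0.3, 0.4, 0.0))
```

For this example the source is 5 per cent linearly polarized at a position angle of about 26.6 degrees, with no circular polarization.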


8. Further Reading 


Space limits this chapter to discussing only the broader considerations of single-dish 
radio telescopes. To “go deeper” on specific aspects of the subject, a number of other 
references can be explored. An old, but still valuable, text is that of Ref. 13, while 
a scholarly presentation on single dishes may be found in Ref. 8. Textbooks with 
large sections dealing with many topics of “single-dish” astronomy include Refs. 6 
and 14-16. 

Appendix. Single-Dish Radio Telescopes of the World 


A.1. Introduction 


This appendix contains a compendium of information for many of the single-dish 
telescopes of the world. It is hoped that it will help prospective single-dish users to 
select the instrument that best fits their needs in terms of geographical location, 
frequency coverage, angular resolution, sensitivity and backend availability. A web 
reference is given for each telescope so that, having selected possible instruments of 
interest, an astronomer can find out about them in detail, and also how to apply 
for observing time. We stress the importance of visiting the appropriate web pages, 
as the lists here doubtless omit important details, while policies, procedures and 
instrumental availability change with time at every observatory, and many of the 
details in these tables may soon be out-dated. 

The author does not claim this compendium to be all-inclusive, but believes 
that most of the world’s high-profile single dishes are included. The omission of any 
telescope should be viewed only as the author’s oversight, and not an adjudication as 
to the importance of that observing facility. Another caveat is that telescopes used 
predominantly as elements of VLBI or synthesis arrays have generally not been 
included unless they are also regularly available for single-dish radio astronomy 
per se. A few telescopes that are presently (May 2016) under construction have 
been included in the tables, and are noted as such in the appropriate places. 


A.2. The Tables 


The information contained in Tables A.1-A.4 summarizes the details of individual 
single-dish telescopes. Much of it has been drawn from the web pages of the host 
observatories. As such, for any interested reader there is no substitute for a full 
perusal of the web pages themselves. 

Table A.1 contains information on the geographical location of a telescope. The 
columns are: (1) a “Tag” Name that will identify a telescope in the four tables, 
which was chosen to be as descriptive of the instrument as possible, often being 
the official name of the telescope (e.g. GBT, LMT/GTM, RATAN-600); (2) the 
telescope location; (3) its longitude, with eastern longitudes positive; and (4) its 
latitude. 

Table A.2 lists a few fundamental parameters for each telescope. The columns 
are: (1) the “Tag” Name. (2) The physical size of the telescope in meters. A single 
diameter is given for circular dishes, with two dimensions listed for elliptical or 
rectangular telescopes. The resolution (HPBW) and sensitivity of an instrument 
can be roughly estimated from its size, $D$, via HPBW $\sim 1.2\lambda/D$ radians, and gain 
$\sim 2.8 \times 10^{-4}\,\eta_A\,D({\rm meters})^2$ K/Jy, where $\eta_A$ is the aperture efficiency. (3) The frequency 
range over which receivers are available. Note that this does not imply continuous 
coverage. To find typical system temperatures, the appropriate web pages should be 
consulted. (4) The approximate sky coverage of the instrument. In many cases this 
was estimated from the geographical latitude of the observatory and the telescope 
elevation drive limits. For sites that do not list the drive limits on their web pages, 
coverage of 0° < elevation < 90° has been assumed, though this may well be in error. 
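
The two rough estimates quoted for the resolution and the gain are easy to turn into a small calculator. The Python sketch below assumes a circular dish and the standard relation gain = A_eff / (2k), with k the Boltzmann constant, which is not spelled out in the text; the function names and the example dish are illustrative.

```python
import math

BOLTZMANN = 1.380649e-23     # J/K

def hpbw_arcmin(diameter_m, wavelength_m):
    """HPBW ~ 1.2 lambda / D, converted from radians to arcminutes."""
    return math.degrees(1.2 * wavelength_m / diameter_m) * 60.0

def gain_k_per_jy(diameter_m, aperture_efficiency):
    """Gain = eta_A * (pi D^2 / 4) / (2 k) in K/Jy (1 Jy = 1e-26 W m^-2 Hz^-1)."""
    area = math.pi * diameter_m ** 2 / 4.0
    return aperture_efficiency * area * 1e-26 / (2.0 * BOLTZMANN)

# The coefficient multiplying eta_A * D^2 reproduces the 2.8e-4 quoted in the text.
print(gain_k_per_jy(1.0, 1.0))
# Illustrative case (assumed values): 100-m dish, eta_A = 0.7, 21 cm.
print(hpbw_arcmin(100.0, 0.21), gain_k_per_jy(100.0, 0.7))
```

For non-circular telescopes the two listed dimensions give different beam widths along each axis, so the single-diameter formula above is only a first guide.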

Table A.3 lists a few operational details for each telescope. The columns are: (1) 
the “Tag” Name. (2) The operating organization. (3) If the web page of relevance 
clearly invites proposals for observing time from “outside users”, then “Yes” is 
entered in this column. Interested users should consult the web pages themselves as 
there may be additional requirements for some facilities, such as the finding of an 
“inside collaborator”. However, most institutions will entertain exciting proposals to 
use their facilities productively, and the lack of an entry indicating a regular proposal 
system should not be taken to mean that the appropriate management cannot be 
approached concerning use of their telescope, should it be your instrument of choice. 
Contact information is usually provided on the relevant web page (see Table A.4). 
(4) A comment tabulating the observing disciplines supported, often as gleaned from 
the web pages (so the entries may not be totally accurate). The abbreviations are as 
follows: C = Continuum observing, SL = Spectral Line observing (i.e. a spectrometer 
is available), P = Pulsar research has been carried out with the instrument, PR = 
Planetary Radar work is done (see the “Arecibo” and “DSS14” entries), and IPS = 
Interplanetary Scintillation measurements. Other comments, including completion 
dates for telescopes under construction or upgrade, should be self-explanatory. 

Table A.4, perhaps the most important for the prospective user, gives the URL 
for a web page to which an interested party can proceed to learn more about a given 
telescope, its equipment, operations and observatory policies. Its columns are: (1) 
the “Tag” Name; (2) the appropriate URL. 


Table A.1. Telescope location. 

Tag Name         Location                              Long (° ′)  Lat (° ′)
APEX             Chajnantor Plateau, Chile             -67 45.6    -23 00.3
Arecibo          Arecibo, Puerto Rico                  -66 45.2    +18 20.6
ARO 12-m         Kitt Peak, AZ, USA                    -111 36.9   +31 57.2
ARO SMT          Mt. Graham, AZ, USA                   -109 53.5   +32 42.1
Bonn 100-m       Effelsberg, Germany                   +6 53.0     +50 31.5
Caltech 40-m     Owens Valley, CA, USA                 -118 16.9   +37 13.9
C-BASS N         Owens Valley, CA, USA                 -118 16.9   +37 13.9
C-BASS S         Carnarvon, South Africa               +22 08      -30 42
Ceduna 30-m      Ceduna, S. Aus., Australia            +133 48.6   -31 52.1
CHIME            Penticton, BC, Canada                 -119 37.5   +49 19.3
DRAO             Penticton, BC, Canada                 -119 37.2   +49 19.2
DSS43            Tidbinbilla, Australia                +148 58.8   -35 24.1
DSS14            Goldstone, CA, USA                    -116 53.4   +35 25.6
DSS63            Robledo, Spain                        -4 14.9     +40 25.8
FAST             Dawodang, Guizhou, China              +107 21     +25 48
GBT              Green Bank, WV, USA                   -79 50.4    +38 26.0
HartRAO 26-m     near Krugersdorp, S. Africa           +27 41.1    -25 53.2
Haystack 37-m    Westford, MA, USA                     -71 29.3    +42 37.4
Itapetinga       Itapetinga, Atibaia, SP, Brazil       -46 33.5    -23 11.1
IRAM 30-m        Pico Veleta, Spain                    -3 23.9     +37 04.1
JCMT             Mauna Kea, HI, USA                    -155 28.6   +19 49.4
Jodrell Mk-II    Jodrell Bank, UK                      -2 18.2     +53 14.1
Kalyazin 64-m    Kalyazin, Tver Oblast, Russia         +37 54.0    +57 13.4
KOSMA            Yangbajing, Tibet                     +90 33      +30 05
Kunming 40-m     Phoenix Mtn., Kunming, Yunnan, China  +102 47.7   +25 01.5
KVNT             Tanma, S. Korea                       +126 27.6   +33 17.3
KVNU             Ulsan, S. Korea                       +129 15.0   +35 32.7
KVNY             Yonsei, S. Korea                      +126 56.5   +37 33.9
LMT/GTM          Sierra Negra, Puebla, Mexico          -97 18.9    +18 59.1
Lovell           Jodrell Bank, UK                      -2 18.4     +53 14.2
Medicina 32-m    Medicina, Italy                       +11 38.8    +44 31.2
Metsahovi        Metsahovi, Kylmala, Finland           +24 23.6    +60 13.1
Miyun 50-m       Bulaotun, Miyun, China                +116 58.6   +40 33.5
Mopra 22-m       near Coonabarabran, Australia         +149 06.0   -31 16.0
Nançay           Nançay, France                        +2 11.8     +47 22.8
NANTEN2          Pampa la Bola, Chile                  -67 42.1    -22 59.8
Nobeyama 45-m    Nobeyama, Japan                       +138 28.5   +35 56.5
Noto 32-m        Noto, Sicily, Italy                   +14 59.3    +36 52.6
Onsala 20-m      Onsala, Sweden                        +11 55.6    +57 23.7
Onsala 25-m      Onsala, Sweden                        +11 55.0    +57 23.6
Ooty             Ootacamund, India                     +76 40.0    +11 22.9
Parkes 64-m      Parkes, Australia                     +148 15.7   -33 00.0
Pisgah 26-m’s    near Rosman, NC, USA                  -82 52.3    +35 12.0
                                                       -82 52.5    +35 11.9
Pisgah 12.2-m    near Rosman, NC, USA                  -82 52.6    +35 11.7
Purple Mtn.      Delingha, Qinghai, China              +97 44.0    +37 22.0
Puschino RT-22   Puschino, Russia                      +37 55.0    +56 00.0
QTT 110-m        Banjiegou, Qitai, Xinjiang, China     +89 40.9    +43 36.4
RATAN-600        Zelenchukskaya, Russia                +41 35.5    +43 49.9
Sheshan 25-m     Sheshan, near Shanghai, China         +121 12.0   +31 05.9
Simeiz RT-22     Simeiz, Russia/Ukraine                +34 01.0    +44 32.1
SPT              South Pole                            00 00       -90 00
SRT              Pranu Sanguni, Sardinia, Italy        +9 14.7     +39 29.6
Suffa 70-m       Suffa, Uzbekistan                     +65 26.0    +39 37.0
Taeduk 14-m      Taeduk, Taejon City, S. Korea         +127 22.3   +36 23.9
Tasmania 26-m    Mt. Pleasant, Tasmania, Australia     +147 26.3   -42 48.3
Tasmania 14-m    Mt. Pleasant, Tasmania, Australia     +147 26.3   -42 48.3
TianMa 65-m      Sheshan, near Shanghai, China         6.1 km from “Sheshan 25-m”
Torun 32-m       Piwnice, Poland                       +18 33.8    +53 05.7

(Continued) 

Table A.1. (Continued) 

Tag Name         Location                              Long (° ′)  Lat (° ′)
Urumqi 25-m      Nanshan, Urumqi, Xinjiang, China      +87 10.7    +43 28.3
Ventspils 32-m   Irbene, Ventspils, Latvia             +21 51.3    +57 33.2
Warkworth2       Warkworth, NZ                         +174 39.8   -36 15.0
Yebes 13.7-m     Yebes, Guadalajara, Spain             -3 05.4     +40 31.4
Yebes 40-m       Yebes, Guadalajara, Spain             -3 06.0     +40 31.5
Yevpatoria 70-m  Yevpatoria, Ukraine                   +33 11.0    +45 11.0


Table A.2. Telescope parameters. 

Tag Name         Size            Present Frequency Range       Sky Coverage
APEX             12-m            159-1500 GHz                  -90 < Dec < +52
Arecibo          305-m           47 MHz-10 GHz                 -01 < Dec < +38
                                                               (ZA < 19.7)
ARO 12-m         12-m            85-116 GHz                    -55 < Dec < +90
ARO SMT          10-m            230-720 GHz                   -55 < Dec < +90
Bonn 100-m       100-m           300 MHz-95 GHz                -31 < Dec < +90
Caltech 40-m     40-m            15 GHz                        -48 < Dec < +90
C-BASS N         6.1-m           4.5-5.5 GHz                   -50 < Dec < +90
C-BASS S         7.6-m           4.5-5.5 GHz                   -90 < Dec < +59
Ceduna 30-m      30-m            1.2-23 GHz                    -90 < Dec < +48
CHIME            80-m x 100-m    400-800 MHz                   -40 < Dec < +90
DRAO             25.6-m          408 MHz-6.6 GHz               -34 < Dec < +90
DSS43            70-m            1.6-26 GHz                    -90 < Dec < +49
DSS14            70-m            1.6-26 GHz                    -49 < Dec < +90
DSS63            70-m            1.6-26 GHz                    -45 < Dec < +90
FAST             500-m           70 MHz-3 GHz                  -14 < Dec < +66
                 (300-m illuminated)
GBT              100-m x 110-m   290 MHz-100 GHz               -46 < Dec < +90
                                 (80-100 GHz bolo. array)
HartRAO 26-m     26-m            1.6-24.0 GHz                  -90 < Dec < +45
Haystack 37-m    37-m            22-115 GHz                    -42 < Dec < +90
Itapetinga       13.7-m          18-50 GHz                     -90 < Dec < +62
IRAM 30-m        30-m            83-375 GHz                    -44 < Dec < +90
JCMT             15-m            211-700 GHz                   -50 < Dec < +90
Jodrell Mk-II    37-m x 25-m     150 MHz-24 GHz                -36 < Dec < +90
Kalyazin 64-m    64-m            600 MHz-9 GHz                 -31 < Dec < +90
KOSMA            3-m             210-350 GHz                   -60 < Dec < +90
Kunming 40-m     40-m            S/X-bands                     -65 < Dec < +90

(Continued) 


Table A.2. (Continued) 

Tag Name         Size            Present Frequency Range       Sky Coverage
KVNT             21-m            22, 43, 86 and 129 GHz        -56 < Dec < +90
KVNU             21-m            6.7, 22, 43, 86 and 129 GHz   -54 < Dec < +90
KVNY             21-m            22, 43, 86 and 129 GHz        -52 < Dec < +90
LMT/GTM          50-m            73-350 GHz                    -90 < Dec < +70
Lovell           76.2-m          150 MHz-5 GHz                 -34 < Dec < +90
Medicina 32-m    32-m            1.35-26.5 GHz                 -45 < Dec < +90
Metsahovi        14-m            2-175 GHz                     -29 < Dec < +90
Miyun 50-m       50-m            300 MHz-12 GHz                -43 < Dec < +90
Mopra 22-m       22-m            16-117 GHz                    -90 < Dec < +47
Nançay           200-m x 35-m    1.1-3.5 GHz                   -39 < Dec < +90
NANTEN2          4-m             110-880 GHz                   -90 < Dec < +68
Nobeyama 45-m    45-m            20-116 GHz                    -42 < Dec < +90
Noto 32-m        32-m            0.6-43 GHz                    -49 < Dec < +90
Onsala 20-m      20.1-m          2.2-116 GHz                   -30 < Dec < +90
Onsala 25-m      25.6-m          0.8-6.7 GHz                   -30 < Dec < +90
Ooty             530-m x 30-m    326.5 ± 7.5 MHz               -60 < Dec < +60
                                                               (-4 < HA < +5.5)
Parkes 64-m      64-m            700 MHz-26 GHz                -90 < Dec < +27
Pisgah 26-m’s    Two x 26-m      327 MHz-8.4 GHz               -55 < Dec < +90
Pisgah 12.2-m    12.2-m          3.3-12.75 GHz                 -55 < Dec < +90
Purple Mtn.      13.7-m          22-115 GHz                    -43 < Dec < +90
Puschino RT-22   22-m            To mm-wavelengths             -29 < Dec < +90
QTT 110-m        110-m           150 MHz-115 GHz               -40 < Dec < +90
RATAN-600        576-m circle    610 MHz-30 GHz                -42 < Dec < +90
Sheshan 25-m     25-m            1-23 GHz                      -54 < Dec < +90
Simeiz RT-22     22-m            327 MHz-36 GHz                -42 < Dec < +90
SPT              10-m            95-345 GHz                    -90 < Dec < 0
SRT              64-m            305 MHz-116 GHz               -50 < Dec < +90
Suffa 70-m       70-m            5-300 GHz                     -50 < Dec < +90
Taeduk 14-m      13.7-m          40-150 GHz                    -54 < Dec < +90
Tasmania 26-m    26-m            0.66-22 GHz                   -90 < Dec < +31
Tasmania 14-m    14-m            630-1400 MHz                  ?
TianMa 65-m      65-m            1-50 GHz                      -54 < Dec < +90
Torun 32-m       32-m            1.4-30 GHz                    -35 < Dec < +90
Urumqi 25-m      25-m            327 MHz-23 GHz                -42 < Dec < +90
Ventspils 32-m   32-m            327 MHz-12.2 GHz              -33 < Dec < +90
Warkworth2       30.5-m          C-band                        -90 < Dec < +48
Yebes 13.7-m     13.7-m          2.2-49 GHz                    -46 < Dec < +90
Yebes 40-m       40-m            2.2-49 GHz                    -46 < Dec < +90
Yevpatoria 70-m  70-m            5-300 GHz                     -45 < Dec < +90

Tag Name 


APEX 
Arecibo 


ARO 12-m 
ARO SMT 
Bonn 100-m 


Caltech 40-m 
C-BASS N 
C-BASS S 
Ceduna 30-m 
CHIME 


DRAO 
DSS43 
DSS14 
DSS63 
FAST 


GBT 


HartRAO 26-m 
Haystack 37-m 
Itapetinga 
IRAM 30-m 


JCMT 
Jodrell Mk-II 


Kalyazin 64-m 
KOSMA 
Kunming 40-m 


KVNT 
KVNU 
KVNY 
LMT/GTM 
Lovell 


Medicina 32-m 
Metsahovi 
Miyun 50-m 
Mopra 22-m 
Nancay 




Table A.3. 
Operating Inst. 


MPIfR/OSO/ESO 
SRI Int./USRA/UMET 


ARO 
UASO 
MPIfR 


Caltech 

C-BASS Collab. 
C-BASS Collab. 
UTasmania 

UBC, MGU, UT & 
DRAO 


DRAO 
NASA 
NASA 
NASA 
NAOC, China 


GBO 


NRF 
NEROC 
INPE 
IRAM 


East Asian Obs. 


U. Manchester 


ASC, Lebedev Phys. Inst. 


Sino-German SMTG 
Yunnan Ast. Obs. 
(YNAO), CAS 


KASI 

KASI 

KASI 
INAOE/UMass 
U. Manchester 


IRA/INAF 
Aalto Uni. 
NAOC 

ATNF 

Paris Obs/CNRS 


Telescope operations. 


Publ Access 


Yes 
Yes 


Domestic 
Domestic 
Domestic 


Yes 
Yes 
Yes 
? 
? 
Yes 


only 
only 
only 


Comments 


C (bolometer), SL 

C, SL, P, PR; L-Band 
multibeam Rx; 
Fixed reflector 

C, SL 

C, SL 

C, SL, P 


C (Blazar monitoring) 

C (full-Stokes) 

C (full-Stokes) 

C, SL 

C, SL, P; Undergoing 
commissioning 


C, SL 

C, SL, P 

C, SL, P, PR 

C, SL, P 

C, SL, P; Undergoing 
commissioning 


C, SL, P; Unblocked 
aperture 

C, SL, P 

C, SL; Education projects 

C, SL 

C (incl. bolometers), SL 


C (bolo. 350/670 GHz), SL 
(211-375 GHz) 

Mostly dedicated to 
MERLIN 

C, P 

C, SL 

C, P; eventually to cover 
1-23 GHz 


C, SL 

C, SL 

C, SL 

C (bolometer array), SL 
C, SL, P 


C, SL, P 
C, SL 
P, IPS 
C, SL 
C, SL, P 


(Continued) 


Tag Name 


NANTEN2 
Nobeyama 45-m 
Noto 32-m 
Onsala 20-m 
Onsala 25-m 
Ooty 


Parkes 64-m 
Pisgah 26-m’s 


Pisgah 12.2 m 
Purple Mtn. 
Puschino RT-22 
QTT 110-m 


RATAN-600 


Sheshan 25-m 
Simeiz RT-22 
SPT 

SRT 

Suffa 70-m 


Taeduk 14-m 
Tasmania 26-m 
Tasmania 14-m 


TianMa 65-m 
Torun 32-m 


Urumqi 25-m 
Ventspils 32-m 
Warkworth2 
Yebes 13.7-m 
Yebes 40-m 


Yevpatoria 70-m 




Table A.3. 
Operating Inst. 


Intl. Collab. 
NRO/NAOJ 
IRA/INAF 
Chalmers UT 
Chalmers UT 
TIFR/NCRA 


ATNF/CSIRO 
PARI 


PARI 

PMO/CAS 
Lebedev/ASC 

Xinjiang Ast. Obs., CAS 


SAO 


Shanghai Ast. Obs. 
CrAO 

SPT Collab. 
IRA/INAF 


ASC, Lebedev Phys. Inst. 


TRAO/KASI 
UTasmania 
UTasmania 


Shanghai Ast. Obs. 
N. Copernicus U 


XAO/CAS 
VIRAC/VUC 
IRASR, AUT, NZ 
IGN/CAY 
IGN/CAY 


? 


(Continued) 


Publ Access 




Comments 

SL 

C, SL 

C, SL 

C, SL 

C, SL 

C, SL, P, IPS; Unblocked 

aperture 


C, SL, P; L-Band multibeam 
C, SL, P 


C, SL 

C, SL 

C, SL 

C, SL, P; Construction 
planned 

C, SL 


? 

C, SL 

C 

C, SL, P 

C, SL, P; Under 
construction(?) 


C, SL 

C, SL 

P; Vela PSR monitoring 

Undergoing structural 
repairs: 2014 

C, SL, P; (Active surface) 

C, SL, P 


SL 
C, SL 


C, SL, PR 


Note: In “Comments”, the available types of observing are listed, where C = Continuum, SL = 
Spectral Line, P = Pulsar, PR = Planetary Radar, IPS = Interplanetary Scintillation. 

Table A.4. Web Page. 

Tag Name                   WWW Page
APEX                       http://www.apex-telescope.org/
Arecibo                    http://www.naic.edu
ARO 12-m                   https://www.as.arizona.edu/arizona-radio-observatory
ARO SMT                    https://www.as.arizona.edu/arizona-radio-observatory
Bonn 100-m                 http://www.mpifr-bonn.mpg.de/effelsberg/astronomers
Caltech 40-m               http://www.astro.caltech.edu/ovroblazars/
C-BASS N                   https://cbass.web.ox.ac.uk/
C-BASS S                   http://www.hartrao.ac.za/c-bass/
Ceduna 30-m                http://www.phys.utas.edu.au/physics/Ceduna.html
CHIME                      http://chime.phas.ubc.ca/
DRAO                       https://nrc.canada.ca/en/research-development/nrc-facilities/dominion-radio-astrophysical-observatory-research-facility
DSS43                      http://www.cdscc.nasa.gov/Pages/Antennas/dss43.html
DSS14                      http://gssr.jpl.nasa.gov/
DSS63                      https://www.mdscc.nasa.gov/index.php/dss-63/
FAST                       http://fast.bao.ac.cn/en/
GBT                        http://greenbankobservatory.org
HartRAO 26-m               www.hartrao.ac.za
Haystack 37-m              http://www.haystack.mit.edu/obs/haystack/index.html
Itapetinga                 http://www.cea.inpe.br/roi/
IRAM 30-m                  http://www.iram-institute.org/
JCMT                       http://www.eaobservatory.org/jcmt/
Jodrell Mk-II              http://www.jb.man.ac.uk/history/mk2.html
Kalyazin 64-m              http://www.asc.rssi.ru/Kalyazin/
KOSMA (renamed CCOSMA)     http://english.nao.cas.cn/Research2015/rp2015/201701/t20170120_173615.html
Kunming 40-m               http://www1.ynao.ac.cn/en/
KVNT (renamed KVNTN)       http://radio.kasi.re.kr/kvn/main_kvn.php
KVNU (renamed KVNUS)       http://radio.kasi.re.kr/kvn/main_kvn.php
KVNY (renamed KVNYS)       http://radio.kasi.re.kr/kvn/main_kvn.php
LMT/GTM                    http://www.lmtgtm.org/
Lovell                     http://www.jb.man.ac.uk/aboutus/lovell/
Medicina 32-m              http://www.med.ira.inaf.it/parabola32m.html
Metsahovi                  http://metsahovi.aalto.fi/en/about/
Miyun 50-m                 http://english.nao.cas.cn/
Mopra 22-m                 http://www.narrabri.atnf.csiro.au/mopra/
Nançay                     http://www.obs-nancay.fr/-Radiotelescope-.html
NANTEN2                    Official website is broken. The best I can find is the wiki page
                           https://en.wikipedia.org/wiki/NANTEN2_Observatory
Nobeyama 45-m              www.nro.nao.ac.jp/~nro45mrt/index-e.html
Noto 32-m                  http://www.noto.ira.inaf.it/
Onsala 20-m                www.chalmers.se/en/centres/oso/radio-astronomy/20m/Pages/default.aspx
Onsala 25-m                www.chalmers.se/en/centres/oso/radio-astronomy/25m/Pages/default.aspx
Ooty                       http://rac.ncra.tifr.res.in/ort.html
Parkes 64-m                www.parkes.atnf.csiro.au
Pisgah 26-m’s              www.pari.edu
Pisgah 12.2-m              www.pari.edu
Purple Mtn.                http://english.pmo.cas.cn/
Puschino RT-22             http://www.prao.ru/English/radiotelescopes/telescopes.html
QTT 110-m                  http://qtt.xao.cas.cn/english/ (website seems a bit unfinished, but it is
                           the official website; the only other English page I could find about it
                           is http://english.xao.cas.cn/fa/qo/)
RATAN-600                  https://www.sao.ru/hq/CG/cold/part2.htm
Sheshan 25-m               http://english.shao.cas.cn/fs/201410/t20141008_128932.html
Simeiz RT-22               Observatory doesn’t appear to be active. Only urls I could find are the
                           wiki page: https://en.wikipedia.org/wiki/Simeiz_Observatory and a
                           stub page from the European Science Foundation’s Committee on
                           Radio Astronomy Frequencies
                           https://www.craf.eu/radio-observatories-in-europe/simeiz/
SPT                        pole.uchicago.edu
SRT                        http://www.srt.inaf.it/
Suffa 70-m                 http://asc-lebedev.ru/index2.php?engdep=16&engsuffa=1
Taeduk 14-m                https://radio.kasi.re.kr/trao/main_trao.php
Tasmania 26-m              http://ra-wiki.phys.utas.edu.au/index.php?n=Main.MtPleasant26m
Tasmania 14-m              http://www.utas.edu.au/maths-physics/facilities/mt-pleasant-observatory
TianMa 65-m                http://radio.shao.cas.cn/xshd/201403/t20140318_164098.html
Torun 32-m                 http://www.astro.uni.torun.pl/index.php?page=facilities&tab=facilities&lang=en
Urumqi 25-m                http://english.xao.ac.cn/pr/25mrt/
Ventspils 32-m             https://venta.lv/en/science/ventspils-international-radio-astronomy-centre/
Warkworth2                 https://irasr.aut.ac.nz/radio-telescopes
Yebes 13.7-m, Yebes 40-m   http://www.oan.es/icts/info.shtml
Yevpatoria 70-m            https://en.wikipedia.org/wiki/Yevpatoria_RT-70_radio_telescope


References 

Chapter 2 


Interferometry, Aperture Synthesis, 
and VLBI 


E. B. Fomalont and Melvyn Wright* 


Radio Astronomy Laboratory, University of California, 
Berkeley, CA 94720, USA 


*melvyn.wright@gmail.com 


This chapter reviews the theory and practice of interferometry and aperture syn- 
thesis for radio astronomy imaging and VLBI observations. Aperture synthesis 
combines the signals from arrays of radio telescopes to provide images with the 
angular resolution of the maximum separation of the antennas, with a field of 
view of the individual antennas. We review the process and mathematics for 
calibration and image formation, including the techniques of self-calibration, 
and combining array and single-dish observations to map large fields. We discuss 
pipeline processing of the data. VLBI using antennas with earth-diameter sep- 
arations can provide angular resolutions of ~20 micro-arcseconds at millimeter 
wavelengths, sufficient to image the event horizons of the black holes at the centers 
of our own Galaxy and M87. We discuss the data acquisition and calibration of 
VLBI observations. We describe some recent developments with calibration and 
imaging in close to real time. 


1. Introduction


Aperture synthesis enables us to map the sky brightness with subarcsecond 
resolution using arrays of radio antennas. Measurements of the cross correlation 
of signals between pairs of antennas sample the coherence or visibility function of 
the wavefront. If the dimensions of the radio source and the telescope array are small 
compared with the distance to the source, then the coherence of the wavefront is 
proportional to the Fourier transform of the intensity distribution of the source
(Van Cittert–Zernike theorem, Ref. 1).

A telescope array with N antennas provides N(N − 1)/2 cross correlations and
N autocorrelations for each polarization product. Earth’s rotation of the projected
geometry of the telescope array in the direction of a celestial source provides additional
samples of the source visibility function in the aperture plane.

Radio astronomy is an observational science. We make images of the radio
intensity, $I(s,\nu,p,t)$, as a function of angular position s, frequency ν, polarization
p, and time t. The image that is made, $I' = I * B + \mathrm{noise}$, is the two-dimensional
angular convolution with the instrumental response, B, plus an additive noise that
comes from stochastic fluctuations, mostly from the receivers and atmosphere.

We build antennas to collect radio photons from astronomical sources. The
amount of power, P, we collect depends on the intensity, or brightness, of the
radiation. If A(s) is the antenna collecting area in direction s on the sky, the power is

$$P = S\,[\mathrm{W\,m^{-2}\,Hz^{-1}}] \times \Delta\nu\,[\mathrm{Hz}] \times A\,[\mathrm{m^{2}}]; \tag{1}$$

e.g. for a 1 Jy source in a 1 GHz bandwidth with a 100 m diameter antenna,

$$P\,[\mathrm{W}] \sim 10^{-26}\,[\mathrm{W\,m^{-2}\,Hz^{-1}}] \times 10^{9}\,[\mathrm{Hz}] \times 10^{4}\,[\mathrm{m^{2}}] \sim 10^{-13}\ \mathrm{W} \sim 10^{-5.5}\ \mathrm{J/year}, \tag{2}$$

which would heat ~1 mg of water by 1 mK in 1 year!
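The order-of-magnitude estimate above can be checked numerically. The following is a minimal sketch, assuming a filled circular aperture with perfect efficiency and a specific heat of water of about 4.18 J g⁻¹ K⁻¹; the function and variable names are illustrative and not taken from the text.

```python
import math

JY = 1e-26  # 1 jansky in W m^-2 Hz^-1

def received_power(flux_jy, bandwidth_hz, dish_diameter_m):
    """P = S * dnu * A for a filled circular aperture (efficiency ignored)."""
    area_m2 = math.pi * (dish_diameter_m / 2.0) ** 2
    return flux_jy * JY * bandwidth_hz * area_m2

# 1 Jy source, 1 GHz bandwidth, 100 m dish, as in the example in the text.
p_watts = received_power(1.0, 1e9, 100.0)
joules_per_year = p_watts * 365.25 * 24 * 3600

# Energy needed to warm 1 mg of water by 1 mK (assumed c ~ 4.18 J/g/K).
joules_water = 1e-3 * 4.18 * 1e-3
```

With these assumptions the collected power is a little under 10⁻¹³ W, and a year of collection is the same order of magnitude as the energy needed to warm 1 mg of water by 1 mK.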

The antennas also provide angular resolution. A circular aperture with diameter
D with Gaussian weighting has a beamwidth FWHM $\sim 1.2\,\lambda/D$. Antenna
arrays allow us to separate the functions of resolution and collecting area.
Antenna arrays sample the wavefront across a distributed aperture and measure the
coherence of the wavefront across the array. Signal transmission from the antennas
preserves phase. We must keep path lengths within $\sim\lambda/20$ to make an accurate
telescope. Atmospheric fluctuations distort the wavefront. We can compensate for
these effects, although this is quite difficult. Arrays of radio telescopes enable us to
map the sky brightness using aperture synthesis techniques.* The antennas can be
moved to provide different resolutions. High resolution needs extended array configurations
with large antenna spacings. Large sources need compact arrays with
short spacings. The Spatial Dynamic Range is the range of angular scales mapped,
$\sim\lambda/D_{\min}$ to $\lambda/D_{\max}$, where $D_{\min}$ and $D_{\max}$ are the smallest and largest antenna
spacings. Source structure $>\lambda/D_{\min}$ or $<\lambda/D_{\max}$ is not mapped.
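As a rough illustration of the range of angular scales set by the shortest and longest spacings, the sketch below evaluates λ/D for both ends of a configuration. The 15 m shortest spacing is a hypothetical value chosen for the example; the 4.5 km spacing and 230 GHz frequency are the ones used for the ALMA simulation later in the chapter. Factors of order unity (such as the 1.2 in the beamwidth) are deliberately ignored.

```python
import math

C_M_PER_S = 299_792_458.0
ARCSEC_PER_RAD = math.degrees(1.0) * 3600.0

def angular_scale_arcsec(freq_hz, spacing_m):
    """Angular scale lambda/D, in arcseconds, probed by an antenna spacing D."""
    wavelength_m = C_M_PER_S / freq_hz
    return (wavelength_m / spacing_m) * ARCSEC_PER_RAD

def spatial_dynamic_range(freq_hz, d_min_m, d_max_m):
    """Return (finest scale, coarsest scale, ratio) for a configuration."""
    finest = angular_scale_arcsec(freq_hz, d_max_m)
    coarsest = angular_scale_arcsec(freq_hz, d_min_m)
    return finest, coarsest, coarsest / finest

# Hypothetical configuration: 15 m shortest, 4.5 km longest spacing, 230 GHz.
finest, coarsest, ratio = spatial_dynamic_range(230e9, 15.0, 4500.0)
```

The ratio depends only on D_max/D_min, which is why reconfiguring the array is the usual way to shift the range of scales that can be mapped.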

We need to know something about the sources of radio emission in order to 
design our telescopes and observations. Some of the first observations are usually 
surveys to determine the distribution and nature of the sources. Later observa- 
tions study the details of sources, or classes of sources. Telescopes and observing 
techniques together define a matched filter to a set of possible observations to determine
the characteristics of $I(s,\nu,p,t)$. Some observations are a good match to the
instrument. Others are more difficult: “challenging” or “impossible”. We design new 
instruments to match new science goals. 


2. Aperture Synthesis Fundamentals 


A summary of the processes that convert the wavefront coherence to astronomical
images follows (see Fig. 1). The source has an intensity distribution in the sky, $I(\sigma)$,
and we will assume that the array antennas are identical and that their sky response
is given by $A(\sigma - \sigma')$, where $\sigma'$ is the pointing direction and the area of sky with
significant sensitivity is small. Thus, the apparent source emission is the product
$I \times A$.

Fig. 1. Data flow for radio telescope arrays. Analog signals from each antenna receiver and polarization
are converted to baseband (X) and digitized. At high frequencies, amplifiers may not be
available, and the signals from the antenna receivers are first down-converted before amplification.
Several stages of down conversion and amplification may be used before the signals are digitized.
At low frequencies, the signals from the array receivers may be directly sampled and digitized.
The sampler clock provides the coherent signal for the down conversion. The sampled bandwidth
is analyzed into frequency channels using polyphase filterbanks (F). The digital data are routed
through multicast switches into correlators, and into beam formers to form phased array beams
at multiple points in the sky. The data from the correlators are calibrated by comparing the
measured cross correlations with observed astronomical calibrator sources to derive instrumental
gains as functions of time, frequency, and polarization. Images of the target sources can be
used to improve the calibrations (self-calibration). The calibrations are used in the Beam Former
to produce phased array beams, which can be used in spectrometers, pulsar analysis engines,
transient source observations, VLBI instrumentation, etc. Analysis of the images can be used to
optimize the observing parameters; for example by adjusting the frequency coverage and resolution
of detected spectral lines, or by modifying the pointing coverage of mosaic observations.
Real-time analysis of the images can also be used to detect transient sources and targets for the
beam-formers.

The fundamental array responses are the signals, $S_i(t)$, which are the voltages
detected from each array antenna, i. The correlations of the voltages between
antenna pairs (i, j) produce what is called the visibility function, which is closely
related to the wavefront coherence. The amplitude calibration of the visibility
amplitudes converts the voltage/power units to power, P, of the radio source.
The calibration of the visibility phase produces a virtual focus of the signals from
all antennas. A typical experiment generally lasts for many hours in order to detect
faint signals from radio sources, and to increase the synthetic aperture coverage.
Finally, the conversion of the calibrated visibility to the image, $I'$, is obtained by a
Fourier transform.


2.1. Correlation of Antenna Signals 


Digital cross correlators compute cross-power frequency spectra from the signals of each antenna,
$S_i(t,\nu,p)$, for all pairs of antennas in the telescope array to determine the complex
visibility function $V_{i,j}(t,\nu,p)$. Since the signals from celestial radio sources are typically
much weaker than the uncorrelated noise power from sky and radio receivers,
the measured cross correlations are time-averaged to enhance the signal-to-noise
ratio.

The major time variation of the visibility function for an antenna pair is produced
by the diurnal motion of the celestial source. Given the antenna pair separation
$b = (b_j - b_i)$ and a fiducial point near the celestial source called the phase
center, $\sigma_0$, the term

$$\exp\left[\frac{2\pi\nu i}{c}\,b\cdot\sigma_0(t)\right] \tag{3}$$

is called the correlator model for the observation.

The digital correlators are peta-op, special-purpose computers. The expanded 
very large array (EVLA) correlator* cross correlates all pairs of antennas with up 
to 16 GHz of bandwidth with a minimum of 16,384 spectral channels in 64 full 
polarization, independent spectral windows. The Atacama large millimeter array 
(ALMA) correlator? processes 16 GHz of bandwidth for the 2016 pairs of antennas 
and 4 polarization products. The basic operation is a complex-multiply and add 
operation. The complex multiply is typically 4 × 4-bit with accumulation into 32
bits at rates ~10¹⁵ s⁻¹. Large digital correlators built using custom ASICs take
5–10 years to develop (e.g. ALMA;° EVLA®).

Time-averaged correlation data are written to a data archive for off-line data
processing. The data rate from the EVLA correlator can be up to 350 GB s⁻¹. Only
a few percent of this data rate can be handled by the off-line data processing.
The current plan for ALMA is an average data rate of ~6 MB s⁻¹ and a peak rate of
60 MB s⁻¹.” Even so, users will be faced with the prospect of dealing with several
terabytes of data for EVLA and ALMA observations (EVLA;® ALMA‘).


2.2. Calibrating and Editing the Visibility Function 


The observed complex visibility function, $V^{o}_{i,j}(t,\nu,p)$, produced by the correlator
contains imperfections contributed by the multitude of media through
which the radio signal or converted voltage signals propagate. Some are: propagation
through the ionosphere and troposphere above each antenna; the rotation
and leakage of the two antenna polarizations with respect to the radio source; the
angular voltage pattern of each antenna; the temporal gain, path length, and phase
change in the electronics; the relative frequency-dependent gain and phase (bandpass);
and the path-length errors associated with incorrect array parameters. These
imperfections are antenna-based, so that the true visibility function, $V^{T}$, is

$$V^{T}_{i,j}(t,\nu,p) = G_i(t,\nu,p)\,G_j(t,\nu,p)\,V^{o}_{i,j}(t,\nu,p). \tag{4}$$

Baseline-based corrections $g_{i,j}(t,\nu,p)$ are generally small and are often caused by
nonlinearities in the correlator processing or second-order effects from large antenna-based
gain effects. Details of these correction terms are given in Refs. 3 and 9.
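Because the corrections in Eq. (4) are antenna-based, applying them amounts to one complex multiply per visibility sample. The sketch below follows Eq. (4) as printed (a plain product of the two antenna gains, with no baseline-based term); the dictionary layout and the names `apply_antenna_gains`, `gains` and `observed` are illustrative assumptions, not part of any package named in the text.

```python
import cmath

def apply_antenna_gains(observed, gains):
    """Return corrected visibilities V_T[i, j] = G[i] * G[j] * V_o[i, j]."""
    return {(i, j): gains[i] * gains[j] * vis
            for (i, j), vis in observed.items()}

# Hypothetical complex gain corrections for three antennas.
gains = {0: 1.1 + 0.0j, 1: 0.9 * cmath.exp(0.3j), 2: 1.2 * cmath.exp(-0.5j)}

# A 2 Jy point source at the phase center has the same true visibility
# on every baseline; corrupt it, then undo the corruption.
true_vis = {(0, 1): 2.0 + 0j, (0, 2): 2.0 + 0j, (1, 2): 2.0 + 0j}
observed = {pair: v / (gains[pair[0]] * gains[pair[1]])
            for pair, v in true_vis.items()}
corrected = apply_antenna_gains(observed, gains)
```

For N antennas only N complex gains need to be stored per time, frequency and polarization, even though N(N − 1)/2 baselines are corrected.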

The a priori calibration of the visibility function can be made using many 
internal electronic measuring systems. Some examples are: the measurement of the 
antenna gain and noise level by injecting a known signal in the feed of each antenna; 
determination of variable path lengths in each antenna using special narrowband 
signals; removal of correlator nonlinearities and effects of digitization. 

A posteriori calibration of the visibility function is made by selected observations
of calibrator sources that have a known intensity, position, polarization and
frequency spectrum. Many of the sources are point-like quasars, although they
are often time variable in strength and polarization. For example, a point calibrator
with known parameters $F(t,\nu,p)$ and accurately known position will have
$V^{T}_{i,j}(t,\nu,p) = F(t,\nu,p)$ for all baselines. Assuming that there are no baseline-based
corrections, the antenna gains $G_i(t,\nu,p)$ can be determined.
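To illustrate how antenna-based gains follow from a point calibrator, here is a minimal, noise-free sketch that recovers only the gain amplitudes. It uses the fact that, for a calibrator of flux F, the ratios R_ij = F/|V_obs,ij| equal |G_i||G_j|, so any triangle of baselines gives |G_i|² = R_ij R_ik / R_jk. This is not the least-squares solver used by production packages, phases are not solved for, and all names are hypothetical.

```python
import itertools
import math

def amplitude_gains(obs_amp, flux):
    """Estimate |G_i| from observed amplitudes {(i, j): |V_obs|}, with i < j."""
    antennas = sorted({a for pair in obs_amp for a in pair})

    def ratio(a, b):
        # R_ab = F / |V_obs| = |G_a| * |G_b| for a point calibrator.
        return flux / obs_amp[(min(a, b), max(a, b))]

    gains = {}
    for i in antennas:
        others = [a for a in antennas if a != i]
        # Every triangle (i, j, k) gives one estimate of |G_i|^2; average them.
        estimates = [ratio(i, j) * ratio(i, k) / ratio(j, k)
                     for j, k in itertools.combinations(others, 2)]
        gains[i] = math.sqrt(sum(estimates) / len(estimates))
    return gains

# Simulated example: four antennas, 2 Jy calibrator, known gain amplitudes.
true_gains = {0: 1.1, 1: 0.9, 2: 1.3, 3: 0.8}
flux = 2.0
obs_amp = {(i, j): flux / (true_gains[i] * true_gains[j])
           for i, j in itertools.combinations(sorted(true_gains), 2)}
solved = amplitude_gains(obs_amp, flux)
```

With noisy data the triangle estimates differ from one another, which is why a weighted least-squares fit over all baselines is preferred in practice.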

These calibrations are separated into two main types: pseudo-stable calibrations 
and temporal calibrations. An example of a pseudo-stable calibration is the bandpass
calibration. A strong quasar is observed for a sufficient amount of time so that
the signal-to-noise ratio in each frequency channel is sufficiently large. Assuming that the
spectrum of the quasar is known, the relative frequency response of each antenna can
be derived. Another example is the antenna position, or baseline, calibration. Short
observations of many quasars are made over the sky, perhaps 50 observations
in 30 minutes. Any visibility phase variation of the observations as a function of
source declination and hour angle can be fit to an offset of the antenna positions 
used in the correlator. Other specialized observations can determine the polarization 
properties and the antenna-beam properties. These properties are relatively stable 
so that observations only once a week or month, or after a major change in the 
array, are needed. 

The main temporal variations in visibility amplitude and phase are caused
by fast instrumental changes (generally removed by internal calibrations or by fixing
broken antenna systems) and, mainly, by tropospheric and ionospheric path-length
changes over each antenna. Because these variations also depend on the
direction to the source, the calibrator should be as close to the source of interest 
as possible. This mode of observing, where fast switching observations between the 
calibrator and target are made, is called phase referencing. 

The above quasar calibration assumes that one strong calibrator can be isolated
within the antenna beam area. This is generally true at frequencies greater than
20 GHz with antenna sizes larger than about 7 m. For lower-frequency arrays, or
those in which antenna elements are connected before correlation, confusing sources
may be in the sidelobes of the primary beam, or in different isoplanatic regions,
meaning that the phase correction changes significantly over the primary beam.
These problems are discussed in Section 3.3.

The data quality from an array is not perfect. Occasionally, malfunctions and 
serious instabilities in the electronics of an antenna occur. These (usually) antenna- 
based problems may be known before the observations and the offending data 
flagged before any analysis. The above calibrator observations to determine the gain 
and phase of each antenna will also determine the times when data are corrupted. 
These include, e.g. low signal level from an antenna, phase instability much larger 
than that from nearby antennas, and large variable signals in narrow frequency 
channels that are indicative of radio frequency interference (RFI).

The basic calibration computation is a complex-multiply of the measured cross- 
correlations (uv data) for each data sample and frequency channel. The calibrations 
can be stored in data structures and applied when the uv data are plotted, analyzed, 
or imaged. In Fig. 2, we plot the computation time for calibrating multi-channel uv
data versus the number of uv data samples in an off-line simulation for ALMA data
with 60 antennas in a 4.5 km configuration. Figure 2 shows that the off-line calibration
time is proportional to the number of uv data samples. We used the MIRIAD
data reduction package,!? which uses a streaming data format. The complex-valued 
uv data were represented by 4 bytes per frequency channel with a scaling factor for 
each multi-channel data sample. The 4-byte representation of the $N_{chan}$ channels allows a
1:32,000 spectral dynamic range for each multi-channel data sample. Including the
time-variable metadata that describe the data, the telescope, and the observations,
the total length was 460 bytes for a 100-channel data sample. The calibration
rate was 6 Mbytes s⁻¹, showing that the average data rate currently allowed for
ALMA could be calibrated in a single pipelined process on a standard rack server, 
and that much higher data rates could be supported in multiple threads on a modest 
sized cluster. Further gains in computing efficiency are clearly possible. Off-line data 
reduction typically uses static “measurement sets” with the uv data represented as
8- or 16-byte complex values. An astronomer using off-line data processing typically
keeps several copies of calibrated and uncalibrated uv data, with each step requiring
reading and writing the uv data. 
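The compact representation described above (4 bytes per complex channel plus one scale factor per multi-channel sample) can be sketched as two 16-bit integers per channel with a shared scale. This is an illustration of the idea only, not MIRIAD's actual on-disk layout; the little-endian format string, the float32 scale factor and the function names are assumptions made for the example.

```python
import math
import struct

def pack_spectrum(channels):
    """Pack complex channels as int16 (re, im) pairs plus one float32 scale."""
    peak = max(max(abs(c.real), abs(c.imag)) for c in channels) or 1.0
    scale = peak / 32767.0
    ints = []
    for c in channels:
        ints.append(int(round(c.real / scale)))
        ints.append(int(round(c.imag / scale)))
    return struct.pack("<f%dh" % len(ints), scale, *ints)

def unpack_spectrum(blob):
    """Inverse of pack_spectrum: return a list of complex channel values."""
    n_ints = (len(blob) - 4) // 2
    scale, *ints = struct.unpack("<f%dh" % n_ints, blob)
    return [complex(ints[k] * scale, ints[k + 1] * scale)
            for k in range(0, n_ints, 2)]

# A synthetic 100-channel spectrum whose largest component is 1.0.
channels = [complex(math.cos(0.1 * k), 0.5 * math.sin(0.3 * k))
            for k in range(100)]
blob = pack_spectrum(channels)
restored = unpack_spectrum(blob)
worst_error = max(abs(a - b) for a, b in zip(channels, restored))
```

The 16-bit integers are what limit the dynamic range within one multi-channel sample to roughly 1:32,000, while the per-sample scale factor lets strong and weak samples share the same format.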

In a real-time imaging pipeline, the calibrations are derived from, and applied 
to, the data streams from the correlators (see Fig. 1). RFI must also be subtracted 
from the data stream before it is passed to the imaging engine and beam formers. 
RFI presents a special case in several ways. RFI sources may be either station- 
ary or moving across the sky at a non-sidereal rate. A correlator can be used to 
locate and characterize RFI as a function of time, frequency and polarization. The 
signal-to-noise can be improved by pointing some of the antennas or beam formers 
at the RFI sources. Correlators allocated to measuring RFI may need to sample the 
signal at high data rates. For phased array telescopes, the station beam can form
nulls at the position of (moving) RFI sources. Accurate calibration of the array
antennas is required in real time.¹¹

[Figure 2 plot title: ALMA imaging with 60 antennas in 4.5 km configuration, Oct 2017]

Fig. 2. Multi-channel imaging using simulated MIRIAD data for ALMA with 60 antennas in a
4.5 km configuration, giving an angular resolution of 80 milliarcseconds at 230 GHz. The dashed
line shows the time for applying the antenna-based gains and bandpass calibrations to the uv data,
or for subtracting the sky brightness model from the uv data. The imaging step (solid line) applies
the weights to the uv data, convolves the calibrated uv data onto a gridded uv plane, and uses
an FFT to make the multi-channel image and synthesized beam. Off-line calibration and imaging
is typically made in several steps. In a real-time pipeline, these steps can be made in sequence on
the data stream.


2.3. Earth Rotation Aperture Synthesis 
The Van Cittert–Zernike theorem states that the Fourier transform of the angular
intensity distribution on the sky, $I(\sigma)$, is given by the angular coherence of the
wavefront, given by the calibrated visibility function, V(t), observed from an array
of antennas. For a short observation, the relationship is

$$V(t) = \int I(\sigma)\,A(\sigma - \sigma')\,\exp\left[\frac{2\pi\nu i}{c}\,b_{ij}\cdot\sigma\right] d\sigma, \tag{5}$$

where $\sigma'$ is the pointing direction of the antennas (assumed identical) in the array
and $b_{ij}$ is the separation between the two antennas. A telescope array with N
antennas provides N(N − 1)/2 cross correlations and N autocorrelations for each
polarization product. The Earth’s rotation of the projected geometry of the tele- 
scope array in the direction of a celestial source provides additional samples of the 
source visibility function in the aperture plane.”»? The more dense the sampling in 
the aperture plane, the more accurate will be the determination of I’. The EVLA 
with 27 antennas,° and the ALMA with 64 antennas!? represent the current state 
of the art aperture synthesis telescopes at centimeter and millimeter/submillimeter
wavelengths, respectively. 
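The way Earth rotation fills in the aperture plane can be sketched by projecting a fixed baseline onto the (u, v, w) axes as the source hour angle changes. The conversion below is the standard textbook rotation for a baseline expressed in equatorial components (L_x, L_y, L_z); it is not given in this chapter, so it and the example baseline values should be read as assumptions of the sketch.

```python
import math

def baseline_to_uvw(lx, ly, lz, hour_angle, declination):
    """Project an equatorial baseline (metres) onto (u, v, w) for a source
    at the given hour angle and declination (both in radians)."""
    sh, ch = math.sin(hour_angle), math.cos(hour_angle)
    sd, cd = math.sin(declination), math.cos(declination)
    u = sh * lx + ch * ly
    v = -sd * ch * lx + sd * sh * ly + cd * lz
    w = cd * ch * lx - cd * sh * ly + sd * lz
    return u, v, w

def uv_track(lx, ly, lz, declination,
             ha_start_hours=-6.0, ha_end_hours=6.0, n_steps=25):
    """Sample the (u, v, w) track of one baseline over a range of hour angle."""
    track = []
    for k in range(n_steps):
        hours = ha_start_hours + (ha_end_hours - ha_start_hours) * k / (n_steps - 1)
        hour_angle = math.radians(15.0 * hours)
        track.append(baseline_to_uvw(lx, ly, lz, hour_angle, declination))
    return track

# Hypothetical baseline observed at declination +40 deg and at the pole.
track = uv_track(1000.0, 500.0, 200.0, math.radians(40.0))
polar = uv_track(1000.0, 500.0, 200.0, math.radians(90.0))
```

Each baseline traces an arc of an ellipse in the uv plane (a circle for a source at the pole), so a 12-hour track with N(N − 1)/2 baselines samples far more of the aperture plane than a snapshot.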

Given $\sigma_0$ as the correlator phase tracking center, and $s = \sigma - \sigma_0$ as the angular
vector from the phase center, the Fourier transform becomes

$$V(t) = \exp\left[\frac{2\pi\nu i}{c}\,b_{ij}\cdot\sigma_0(t)\right] \int I(s)\,A(s)\,\exp\left[\frac{2\pi\nu i}{c}\,b_{ij}\cdot s\right] ds. \tag{6}$$

If s is small, we can write this as a 2D Fourier transform:

$$V(u,v) = \iint_{x,y} \left[I(x,y)\,A(x,y)\right] \exp\left[\frac{2\pi\nu i}{c}\,(ux + vy)\right] dx\,dy, \tag{7}$$

where x is the sky projection in the east–west direction, and y in the north–south
direction from the phase center; u is the baseline projection in the east–west
direction, and v in the north–south direction. More generally, s = (x, y, z) and
b = (u, v, w), where $z = \sqrt{1 - x^2 - y^2}$, $w = b_{ij}\cdot\sigma_0(t)$, and the exponential term
becomes $ux + vy + w(1 - z)$. In place of a three-dimensional (3D) Fourier transform,
faceting is used to image large angular-scale sources piecemeal, in subimages over which the value of
$1 - z$ is sufficiently small, and the subimages are then combined.
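The size of the w(1 − z) term gives a feel for when the 2D approximation fails and faceting is needed. The sketch below evaluates the corresponding phase, with w expressed in wavelengths so that the phase is 2πw(1 − z), and uses the small-angle form πw(x² + y²) to estimate a facet radius. The 0.1 rad phase tolerance and the example value of w are hypothetical choices, not figures from the text.

```python
import math

def w_term_phase(w_wavelengths, x, y):
    """Phase (radians) of the w(1 - z) term at direction cosines (x, y)."""
    z = math.sqrt(1.0 - x * x - y * y)
    return 2.0 * math.pi * w_wavelengths * (1.0 - z)

def max_facet_radius(w_wavelengths, max_phase_rad=0.1):
    """Small-angle estimate of the field radius (radians) within which the
    w-term phase stays below max_phase_rad: pi * w * r^2 = max_phase_rad."""
    return math.sqrt(max_phase_rad / (math.pi * w_wavelengths))

# Example: w = 1e5 wavelengths (e.g. a 100 m w-component at 1 mm wavelength).
phase_at_10mrad = w_term_phase(1.0e5, 0.01, 0.0)
facet_radius = max_facet_radius(1.0e5)
phase_at_facet_edge = w_term_phase(1.0e5, facet_radius, 0.0)
```

Because the phase grows as the square of the offset, halving the facet size reduces the residual w-term error by a factor of four.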

Most observations cover a finite bandwidth, but the above relationships assume
a constant frequency ν. If $(\Delta\nu/c)\,b_{ij}\cdot\sigma_0(t) > 0.1$, then we must use a small range of
ν to produce an image in a limited frequency range in order to avoid bandwidth
smearing. A multi-frequency synthesis (MFS) method can be used to produce a
single image from a wideband data set, and this is described in the imaging section.

The calibration of aperture synthesis arrays at meter wavelengths presents 
formidable problems. The wide fields of view of the telescopes are full of radio 
sources that confuse the regions of interest. The antennas have direction-dependent 
response over the field of view, and the ionosphere can cause direction-dependent 
phase shifts on short-time scales. LOFAR is a Low Frequency Array telescope with 
antennas at 77 stations spread over 100km and observing in the frequency range 
30-90 and 120-250 MHz. Data from the antennas at each station are combined into 
phased array beams to reduce the data rate to a single data stream for each station. 
Correlation of the station beams is made in a 34 TFlop IBM BlueGene/L supercomputer.
LOFAR calibration and imaging are made in pipelined data processing
with RFI flagging, calibrated using a sky brightness model.¹³ The Murchison
widefield array (MWA) being built in Western Australia was designed as a
512-antenna array to observe in the frequency range 80–300 MHz. The correlation
data would comprise 130,000 cross correlations with 768 frequency channels and 4
polarization products.¹⁴ The data rate of ~19 GB s⁻¹ is impractical to store; data will
be calibrated and imaged in a real-time pipeline with images of the sky produced
at 8 s intervals. The real-time calibration pipeline processing is discussed in detail
in Ref. 15. The MWA has been de-scoped to a 128-antenna array, which reduces
the data rate by a factor of 16.
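The figures quoted for the MWA follow directly from the N(N − 1)/2 scaling of the number of cross correlations. A short check, with illustrative function names:

```python
def n_cross(n_antennas):
    """Number of antenna pairs (cross correlations) in an N-element array."""
    return n_antennas * (n_antennas - 1) // 2

def visibilities_per_dump(n_antennas, n_channels, n_pol_products):
    """Complex visibilities produced per correlator integration."""
    return n_cross(n_antennas) * n_channels * n_pol_products

full = visibilities_per_dump(512, 768, 4)
descoped = visibilities_per_dump(128, 768, 4)
reduction = full / descoped
```

Here `n_cross(512)` is 130,816, matching the ~130,000 cross correlations quoted, and the ratio of the two data volumes is about 16. The same function gives 2016 pairs for 64 antennas, the number of antenna pairs quoted for the ALMA correlator.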

The square kilometer array (SKA) survey science requires images with superb
image quality, which imposes stringent requirements on the calibration at several
stages of beam formation. A major theme driving the SKA design is the high cost
of data processing.¹⁶⁻²⁰


3. Aperture Synthesis Imaging 


The inverse Fourier relationship that gives the apparent source image, $I'$, in terms
of the complex visibility function is

$$I'(x,y) = \iint W(u,v)\,V(u,v)\,\exp\left[-\frac{2\pi\nu i}{c}\,(ux + vy)\right] du\,dv, \tag{8}$$

where a weighting term W is included. This can be used to minimize the noise of
the image, or to improve the smoothness of the density of points in the uv plane.

Because the visibility function is measured only at discrete points, this integral
is replaced by a summation at the $(u_k, v_k)$ sampled points:

$$I'(x,y) \approx \sum_k W(u_k,v_k)\,V(u_k,v_k)\,\exp\left[-\frac{2\pi\nu i}{c}\,(u_k x + v_k y)\right]. \tag{9}$$

It is useful to define the synthesized beam (point-spread function) for the observation
set by

$$B(x,y) = \sum_k W(u_k,v_k)\,\exp\left[-\frac{2\pi\nu i}{c}\,(u_k x + v_k y)\right]. \tag{10}$$


In summary, if $I(s,\nu,p,t)$ is the source intensity distribution, illuminated by the
primary beam pattern, $A(s,\nu,p)$, then the Fourier transform of the visibility data
is $I'(s,\nu,p,t)$ and is related to the true source intensity by $I' = [I \times A] * B$, where *
denotes convolution. Notice that the synthesized beam B contains the convolution
effects of the weighting term W.

The above determination of $I'$ is called the direct Fourier transform, since it
simply adds the measured visibility functions, with the proper phasing and adopted
weighting. However, for a large number of visibility samples and a large imaging
area, the direct Fourier transform is too slow, even with modern computing power.
The standard imaging algorithm uses an FFT of the gridded uv data to obtain the
image on a rectangular grid in the sky.
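A direct Fourier transform of the kind written in Eqs. (9) and (10) can be sketched in a few lines. The assumptions here are: u and v are in wavelengths (so the phase factor is 2π rather than 2πν/c), the forward transform uses a positive exponent and the imaging sum a negative one, the sum is normalized by the total weight, and only the real part is kept (equivalent to adding the Hermitian-conjugate samples). The uv points and source position are invented for the example.

```python
import cmath
import math

def point_source_visibility(u, v, x0, y0, flux=1.0):
    """Visibility of a point source at (x0, y0) radians; u, v in wavelengths."""
    return flux * cmath.exp(2j * math.pi * (u * x0 + v * y0))

def direct_transform(samples, x, y):
    """Weighted direct sum at (x, y); samples are (u, v, weight, vis) tuples."""
    total = sum(w * vis * cmath.exp(-2j * math.pi * (u * x + v * y))
                for u, v, w, vis in samples)
    norm = sum(w for _, _, w, _ in samples)
    return (total / norm).real

# Hypothetical uv coverage and a 1 Jy point source offset from the phase center.
uv_points = [(100.0 * k, 80.0 * ((37 * k) % 11)) for k in range(1, 30)]
x0, y0 = 2.0e-3, -1.0e-3
samples = [(u, v, 1.0, point_source_visibility(u, v, x0, y0))
           for u, v in uv_points]
beam_samples = [(u, v, w, 1.0 + 0.0j) for u, v, w, _ in samples]

peak = direct_transform(samples, x0, y0)
off_peak = direct_transform(samples, x0 + 1.0e-3, y0)
beam_centre = direct_transform(beam_samples, 0.0, 0.0)
```

Evaluating one pixel costs one complex exponential per visibility, so an image of N_pix pixels from N_vis samples costs N_pix × N_vis operations, which is the scaling that motivates the gridded FFT.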
3.1. Gridded Fast Fourier Transform


In order to use a fast Fourier transform algorithm, we re-sample the uv data onto
a gridded uv plane. The uv data are multiplied by the weighting function W, convolved
by a gridding function C, and re-sampled onto a regular grid by Ш:

$$\left[(V \times W) * C\right] \times \mathrm{Ш} \;\overset{\mathrm{FFT}}{\Longleftrightarrow}\; \left[\left((I \times A) * B\right) \times c\right] * \mathrm{Ш}. \tag{11}$$

Thus, the Fourier transform of the gridded uv data is an image of the sky brightness
distribution I multiplied by the primary beam pattern, A, convolved by the synthesized
beam B, multiplied by c, and convolved by Ш. The convolution by Ш replicates
the image at intervals 1/δuv, where δuv is the sample interval of the gridded uv
data. Aliasing in the sky brightness image is minimized by choosing a function C,
so that its Fourier transform c falls to a small value at the edge of the image.
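The gridding step (weight, convolve, re-sample) can be sketched as follows. A truncated Gaussian stands in for the gridding function C purely for brevity; production imagers use specially designed functions, and the kernel width, support and grid parameters below are hypothetical. The FFT itself is omitted: an inverse 2D FFT of the returned grid would give the image described in the text.

```python
import math

def grid_visibilities(samples, cell, n, support=2, sigma=0.7):
    """Convolve (u, v, weight, vis) samples onto an n x n grid of spacing
    `cell`, using a truncated Gaussian kernel normalised for each sample."""
    grid = [[0j] * n for _ in range(n)]
    half = n // 2
    for u, v, w, vis in samples:
        gu, gv = u / cell + half, v / cell + half  # fractional grid position
        iu, iv = int(round(gu)), int(round(gv))
        taps = []
        for dj in range(-support, support + 1):
            for di in range(-support, support + 1):
                i, j = iu + di, iv + dj
                if 0 <= i < n and 0 <= j < n:
                    r2 = (i - gu) ** 2 + (j - gv) ** 2
                    taps.append((i, j, math.exp(-r2 / (2.0 * sigma ** 2))))
        norm = sum(k for _, _, k in taps)
        for i, j, k in taps:
            grid[j][i] += w * vis * k / norm
    return grid

# Hypothetical samples (u, v in wavelengths) on a 32 x 32 grid, 100-wavelength cells.
samples = [(100.0 * k - 1000.0, 37.0 * k - 400.0, 1.0, complex(1.0, 0.1 * k))
           for k in range(20)]
grid = grid_visibilities(samples, cell=100.0, n=32)
gridded_total = sum(sum(row) for row in grid)
expected_total = sum(w * vis for _, _, w, vis in samples)
```

Because each sample's kernel is normalised, the gridded data conserve the weighted sum of the visibilities; the multiplication of the image by c, the transform of the kernel, is the price paid for suppressing aliasing.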


3.2. Deconvolution 


The main distortions of the images are from their convolution with the point-spread
function B, with secondary effects from gridding. Two different deconvolution
algorithms are commonly used: an iterative point source subtraction algorithm,
CLEAN, which is well matched for deconvolving compact source structures, and 
MAXIMUM ENTROPY, a gradient search algorithm, which maximizes the fit to 
an a priori image in a least squares fit to the gridded uv data. Both algorithms 
operate in the image plane on the synthesized image and beam. 
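The iterative point-source subtraction at the heart of CLEAN can be sketched in one dimension. This is a minimal Högbom-style loop written for illustration, not the MIRIAD or AIPS implementation; the loop gain, threshold, and the synthetic beam with its sidelobe pattern are all assumptions of the example.

```python
import math

def hogbom_clean(dirty, beam, gain=0.1, threshold=1e-3, max_iter=1000):
    """1D Hogbom-style CLEAN. `beam` has length 2*len(dirty) - 1, is centred
    on its middle element, and has a peak of 1 there."""
    residual = list(dirty)
    n = len(residual)
    centre = len(beam) // 2
    components = [0.0] * n
    for _ in range(max_iter):
        peak = max(range(n), key=lambda i: abs(residual[i]))
        if abs(residual[peak]) < threshold:
            break
        step = gain * residual[peak]
        components[peak] += step
        # Subtract a scaled copy of the beam centred on the peak pixel.
        for i in range(n):
            residual[i] -= step * beam[centre + i - peak]
    return components, residual

def example_beam(offset):
    """Synthetic beam: unit peak with decaying oscillatory sidelobes."""
    if offset == 0:
        return 1.0
    return 0.3 * math.cos(0.9 * offset) / (1.0 + abs(offset))

n_pix = 40
beam = [example_beam(d) for d in range(-(n_pix - 1), n_pix)]
sky = {10: 1.0, 25: 0.5}  # two point sources: pixel -> flux
dirty = [sum(flux * example_beam(i - pos) for pos, flux in sky.items())
         for i in range(n_pix)]
components, residual = hogbom_clean(dirty, beam, gain=0.1,
                                    threshold=1e-3, max_iter=20000)
```

In this noise-free case the component list converges on the two input sources; in practice the components are convolved with a clean (usually Gaussian) beam and added back to the residual to form the final image.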

Subtracting the sky brightness model, as CLEAN or MAXIMUM ENTROPY 
proceed, from the original uv data minimizes the distortions associated with the 
gridded FFT. This direct subtraction is also needed for sources with large angular 
size, especially those with outlier sources well outside of the primary beam where 
position-dependent calibrations and time variability may occur. For the MIRIAD 
implementation, see, e.g. Ref. 21. For the AIPS implementation, which uses the 
Cotton—Schwab algorithm, see Ref. 22. 

In Fig. 2, we plot the time for a gridded FFT in MIRIAD for a multi-channel 
image with 1580 x 1580 pixels and 100 frequency channels. The multi-channel data 
are gridded and imaged as a vector with a common pixel size, gridding convolution 
function, and synthesized beam. Figure 2 shows that the imaging time is proportional
to the number of uv data samples, at a rate of ~4 Mbytes s⁻¹ using a single
processor. Image deconvolution time scales with image size and complexity, and is
CPU intensive. A direct deconvolution, dividing by the Fourier transform of the synthesized
beam, cannot be used because the Fourier plane is not completely sampled.
Both CLEAN and MAXIMUM ENTROPY are iterative algorithms, using FFTs of
the image and synthesized beam. A 1580 × 1580 × 100 channel, real-valued image
(4 bytes per pixel) is ~1 Gbyte and, with a common synthesized beam (10 Mbytes), can
be deconvolved in memory. The frequency channels can be deconvolved in parallel
processes. Image deconvolution is relatively fast for compact image structures, but
can exceed the imaging time for complex images. 

In a real-time imaging pipeline, images in multiple frequency channels can be 
processed in parallel in a distributed architecture. The images are formed from the 
calibrated data stream from which the a priori sky model has been subtracted, and 
are therefore difference images from the sky model. 


3.3. Self-calibration 


The difference images are used to update the sky model, including not only the 
regions of interest, but also improving the accuracy of sources whose sidelobes must 
be subtracted. As the observations proceed, both the model image and the calibra- 
tion are improved. The process converges when the difference images approach the 
noise level and the model image is consistent with the data. 

For a small field of view, a 2D FFT can be used to image the region around
each phase center. The maximum image size for a 2D FFT scales as $D_{\max}/\lambda$, ~10⁸
beam areas on a 1000 km baseline at 1 cm wavelength. Deconvolution is minimized
by obtaining good uv sampling of the aperture plane and low synthesized beam
sidelobe levels for large N array designs. For example, for the ALMA array with 60
antennas, the sidelobe levels are ~1%. In many cases, deconvolution in the image
plane may not be needed, since the model image and sidelobes of confusing sources
have been subtracted from the uv data. In addition, images may be limited by
atmospheric and instrumental errors, which must be removed from the uv data and
cannot be removed by deconvolving in the image plane.

The imaging engine can make images using all the frequency channels. Spectral 
line images can be made for multiple frequency channels, averaged into the desired 
frequency or velocity intervals. Wideband, MFS imaging treats the frequency channels
as independent uv samples. The a priori model used in the calibration can be
updated at intervals, when the difference from the best current image is significant. 

Variable sources are detected as intermittent sources which are inconsistent 
with the current model. We should also accumulate a χ² image to help identify
pixels where there are time-variable sources or RFI sources. In some cases we may 
want to keep a time series of difference images and the model images used for the 
calibration. 
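One simple way to accumulate such an image is to add, pixel by pixel, the squared difference image in units of the expected noise. The sketch below assumes a single known noise level per difference image and plain nested lists for the images; both the data layout and the function name are illustrative assumptions rather than a described implementation.

```python
def accumulate_chi2(chi2_image, difference_image, noise_sigma):
    """Add (difference / sigma)^2 for every pixel into the running chi^2 image."""
    for row_acc, row_diff in zip(chi2_image, difference_image):
        for k, value in enumerate(row_diff):
            row_acc[k] += (value / noise_sigma) ** 2
    return chi2_image

# Three successive 2 x 3 difference images; one pixel hosts a varying source.
chi2 = [[0.0] * 3 for _ in range(2)]
differences = [
    [[0.1, -0.1, 0.0], [0.0, 2.0, 0.1]],
    [[-0.1, 0.0, 0.1], [0.1, -3.0, 0.0]],
    [[0.0, 0.1, -0.1], [-0.1, 2.5, 0.1]],
]
for diff in differences:
    accumulate_chi2(chi2, diff, noise_sigma=0.1)
n_intervals = len(differences)
```

Pixels consistent with the model accumulate a value close to the number of intervals, whereas a pixel containing a variable source or RFI grows much faster and stands out in the accumulated image.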

We view imaging as a dynamic process that can be guided in real time by 
observers inspecting the convergence of the model image and the χ² image. As
the observations proceed, the observations can be moved to regions where more 
data are needed to define the science goals, either regions of interest, or sources 
whose sidelobes are confusing, or new sources that are discovered in the imaging 
process. Isoplanatic patches may vary during the observations, requiring different 
observation centers to adequately determine the calibration across the sky. 

The data archive serves as the database for the observations, calibrations and 
instrument status during the observations. The data streams from each phase center 
are saved in the data archive, along with the metadata. Data from the data archive
can be replayed through the imaging system so that the best model of the sky
and calibration data from the completed observations can be used to improve the 
calibration of the final image and extract time-variable sources. 


3.4. Properties of Synthesized Beam 


The angular resolution is $\sim\lambda/2D_{\max}$, where $D_{\max}$ is the maximum antenna separation.
Note the factor of 2; since $V(-u,-v) = V^{*}(u,v)$ (the Fourier transform of a
real-valued $I'(x,y)$), the width of the aperture is $2D_{\max}$. The discrete equally-spaced
sampling of the aperture plane causes aliasing of source structure on scales $>\lambda/D_{inc}$,
where $D_{inc}$ is the sample interval in the (u, v) data. This aliasing is reduced if the
gridding of the uv data is a convolution using specialized functions.

Each antenna pair samples the Fourier transform of the source brightness distribution.
A radio source is “resolved” if its size is $\geq\lambda/2D_{\min}$, where $D_{\min}$ is the
shortest antenna separation. The shortest uv spacing that can be measured is greater
than the dish diameter $D_{ant}$; otherwise antennas would collide or be shadowed. This
hole in the center of the aperture plane is known as the “short-spacing” problem. If
the size of the emission is $>\lambda/2D_{\min}$,
it will not be present in the image. On the other hand, small radio components that
are separated by more than this separation will be included in the image.

The field of view of the antennas is $\sim\lambda/D_{ant}$. Each antenna configuration is sensitive
to a range of angular sizes, $\lambda/D_{\max}$ to $\lambda/D_{\min}$. We must select sources with structures
between $\lambda/2D_{ant}$ and $\lambda/2D_{\max}$ in order to image with a single pointing. Larger
structures can be imaged using multiple pointing centers in a mosaic. The primary
beam pattern illuminates the field of view. The primary beam for an interferometer
is the cross power pattern for each antenna pair (i, j), $A_{i,j}(s) = V_i(s) \times V_j^{*}(s)$, and
may be different for each antenna pair (i, j).


3.5. Combining Single Dish and Interferometer Observations 


Single-dish observations sample Fourier spacings from 0 to $D_{ant}$. The total power,
or zero spacing, gives the total flux in the image. An interferometer array samples
spacings from $D_{\min}$ to $D_{\max}$. A more completely sampled image is obtained by
merging single-dish and interferometer images. The single-dish image is $I * B_{single}$,
where I is the sky brightness, and $B_{single}$ is the single-dish beam. The interferometer
array image is apodized by the primary beam of the array antennas, so to combine
the observations we must deconvolve the single-dish observations (we need to know
the single-dish beam pattern), and multiply by the interferometer image primary
beam pattern. We can establish a consistent calibration between single dish and
interferometer by sampling overlapping Fourier spacings between the single-dish
and interferometer array observations. The ideal single dish has $D_{ant} \sim 2 \times D_{\min}$,
as we need overlap for calibration.
The single dish and interferometer array data can be combined using a weighted
average (IMMERGE), or using the mosaicing algorithms discussed below. The
MOSAIC imaging algorithms allow joint deconvolution of an interferometer mosaic 
image and a single-dish image, or can use the single-dish image as a default image 
for spatial scales that are unsampled by the interferometer data. 


3.6. Data Analysis 


In this section, we provide two examples of analysis of aperture synthesis data 
that would benefit from calibration and imaging in close to real time, so that the 
appropriate observations and calibrations are obtained. Our first example is imaging 
the Cyg X-3 region with the Allen Telescope Array at 3.1 GHz (Peter Williams, PhD 
thesis; see Fig. 3). Cyg X-3 is a high-mass X-ray binary system that can increase its 
brightness by a factor of ~10 in an hour. Subtracting the large-scale structure allows 
us to get high time-resolution light curves of time-variable sources. The off-line data 
processing took several hours, and limits our ability to image time-variable sources. 

Fig. 3. Left: Image of the Cyg X-3 region obtained with the Allen Telescope Array at 3.1 GHz.
Right: Image of compact, time-variable sources after subtracting the complex structure. Cyg X-3
is a high-mass X-ray binary system that can increase its brightness by a factor of ~10 in an hour.
Subtracting the large-scale structure allows us to get high time-resolution light curves of
time-variable sources. These images were made using the CASA widefield imager, which is an order
of magnitude slower than MIRIAD, but handles the large field of view more correctly. (Images
courtesy of Peter Williams.)

The second example is imaging Sgr A using data obtained with the ALMA telescope
in Band 3 (84-116 GHz), Band 6 (211-275 GHz), and Band 7 (275-370 GHz);
see Fig. 4. These observations used a single pointing centered on Sgr A*, the compact
radio source associated with the black hole at the Galactic center. The observations
used a single configuration of ALMA 12 m antennas, with antenna spacings from
13 to 400 m. After the standard data calibration using nearby quasars, Sgr A*
can be used to self-calibrate using only the longer antenna spacings where the more
extended structure shown in Fig. 4 is resolved away. Analyzing the large-scale
structure illustrates two problems with array observations over a wide frequency range:
the uv range and the primary beam patterns scale with frequency, so we cannot
directly compare the images to estimate the spectral index of the radio emission. The
Band 3 image clearly maps larger-scale structure which is not present in Bands 6
and 7. Conversely, Bands 6 and 7 sample larger spatial frequencies and map smaller
scale structure which is not imaged in Band 3. In order to map the spectral index
of the extended structures, we must compute the ratio of the intensity, $I_1/I_2$, at
multiple frequencies. If we directly compare the images of Sgr A, what we actually
measure is $(I_1 - I_1')/(I_2 - I_2')$, where $I_1'$ and $I_2'$ are the missing, unsampled structure
at the two frequencies.

Fig. 4. Sgr A image using the ALMA telescope. Contours in log scale from 1 to 64 mJy/beam with linear gray scale. The blue contours indicate
the primary beam pattern at the 90, 50, and 10% levels. Left: Band 3. The synthesized beam was 1.5 × 1.2″ at PA = −87°. Center: Band 6. The
synthesized beam was 0.64 × 0.49″ at PA = −84°. Right: Band 7. The synthesized beam was 0.43 × 0.36″ at PA = +89°.
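
To see how strongly the missing structure can bias a two-frequency spectral index,
the following sketch compares the index derived from the true intensities with the
index derived after removing different fractions of the flux at each frequency. The
band frequencies, the true index and the missing fractions are illustrative
assumptions, not values measured from the Sgr A data.

```python
import math


def spectral_index(i1, i2, nu1_hz, nu2_hz):
    """Spectral index alpha, for I proportional to nu**alpha, from two intensities."""
    return math.log(i1 / i2) / math.log(nu1_hz / nu2_hz)


nu1_hz, nu2_hz = 100e9, 230e9   # assumed representative frequencies
true_alpha = -0.1               # assumed true spectral index

# True intensities of an extended structure (arbitrary units).
i2 = 1.0
i1 = i2 * (nu1_hz / nu2_hz) ** true_alpha

# Hypothetical fractions of the flux that are missing (unsampled) at each
# frequency; more is assumed to be resolved out at the higher frequency.
missing1, missing2 = 0.05, 0.30

recovered_alpha = spectral_index(i1, i2, nu1_hz, nu2_hz)
measured_alpha = spectral_index(i1 * (1.0 - missing1),
                                i2 * (1.0 - missing2),
                                nu1_hz, nu2_hz)

print(f"index from true intensities:     {recovered_alpha:+.2f}")
print(f"index with missing flux removed: {measured_alpha:+.2f}")
```

With these assumed numbers the measured index is several times steeper than the true
one, which is why the images cannot be compared directly.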

One approach is to image a common range of uv spacings measured in wavelengths.
In this case, we can compute the spectral index of a common subset of
angular scales. For these data, this approach did not give good results because the
uv distribution is much better sampled at short spacings at low frequencies. We tried
to mitigate this by using a uniform weighting of the uv data, still using the same uv
range for all bands, and the same restoring beam FWHM after deconvolution. This
pulls up the mapped intensity, measured in Jy/beam, at the higher frequencies, but
only partially offsets the higher density of uv samples at lower frequencies at short
spacings.

Another approach is to find a model image which is consistent with the sampled 
data within the measurement errors. For this, we used an MFS image at Band 6. 
We used the CLEAN algorithm described above to make a deconvolved model. 
When we subtract this model from the Band 6 uv data, the residual is consistent 
with the data that were sampled in four spectral windows at 245, 246, 260 and 
262 GHz. When we subtract the model from Band 3 and Band 7 data, we must 
correct for the different primary beam patterns; i.e. we correct the Band 6 model 
for the Band 6 primary beam pattern, and then multiply by the Band 7 or Band 3 
primary beam before subtracting the model from the Band 7 and Band 3 uv data. 
We can use a different primary beam for each spectral window to handle the wide 
bandwidth, and could use a different primary beam pattern for each antenna pair if 
these are known. In the subtraction, we can introduce a scale factor and adjust this 
to minimize the residuals after subtracting the scaled model. The scale factor that 
minimizes the residual images gives us an estimate of the overall spectral index. 
For the Band 7 data, this process worked well. The residuals were consistent with 
the measurement errors, including a ~10% estimate for primary beam uncertain- 
ties. Some of the ALMA antennas suffer from astigmatism with unknown time, 
temperature, elevation, etc. dependence, which we could not include in the primary 
beam model. These problems could be handled by primary beam measurements and 
data processing in close to real time. The derived scale factor that minimizes the 
residuals corresponds to an overall spectral index of $S \propto \nu^{-0.1}$, which is expected
for optically thin thermal bremsstrahlung emission; i.e. there is no evidence for dust
emission on the subarcsecond angular scales measured in these data (an interesting
conclusion). For the Band 3 data, the primary beam is larger than in the Band 6
model, so the primary beam-corrected model has large, unknown errors, and the
subtraction from the uv data leaves significant residuals. No additional conclusions
were obtained from this subtraction.
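
The scale-factor step can be sketched as a one-parameter least-squares fit. The
example below is a minimal illustration on made-up visibilities: it ignores the
primary beam correction described above, and the function names, frequencies and
noise values are assumptions rather than the procedure actually applied to the
ALMA data.

```python
import math


def best_scale(model_vis, data_vis):
    """Real scale s that minimises sum |data - s * model|**2."""
    numerator = sum((m.conjugate() * d).real for m, d in zip(model_vis, data_vis))
    denominator = sum(abs(m) ** 2 for m in model_vis)
    return numerator / denominator


def scale_to_spectral_index(scale, nu_model_hz, nu_data_hz):
    """Convert a flux scale between two frequencies into a spectral index."""
    return math.log(scale) / math.log(nu_data_hz / nu_model_hz)


nu_model_hz = 250e9   # assumed Band 6 reference frequency
nu_data_hz = 340e9    # assumed Band 7 reference frequency

# Toy model visibilities and data that follow S ~ nu**-0.1 plus small errors.
model_vis = [1.0 + 0.5j, 0.3 - 0.2j, -0.4 + 0.1j, 0.05 + 0.6j]
errors = [0.001j, -0.002, 0.0015, -0.001j]
true_scale = (nu_data_hz / nu_model_hz) ** -0.1
data_vis = [true_scale * m + e for m, e in zip(model_vis, errors)]

scale = best_scale(model_vis, data_vis)
alpha = scale_to_spectral_index(scale, nu_model_hz, nu_data_hz)
residual_rms = math.sqrt(
    sum(abs(d - scale * m) ** 2 for m, d in zip(model_vis, data_vis)) / len(data_vis)
)

print(f"scale = {scale:.4f}, spectral index = {alpha:+.3f}, rms = {residual_rms:.4f}")
```

In practice the residuals would be compared with the expected measurement errors, as
was done for the Band 7 data.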

In order to map the spectral index distribution in Sgr A, additional observations 
with the ALMA compact array (ACA) and single-dish observations are required to 
sample the larger-scale structure, with multiple pointings of the ALMA array and 
contemporary primary beam measurements to make mosaic images. This problem 
of finding a spectral index with missing uv coverage with the ALMA telescope is a 
foretaste of what will also be needed for the SKA. 


3.7. Mosaicing 


In order to map sources larger than the primary beam, we need to make multi- 
ple pointings of our array antennas to map the regions of interest. There are two 
sampling problems: (1) mapping a large field of view and (2) imaging large-scale 
structures: 


(1) Mapping a field of view larger than the primary beam with an antenna array is
the same process as single-dish mapping. We scan antennas across the source
and sample the data at the Nyquist interval. The array primary beam patterns
$A(x, y)$ convolve the sky brightness distribution $I(x, y)$, which is sampled at intervals
$\delta\theta$ by $\Pi$:

$$I'(x,y) = \Pi(x,y) \times [A(x,y) * I(x,y)]. \qquad (12)$$

The Fourier transform gives the visibility data sampled in the uv-plane:

$$V'(u,v) = \pi(u,v) * [a(u,v) \times V(u,v)], \qquad (13)$$

where $a(u,v)$ is the weighting of spatial frequencies sampled by the primary
beams, $*$ denotes convolution and $\times$ multiplication. The sampled visibilities
are convolved by $\pi$, the Fourier transform of $\Pi$. The uv data are aliased if the
interval of $\pi$, $1/\delta\theta$, is smaller than $2D_{\rm ant}/\lambda$, so we sample antenna pointings with
$\delta\theta < \lambda/2D_{\rm ant}$ to avoid aliasing the uv data.

(2) Imaging large-scale structures: The interferometer array image $I'(x, y)$ is

$$I'(x,y) = \iint V'(u,v) \exp[-2\pi i(ux + vy)]\, du\, dv \qquad (14)$$

$$= [I(x,y) \times A(x,y)] * B(x,y). \qquad (15)$$

In the Fourier plane, the measured visibility $V'(u, v)$ is

$$V'(u,v) = [V(u,v) * a(u,v)] \times W(u,v),$$

where $W(u, v)$ is the sampling in the uv plane and $a(u, v)$ is the weighting of spatial
frequencies sampled by the antennas. The image is convolved by $B(x,y)$. The
image is aliased if $\delta uv > D_{\rm ant}/2\lambda$, so we need to sample uv data at $\delta uv <
D_{\rm ant}/2\lambda$. On-the-fly mosaicing may be needed to map large sources with ALMA
because the small field of view of the antennas requires a large number of
pointings.
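
The pointing requirement can be estimated with a short calculation. The sketch
below evaluates the Nyquist pointing interval $\lambda/2D_{\rm ant}$ for a 12 m antenna and
counts the pointings needed to cover a square field on a square grid. The frequency,
the field size and the square grid are assumptions chosen for illustration.

```python
import math

C = 299_792_458.0                          # speed of light in m/s
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0   # radians to arcseconds


def nyquist_pointing_spacing_arcsec(freq_hz, dish_diameter_m):
    """Largest pointing interval, lambda / (2 D_ant), that avoids aliasing."""
    wavelength_m = C / freq_hz
    return wavelength_m / (2.0 * dish_diameter_m) * RAD_TO_ARCSEC


def pointings_for_square_field(field_arcsec, spacing_arcsec):
    """Number of pointings on a square grid that spans the field edge to edge."""
    per_side = math.ceil(field_arcsec / spacing_arcsec) + 1
    return per_side ** 2


freq_hz = 230e9          # assumed frequency inside ALMA Band 6
dish_diameter_m = 12.0   # ALMA 12 m antennas
field_arcsec = 120.0     # hypothetical 2 arcminute square field

spacing = nyquist_pointing_spacing_arcsec(freq_hz, dish_diameter_m)
n_pointings = pointings_for_square_field(field_arcsec, spacing)

print(f"Nyquist pointing spacing: {spacing:.1f} arcsec")
print(f"pointings for a {field_arcsec:.0f} arcsec square field: {n_pointings}")
```

Even this modest field needs more than a hundred pointings under these assumptions,
which illustrates why on-the-fly mosaicing becomes attractive.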


4. Very Long Baseline Interferometry and Geodesy 


Very long baseline interferometry (VLBI) follows the same principles as connected 
interferometers. The primary difference is that the signals from each antenna are 
usually not correlated in real time. They are stored on a high volume medium and 
shipped to a central point where the signals are correlated, perhaps many days 
after the experiment. There is some use of e-VLBI, where the antenna signals are
sent directly to the correlator, but this is currently limited to low-bandwidth
observations and used as a check at the start of an experiment. The reason for
the post-observing correlation is that the array elements are generally thousands of
kilometers apart, enabling sub-milliarcsecond resolution of celestial objects. Other
differences are that the antennas and electronics may be different at different
locations, the local atmospheric conditions often differ, and the source rise and set times
are significantly different.


4.1. VLBI Projects 


Ultra-high resolution is critical for the following projects: 


• The absolute positional accuracy of compact quasars is now approaching
30 μas. This requires accurate knowledge and determination of the location of the
antenna platforms, which requires knowledge of the earth rotation and
orientation, crustal dynamics, polar motion, and quasar changes.

• The inner dynamics of the jets and cores of distant quasars can be followed over
weeks to years of time, and are often associated with optical and gamma-ray
flares.

• The parallax of any sufficiently bright object in our galaxy is within reach of VLBI
observations.

• The trajectories of orbiting stars can be determined, and often lead to accurate
mass determinations.

• The evolution of young supernovae can be followed.

• Some of the most successful VLBI observations are the mapping of motion and
acceleration of megamasers in the disks of some galaxies. These provide a direct
distance measurement and recession velocity to determine the Hubble constant.

• Tests of general relativity have used VLBI observations, and VLBI observations
have been used for spacecraft orbit determination and navigation (e.g., Refs. 23
and 24).

• The Event Horizon Telescope uses existing millimeter wavelength telescopes to
image the radiation from the black holes in Sgr A* and Virgo A. Using VLBI
at a wavelength of 1.3 mm provides ~20 μarcsecond resolution, enabling us to
map regions close to the event horizon. Telescope arrays at CARMA and SMA are
used as phased arrays to provide VLBI stations that are used in combination with
single aperture telescopes in Arizona, Mexico, and Europe. Future observations
will also use ALMA as a phased array, increasing the sensitivity, and a telescope at
the South Pole, increasing the resolution and uv coverage. This global array
has the capability to measure time-variable emission in orbit around the black
hole.
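
The resolution quoted for the Event Horizon Telescope follows from $\lambda/D$ with a
baseline comparable to the size of the Earth. The short sketch below checks this,
assuming a longest baseline equal to the mean diameter of the Earth; the function
name is chosen for this example only.

```python
import math

RAD_TO_MICROARCSEC = 180.0 / math.pi * 3600.0 * 1e6


def resolution_microarcsec(wavelength_m, baseline_m):
    """Angular resolution lambda / B in microarcseconds."""
    return wavelength_m / baseline_m * RAD_TO_MICROARCSEC


wavelength_m = 1.3e-3          # observing wavelength quoted above
earth_diameter_m = 1.2742e7    # assumed longest baseline (mean Earth diameter)

eht_resolution = resolution_microarcsec(wavelength_m, earth_diameter_m)
print(f"lambda / B ~ {eht_resolution:.0f} microarcseconds")
```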


4.2. VLBI Correlation 


The correlation of data transported from antennas many kilometers apart requires 
accurate frequency standards at each location. These frequency standards, usu- 
ally hydrogen masers of accuracy one part in 10° over an hour, are used (1) to 
down-convert and sample the radio signals to lower frequencies, limited by the 
bandwidth of the observations, that can be more compactly transported, and (2) 
to provide an accurate time stamp, tied to GPS, associated with the antenna data
stream. Recently, the use of cryo-cooled sapphire oscillators (CSO) has
demonstrated even better stability on short timescales.

There are two critical parameters that are needed in the correlator in order to 
obtain visibility functions that can be calibrated. First, in order to determine the 
residual clock offset between two antenna locations, the delay must be less than the 
reciprocal of the narrowest frequency channel. For example, if the total bandwidth 
of the signal is 1 GHz, split into 1000 frequency channels, then the phase change 
over each 1 MHz channel will be one revolution for a timing error of 1 μs. With GPS
accuracies better than 0.1 μs, this frequency resolution will be sufficient.
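
The arithmetic of this example is simple enough to check directly: the phase change
across a channel, in turns, is the channel width multiplied by the residual delay.
The following sketch uses the numbers quoted above; the function and variable names
are chosen for illustration.

```python
def phase_turns_across_channel(channel_width_hz, delay_error_s):
    """Phase change across one frequency channel, in turns, for a residual delay."""
    return channel_width_hz * delay_error_s


total_bandwidth_hz = 1e9
n_channels = 1000
channel_width_hz = total_bandwidth_hz / n_channels   # 1 MHz channels

turns_for_1us = phase_turns_across_channel(channel_width_hz, 1e-6)
turns_for_gps = phase_turns_across_channel(channel_width_hz, 0.1e-6)

print(f"1 microsecond error:   {turns_for_1us:.2f} turn across the channel")
print(f"0.1 microsecond error: {turns_for_gps * 360.0:.0f} degrees across the channel")
```

A delay error at the level of the GPS accuracy therefore changes the phase by only a
fraction of a turn across each channel, so the signal is not washed out.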

Second, fast sampling of the visibility function may be needed if the rate of
change of phase is large. For example, an uncertainty of 20 cm in the separation
of the antennas will produce a phase rate at 300 GHz ($\lambda$ = 1 mm) of 6 degrees per
second, whereas the short-term phase jitter of the hydrogen masers and tropospheric
fluctuations can reach 10° s$^{-1}$. For VLBI arrays with elements separated by 8000 km,
an error of the earth orientation of $1 \times 10^{-3}$ s or a quasar position error of 0.01″
will produce an effective antenna separation error of 40 cm.
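
These estimates can be reproduced to within rounding with two short formulas. The
sketch below assumes that the largest phase rate caused by a baseline error is the
error in wavelengths swept at the Earth rotation rate, and that a position error
maps to a separation error as baseline times angle. Both simplifications, and the
names used, are assumptions of this example.

```python
import math

EARTH_ROTATION_RAD_PER_S = 7.2921e-5           # sidereal rotation rate of the Earth
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def max_phase_rate_deg_per_s(baseline_error_m, wavelength_m):
    """Upper bound on the fringe phase rate produced by a baseline error."""
    error_in_wavelengths = baseline_error_m / wavelength_m
    return 360.0 * error_in_wavelengths * EARTH_ROTATION_RAD_PER_S


def separation_error_m(baseline_m, angle_error_rad):
    """Effective antenna separation error for a small angular error."""
    return baseline_m * angle_error_rad


phase_rate = max_phase_rate_deg_per_s(baseline_error_m=0.20, wavelength_m=1e-3)
position_term = separation_error_m(8.0e6, 0.01 * ARCSEC_TO_RAD)

print(f"phase rate for a 20 cm error at 1 mm: {phase_rate:.1f} deg/s")
print(f"separation error for 0.01 arcsec on 8000 km: {position_term * 100:.0f} cm")
```

The phase rate comes out at about 5 degrees per second, the same order as the value
quoted above, and the position error gives a separation error close to 40 cm.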

The requirement of one second sampling with 1 MHz narrow channels is now 
achievable using software correlators.!? 14:25:26 Because of the relatively small 
number of antennas and the relatively simple systems, a digital correlator, called
DiFX, is now used for most VLBI correlation.²⁷


4.3. International Celestial Reference System 


The calibration of VLBI observations has become relatively straightforward over
the last ten years because of a 30-year world-wide VLBI observation program; see
http://aa.usno.navy.mil/faq/docs/ICRS_doc.php. The radio positions of about 400
defining quasars that cover the sky define the celestial reference frame to about
10 μas. Continuing observations every week using a variety of VLBI arrays provide
the earth orientation, rotation and polar motion parameters that are necessary for
the correlation of VLBI data described above.


4.4. Calibrations 


The internal calibrations, described above for connected interferometers, are also 
needed for VLBI observations. Some of the corrections are associated with the VLBI 
systems. Updated VLBI parameters, if not included in the correlator model, should 
be applied to the visibility phase data. 

The main pseudo-temporal phase calibration needed for VLBI observation is the
phase referencing to a quasar that is relatively close to the target of interest.
However, because of the relatively large delay and phase rate for VLBI observations, 
the phase, temporal phase rate and linear phase change with frequency (delay) are 
determined every two to ten minutes using a relatively bright quasar. The phase 
parameters are then applied to the target and its visibility function is then well- 
enough calibrated in order to produce images in the normal manner. 

The amplitude calibration is somewhat more difficult than that for connected
interferometers. Besides the corrections for the receiver sensitivities, subtle
amplitude corrections are needed for the correlator processing because of the several levels
of digitization. These calibrations generally produce an absolute calibration from
visibility units to power units to about 10% accuracy.

Finally, very few quasars are unresolved and they cannot be used as the absolute 
gain calibrator. Often, only a part of the quasar’s flux density is detected with the 
VLBI observations, since the more extended emission may be completely missing 
with typical shortest spacings of 500 km or more. By using self-calibration tech- 
niques, discussed above, the quasar image can be obtained and correction for its 
loss of correlation amplitude for the longer baselines can be made. This will remove 
temporal gain changes and those between antennas, but a priori calibrations are 
needed for the absolute gain calibration. 


5. Current Developments 


In this section, we describe some examples of synthesis imaging telescopes and 
data processing. The large data volumes and sophisticated data reduction present a 
burden for off-line data processing, and create a time delay which precludes feedback
from the observations into the data acquisition. Off-line data reduction pipelines
have been developed that make the telescopes more accessible to non-expert users.
The Very Large Array (VLA) calibration pipeline performs basic flagging and
calibration using CASA scripts, including bandpass and flux calibration, and RFI flagging.
The ALMA Science Pipeline performs automated data processing, including cali- 
bration and imaging. Users have access to the raw data, the pipeline scripts used to 
calibrate and image the data, and calibrated images. Further data processing and 
analysis can be made off-line. 

Higher data rates from the on-line DSP than those supported by off-line pro- 
cessing can be obtained by integrating the calibration and imaging with the data 
acquisition process. Calibration and imaging can be optimized with the real-time 
feedback of the antenna calibration needed for beam formers and RFI suppression. 

A fast transient survey system at the VLA uses a compute cluster to search 
observations in real time for Fast Radio Bursts (FRBs) and other fast transients. 
This system, called “realfast”, triggers the recording of data at a high data rate 
during fast transients from sources such as FRBs, pulsars and flaring stars. 

The Murchison Widefield Array (MWA) is a low-frequency radio telescope to
search for the epoch of reionization (EoR) and to probe the structure of the solar 
corona (http://mwatelescope.org/science). The MWA will have 128 antenna arrays 
capable of imaging the sky from 80 MHz to 300 MHz with an instantaneous field of 
view that is tens of degrees wide and has a resolution of a few arcminutes.'° The 
data rate of ~1 GB/s, with images every 8s, requires on-site, real-time processing 
and reduction in preference to archiving, transport and off-line processing.
Real-time performance of ~2.5 TFLOP/s is required. Reference 28 presents a heterogeneous
computing pipeline implementation using GPUs, which are a good fit for pipeline
processing but lack flexibility or feedback into the data acquisition, e.g. for RFI
detection and excision.
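
A back-of-the-envelope calculation shows why archiving the raw stream is
unattractive. The sketch below takes the data rate, imaging cadence and compute
requirement quoted above and derives the volume per image, the volume per day and
the compute budget per byte; the variable names are chosen for this example.

```python
data_rate_bytes_per_s = 1e9     # ~1 GB/s, as quoted above
image_interval_s = 8.0          # one image every 8 s
required_flops = 2.5e12         # ~2.5 TFLOP/s, as quoted above
seconds_per_day = 86_400

bytes_per_image = data_rate_bytes_per_s * image_interval_s
bytes_per_day = data_rate_bytes_per_s * seconds_per_day
flop_per_byte = required_flops / data_rate_bytes_per_s

print(f"data per image interval: {bytes_per_image / 1e9:.0f} GB")
print(f"data per day:            {bytes_per_day / 1e12:.1f} TB")
print(f"compute budget:          {flop_per_byte:.0f} FLOP per byte")
```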

Phased array receivers have been installed on the Westerbork Synthesis Radio 
Telescope (WSRT) to increase the field of view of the individual antennas. The 
Apertif Radio Transient System (ARTS) is a hybrid FPGA-GPU machine for tran- 
sient surveys, pulsar-timing, and VLBI back end for the WSRT telescope. After 
FPGA beam forming, signals are processed using a 500-TFLOP GPU cluster. This 
versatile back end performs VLBI formatting, coherent de-dispersion for timing, 
and wide field fast-transient searches. Multiple independent VLBI beams over a
field that is 10,000 times larger than currently possible with the Westerbork
telescope can be sent to the EVN correlator for VLBI. The beam-forming is also
essential for pulsar timing studies (see
https://www.astron.nl/r-d-laboratory/arts/arts-apertif-radio-transient-system-apertif).²⁹

The Event Horizon Telescope (EHT) uses phased arrays at CARMA, SMA 
and ALMA for VLBI observations at millimeter wavelengths. The signals from 
the antenna receivers, which are summed in the beam formers, must be kept in 
phase. The compact VLBI target sources can be used as calibration sources in the
correlator, and phases derived from real-time calibration used to update the beam
former at intervals shorter than the atmospheric coherence time (1-10 s; see Fig. 1).

The Molonglo Observatory Synthesis Telescope (MOST) shares the band 
820-850 MHz with mobile phones. New digital receivers, FPGA filterbanks and 
computers equipped with a GPU cluster have transformed MOST into a versatile 
new instrument (UTMOST) for studying the dynamic radio sky on millisecond 
timescales, ideal for work on pulsars and FRBs. The filterbanks, servers and their 
high-speed, low-latency network provide a hybrid solution to the signal processing 
requirements. Using software and commodity off-the-shelf hardware has enabled 
rapid deployment of innovative signal processing.³⁰

The Collaboration for Astronomy Signal Processing and Electronics Research 
(CASPER) has developed open source hardware and software designs for re-usable 
DSPs to reduce the cost of designing, building and deploying new digital radio- 
astronomy processors for new or existing telescopes. These designs are currently 
used on over 45 scientific instruments worldwide. For more details, see Ref. 31. 
Using a multicast switch to distribute signals from the array receivers (see Fig. 1)
allows back-end processors to be more easily added or upgraded without disrupting
array operation, and leverages the investment in the array infrastructure to support 
new science capabilities. 

High performance digital signal processing enables us to handle high data rates 
from array telescopes and to make images in close to real time. Adaptive real-time 
data processing will revolutionize the science capabilities of existing and developing 
telescopes, and have a broad impact on the way that radio telescope arrays can be 
used. 
Radio astronomy at wavelengths greater than ~1 m is increasingly accomplished
using large arrays of low-gain antennas. Antennas are combined to form one or 
more steerable beams and/or correlated to form images. Arrays may also be 
organized into subarrays of various types (known variously as tiles or stations) 
for aperture synthesis imaging, where they play a role similar to that played 
by dishes at higher frequencies. This chapter presents a brief introduction to 
low frequency arrays, including science drivers, a summary of design principles, 
and brief descriptions of operational and emerging instruments. We point out 
characteristics of the science and instrumentation that distinguish low frequencies 
from efforts at higher frequencies. 


1. Introduction 


In this chapter, we review instrumentation presently in use for radio astronomy 
at wavelengths greater than about 1 m, corresponding to frequencies less than 
~ 300 MHz. This regime is often referred to as “low frequency”, and that is the 
intended definition here, although the reader should be aware that this qualifier is 
also sometimes applied to frequencies as high as 1 GHz and beyond. All modern 
general-purpose low frequency radio telescopes are arrays, and increasingly these 
arrays are beamforming systems comprised entirely of low-gain antennas. 
1.1. What’s Past is Prologue


It is useful to know something about the history of radio astronomy in order 
to put recent low frequency instrumentation trends in context. A great deal of 
radio astronomy from the 1940s through the 1960s was accomplished using low fre- 
quency dipole arrays making continuum observations. A few prominent instruments 
from this period include the 22 MHz dipole array at Penticton, British Columbia;¹
UTR-2, a 10-25 MHz array of fat dipole elements in Ukraine;² and the Clark Lake
Teepee-Tee (TPT), a 15-125 MHz array consisting of conical spiral antennas in
Southern California.³

From the 1960s through the 1990s, low frequency astronomy was largely eclipsed 
by efforts at higher frequencies, predominantly at 1.4 GHz and above. The reasons 
for this transition were many: access to spectral lines; increased resolution asso- 
ciated with shorter wavelengths; avoidance of the deleterious effects of the iono- 
sphere, which worsen with decreasing frequency; and the efficacy of small dishes 
and the associated reduction in complexity relative to dipole arrays. Low frequency 
radio astronomy reemerged from obscurity beginning in the late 1990s, spurred on 
partially by new and evolving science drivers (summarized in Secs. 2 and 5), but 
equally by technical developments. Prominent among these developments were the 
327 MHz and 74 MHz systems on the VLA and the subsequent development of
techniques for mitigation of ionospheric impairments in aperture synthesis imaging;⁴,⁵
the realization that simple low-gain antennas equipped with modern radio frequency
front ends could achieve near-optimum sensitivity over large fractional bandwidth;⁶
and improvements in digital signal processing and real-time computing facilitating 
beamforming with 100s to 1000s of antennas and 10s to 100s of MHz of bandwidth. 

About 15 years of re-invigorated interest has culminated in a new generation of 
low frequency arrays that in one sense picks up where the field left off in the 1970s, 
but in another sense is completely new and is perhaps a harbinger of technologies 
and observing techniques that may eventually be applied at higher frequencies. 
Prominent among these instruments are the first station of the Long Wavelength 
Array (LWA1), the LOw Frequency ARray (LOFAR), and the Murchison Widefield 
Array (MWA). 


1.2. Organization of This Chapter 


This chapter is organized as follows. Section 2 describes current science drivers 
for low frequency astronomy and the associated instrumentation requirements. 
Section 3 provides a brief introduction to the principles of low frequency instrument 
design. Section 4 presents a brief review of the most recent generation of operational 
low frequency arrays. In Secs. 5 and 6 we circle back to a science driver that has 
played a central role in the reemergence of low frequency radio astronomy, 21 cm
cosmology: Sec. 5 provides a brief review of the science, while Sec. 6 describes how 
current and planned low frequency arrays are being used to pursue this science. 
2. Low Frequency Science


The scientific applications of low frequency arrays may be broadly divided into four 
categories: (1) compact array beamforming; (2) widefield low-resolution imaging; 
(3) high-resolution aperture synthesis imaging; and (4) cosmology using the
redshifted 21 cm line of neutral hydrogen. In this section, we briefly summarize the
science applications of the first three categories, which encompass a broad range of
science topics. We return to 21 cm cosmology as a separate topic in Sec. 5, since
this entails some unique instrumental requirements and has spurred the
development of additional, purpose-built instruments. The primary purpose for this review
is to identify instrumentation requirements that drive the design of low frequency
arrays. More comprehensive discussions of low frequency science including historical
perspectives are available elsewhere.⁷⁻⁹


2.1. Transients 


The thrill of the unknown is a big attraction at low frequencies, which have been 
relatively unexplored until recently. New instruments offer dramatic increases in 
sensitivity and field of view. 

A newly discovered class of transients, radio emission from fireballs (Fig. 1),
provides a demonstration of the power of new instruments to look at the entire
sky with unprecedented sensitivity and modest angular resolution.¹⁰ Even though
fireballs appear in the sky daily, and produce strong emission (1-10 kJy) lasting
from a few seconds up to a minute, they were not previously noticed. In the future,
fireballs, and probably even fainter meteors, are likely to be a source of background
noise for those interested in finding other types of transients.

Fig. 1. Image of the sky after background subtraction, showing a fireball that covered 92° of the
sky above LWA1.¹⁰ The edge of the circle marks a cutoff of 25° above the horizon.

Prompt radio emission at low frequencies from gamma-ray bursts (GRBs) was
predicted almost 40 years ago,¹¹⁻¹³ and could be an excellent diagnostic of the
physics of the burst and the nature of the medium in which the burst took place.
Despite numerous searches, prompt emission has not yet been detected.¹⁴,¹⁵ The
same goes for several other hypothetical cosmic explosions such as explosions from
primordial black holes,¹⁶ or cusping of superconducting cosmic strings.¹⁷ Existing
rate-density limits for these events are quite loose, so it is not yet clear if
non-detections are because such events are non-existent or too rare, or fail to produce
low frequency emission, or if previous searches have simply been inadequate. Radio
afterglows from GRBs have been detected at frequencies above 1 GHz,¹⁸ and could
in principle be detected at late times by low frequency instruments if sufficient
sensitivities (sub-mJy) can be achieved.


2.2. Jupiter and Extrasolar Planets 


Below 40 MHz Jupiter emits powerful bursts of coherent gyrocyclotron radia- 
tion,'?:?° reaching 10° Jy for the short duration S-bursts.?! This emission is corre- 
lated with auroral magnetic field lines as well as with field lines related to Io. The 
emission is thought to be beamed in the form of a hollow cone and exhibits high 
brightness temperatures (>10!° K), and nearly 100% circular polarization. 

Given the strength of the Jovian emission, it is natural to consider the possibility 
of detecting Jupiter-like planets in other solar systems. A Jupiter analog placed at 2 
pc, with magnetic fields a factor of 2 stronger (to raise the frequency of emission to 
the 50-80 MHz range where sensitivities are improved with existing instruments), 
should be detectable in 1 hour with the VLA combined with two LWA stations. 
LOFAR should have the sensitivity and resolution to detect such planets now. If 
the emission of the exoplanet scales with proximity to its parent star, then we might 
expect to be able to detect bursts from a planet orbiting at 1 AU to distances of 
10 pc. A detection would provide a measurement of the planetary magnetic field 
strength, which is a key component for life. Circular polarization is a powerful diag- 
nostic for exoplanet searches since most other sources in the sky exhibit relatively 
low levels of circular polarization. 
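
The link between magnetic field strength and emission frequency can be made
concrete with the electron cyclotron frequency, $f = eB/2\pi m_e$. The sketch below
assumes that the upper cutoff of the emission (about 40 MHz for Jupiter) equals the
cyclotron frequency at the strongest field near the planet; this identification and
the function names are assumptions made for illustration.

```python
import math

ELECTRON_CHARGE_C = 1.602176634e-19
ELECTRON_MASS_KG = 9.1093837015e-31
GAUSS_TO_TESLA = 1e-4


def cyclotron_frequency_hz(field_gauss):
    """Electron cyclotron frequency e B / (2 pi m_e) for a field in gauss."""
    field_tesla = field_gauss * GAUSS_TO_TESLA
    return ELECTRON_CHARGE_C * field_tesla / (2.0 * math.pi * ELECTRON_MASS_KG)


def field_for_cutoff_gauss(cutoff_hz):
    """Field strength whose cyclotron frequency equals the given cutoff."""
    return cutoff_hz / cyclotron_frequency_hz(1.0)


mhz_per_gauss = cyclotron_frequency_hz(1.0) / 1e6
jupiter_field = field_for_cutoff_gauss(40e6)
doubled_cutoff_mhz = cyclotron_frequency_hz(2.0 * jupiter_field) / 1e6

print(f"cyclotron frequency: {mhz_per_gauss:.2f} MHz per gauss")
print(f"field for a 40 MHz cutoff: {jupiter_field:.1f} G")
print(f"cutoff for twice that field: {doubled_cutoff_mhz:.0f} MHz")
```

Doubling the field doubles the cutoff frequency, which moves the emission of a
Jupiter analog to the upper end of the 50-80 MHz range mentioned above.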


2.3. Pulsars and Rotating Radio Transients


Pulsars were discovered at 81 MHz²² and are intrinsically steep spectrum, though
many may turn over in the 100-400 MHz range. Effects caused by the interstellar
medium (ISM), such as dispersion and scattering, are particularly strong at
frequencies below 300 MHz.²³,²⁴ The dispersion measure of PSR J2145-0750 has been
—3 25 


measured to an accuracy of 107° pecm which is comparable to fluctuations in 
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dispersion measure detected in pulsar timing array experiments.?° This is of par- 
ticular interest because this DM fluctuation can mask the signal of a prospective 
gravitational wave. Low frequencies also provide a unique window on anomalous 
pulsar behavior such as “giant pulse” emission,?” which in turn informs studies of 
the ISM and the efficacy of searches for dispersed transients from other sources. 
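
To illustrate why dispersion dominates at low frequencies, the following minimal sketch evaluates the standard cold-plasma dispersion delay, which scales as DM/ν². The dispersion constant is the conventional pulsar-timing value; the DM values and frequencies are hypothetical and are not taken from the text.

```python
# Cold-plasma dispersion delay, t = K_DM * DM / nu^2.
# K_DM is the conventional dispersion constant; the DM values used
# below are hypothetical illustrations, not measurements from the text.
K_DM = 4.148808e3  # s MHz^2 pc^-1 cm^3


def dispersion_delay_s(dm_pc_cm3, freq_mhz):
    """Delay (s) relative to infinite frequency for a given DM."""
    return K_DM * dm_pc_cm3 / freq_mhz ** 2


def timing_error_s(dm_error_pc_cm3, freq_mhz):
    """Arrival-time error (s) caused by an error in DM."""
    return dispersion_delay_s(dm_error_pc_cm3, freq_mhz)


if __name__ == "__main__":
    dm = 10.0  # pc cm^-3, hypothetical
    for f in (50.0, 300.0, 1400.0):
        print(f"{f:7.1f} MHz: delay = {dispersion_delay_s(dm, f):.6f} s")
    # The same DM error produces a 36 times larger timing error at 50 MHz
    # than at 300 MHz, since (300/50)^2 = 36.
    print(timing_error_s(1e-4, 50.0) / timing_error_s(1e-4, 300.0))
```

The quadratic scaling is what makes low frequency observations a sensitive probe of small DM changes.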

Rotating radio transients (RRATs) are a recently discovered class of astrophys- 
ical sources that are thought to be related to neutron stars, possibly old pulsars 
that are sputtering out.²⁸ Another possible explanation is that the RRAT transient
behavior is a geometric effect resulting from the reversal of the direction in which
the radio beams are emitted and is thus similar to nulling pulsars.²⁹ The spectral
properties of RRATs are still unknown; however, there is some indication that they 
may have steeper spectral indices than is typical for pulsars, making low frequency 
studies highly valuable for studying this population. 


2.4. Galaxies and Galaxy Clusters 


Low frequency observations provide useful diagnostics of cosmic rays in our own
galaxy³⁰ and in nearby galaxies such as M51 (see Fig. 2). Cosmic rays are an
important source of ionization and heating within galaxies and also impact Earth’s
atmosphere. In M51 it is found that the disk extends to a radius of 16 kpc and
that the smooth distribution can be well explained by cosmic ray electron diffusion
from the spiral arms to the inter-arm regions.³¹ Active galaxies have long been
studied at centimeter wavelengths and are well known to be strong emitters at low
frequencies. As it is long-lived, low frequency synchrotron emission can be used
to trace the history of feedback in AGN and is often found coincident with X-ray
cavities.³² Clusters of galaxies may also be a source of low frequency emission, either
from energetic particles leaking from the central AGN, or from thermal particles
energized by shocks within the cluster. Besides the central AGN, there are mini-halos
(size ~500 kpc) found in the centers of clusters with strong cooling cores, as
well as larger halos (size > 1 Mpc), more often found in disturbed clusters with a
history of mergers.³³

Fig. 2. M51 radio continuum image at 151 MHz from LOFAR superposed on an optical Digital
Sky Survey image (axes: Right Ascension and Declination, J2000). Contours start at
1 mJy/beam and increase by a factor of 1.5.³¹


2.5. Solar Physics and Space Weather 


At low frequencies, the quiet Sun is fainter than several other bright sources such 
as Cygnus A, Cassiopeia A and Taurus A, but the active Sun can reach flux levels 
in the MJy regime (making observations of anything else during these events chal- 
lenging!). New instruments have provided an unprecedented view of the active Sun 
and revealed complex new features in solar storms, as demonstrated in Fig. 3. This 
dynamic spectrum reveals dramatic sub-structure in time and frequency. 

Besides noise storms (Type I bursts), the Sun exhibits slow-drift bursts 
(Type II), fast-drift bursts (Type III), and broadband bursts (Type IV). Type III 
bursts are driven by beams of non-thermal keV-energy electrons propagating out 
from the Sun on open field lines, producing radio emission at the local plasma 
frequency by plasma emission, i.e. conversion of electrostatic Langmuir waves 
into electromagnetic waves. Since the electrons are travelling rapidly outwards, 
the corresponding radio bursts drift rapidly downwards in frequency and appear 
as nearly vertical features in dynamic spectra. The currently favored “stochastic
growth” theory for plasma emission predicts a log-normal distribution for wave-field
strengths in these bursts³⁵ that can be tested with low frequency data. By
contrast, Type II bursts are associated with shocks propagating through the solar 
corona. They are often correlated with coronal mass ejections, which can generate 
shocks as they move through the solar atmosphere at high speed. However, there 
are aspects of Type II bursts that appear to be associated with the structure of the 
flaring region itself. Type II bursts show a lot of substructure in frequency-time plots 
that may be associated with inhomogeneity in the shock. The exact nature of the 
source of plasma emission in Type II bursts is still not entirely understood. Moving 
Type IV bursts are associated with large solar eruptions and can move several 
degrees at thousands of km/s; in extreme cases high signal-to-noise observations
should permit us to measure such motions and re-invigorate a field of study that
has been largely dormant for 20 years, due to the lack of imaging at low frequencies
where these intriguing (and potentially space-weather relevant) phenomena occur.

Fig. 3. A type II solar burst captured by LWA1, exhibiting fine structure in time and frequency.³⁴


2.6. Ionospheric Physics 


The ionosphere reflects and distorts radio waves passing through it, with effects 
becoming stronger as wavelength-squared until the plasma frequency is reached 
and it becomes opaque, generally around 10 MHz but with a strong dependence on 
time-of-day, season, and solar cycle. Proper calibration of radio interferometry data 
therefore requires understanding and monitoring of the ionosphere. At the same 
time, there is great interest in understanding our ionosphere, motivated in part by 
the way that the ionosphere affects communications. 
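
As a rough illustration of the opacity cutoff mentioned above, the sketch below evaluates the textbook plasma frequency for a given electron density. The electron densities are hypothetical round numbers chosen for illustration; they are not measurements quoted in the text.

```python
import math

# Physical constants (SI, CODATA values).
E_CHARGE = 1.602176634e-19    # C
EPS0 = 8.8541878128e-12       # F/m
M_ELECTRON = 9.1093837015e-31  # kg


def plasma_frequency_hz(n_e_per_m3):
    """Plasma frequency f_p = (1/2pi) * sqrt(n_e e^2 / (eps0 m_e))."""
    omega_p = math.sqrt(n_e_per_m3 * E_CHARGE ** 2 / (EPS0 * M_ELECTRON))
    return omega_p / (2.0 * math.pi)


if __name__ == "__main__":
    # Hypothetical peak electron densities spanning night-like to
    # day-like conditions.
    for n_e in (1e11, 5e11, 1.24e12):
        print(f"n_e = {n_e:.2e} m^-3 -> f_p = {plasma_frequency_hz(n_e) / 1e6:.2f} MHz")
```

Because the plasma frequency scales as the square root of the electron density, a factor of ten change in density moves the cutoff by only about a factor of three.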

Of the many techniques for studying the ionosphere, low frequency interfer- 
ometry offers great precision in measuring total electron content (TEC) values. 
Furthermore, the structure of the bottom of the ionosphere can be studied by 
bouncing radio signals from a transmitter off the ionosphere and then receiving 
them with an interferometer. Frequencies below 10 MHz are preferred for this work 
in order for the ionosphere to be more fully reflective. Studies of the bottom side of 
ionospheric structures can reveal waves formed by interactions with the atmosphere, 
or by explosions.


2.7. Lightning


Low frequency interferometry has been used to explore the physics of lightning
since the late 1970s.³⁶ The advent of digital signal processing and inexpensive
time-keeping via GPS has spurred renewed interest in this area, which includes
investigation of related phenomena such as terrestrial gamma ray flashes (TGFs),
sprites, and jets.³⁷–³⁹ Major questions that remain unanswered include how
lightning initiates and develops in time, why negative breakdowns are seen before
positive breakdowns, and what the spectrum and flux density distribution of lightning are.
Knowledge of major thunderstorms as traced by lightning is also important for
minimizing property damage.


3. Principles of Low Frequency Array Design 


In this section, we outline the principles of low frequency array design. Modern 
low frequency arrays consist of large numbers of low-gain “elements”, which are 
typically dipoles or dipole-type antennas. The elements are digitized, in some cases 
individually, and in other cases with some analog combining in advance. However, 
in all modern arrays the first stage or stages of amplification occur at or near the 
element terminals, which has led to the use of the term active antenna to describe 
the combined unit. Detailed documentation of some active antennas can be found 
in Refs. 40 and 41. 

Receivers in modern instruments are uniformly of the direct sampling variety, 
meaning signal processing consists only of gain and filtering, with no frequency con- 
version and no analog-domain decomposition into baseband in-phase and quadra- 
ture components. Thus digitizers sample at rates comparable to the sky frequencies, 
typically using first Nyquist zone conversion for frequencies below the 88-108 MHz 
FM broadcast band and second Nyquist zone “undersampling” for higher frequen- 
cies. Strong interference makes the FM broadcast band unusable for radio astronomy 
in all but the most remote locations. 
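
The Nyquist-zone bookkeeping described above can be sketched as follows. The 200 MHz sample rate is an assumed, illustrative value (not a specification quoted in the text), and the function names are hypothetical.

```python
def nyquist_zone(freq_mhz, sample_rate_mhz):
    """1-based Nyquist zone of a sky frequency for a real-valued sampler."""
    half = sample_rate_mhz / 2.0
    return int(freq_mhz // half) + 1


def aliased_frequency_mhz(freq_mhz, sample_rate_mhz):
    """Frequency at which a sky frequency appears in the digitized band."""
    half = sample_rate_mhz / 2.0
    offset = freq_mhz % half
    # Odd zones map directly; even zones are spectrally mirrored.
    return offset if nyquist_zone(freq_mhz, sample_rate_mhz) % 2 == 1 else half - offset


if __name__ == "__main__":
    fs = 200.0  # MHz, assumed sample rate for illustration only
    for f in (30.0, 60.0, 80.0, 120.0, 150.0, 190.0):
        print(f, nyquist_zone(f, fs), aliased_frequency_mhz(f, fs))
```

With this assumed rate, the boundary between the first and second zones falls at 100 MHz, inside the FM broadcast band, so the band that is unusable anyway coincides with the region where an anti-aliasing filter must roll off.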


3.1. Noise Considerations 


As in any other frequency regime, the sensitivity of a low frequency radio telescope 
is determined primarily by collecting area and noise. However, the extraordinarily 
high sky brightness temperature at low frequencies leads to somewhat different
considerations in design criteria and subsequent characterization of instruments. 

Fig. 4. Model for analysis of noise in low frequency arrays.

A model for characterizing the sensitivity-limiting noise delivered to the digitizer
in a low frequency array is shown in Fig. 4. For simplicity this particular
model assumes that each element is individually digitized, but the principles are not
significantly different when analog combining is employed. Noise captured by the
antenna is conveniently characterized in terms of antenna temperature $T_A$. For the
purposes of this chapter $T_A$ may be defined as the power spectral density (PSD; e.g.
W Hz⁻¹) delivered to a load that is conjugate-matched to the antenna impedance,
$Z_A = R_A + jX_A$, divided by Boltzmann’s constant, $k = 1.38 \times 10^{-23}$ J/K. As will be
explained later, antennas used in low frequency arrays are often poorly-matched to
receivers. The effect of impedance mismatch may be taken into account by defining
a two-port circuit, identified as an “antenna interface” (AI) in Fig. 4, which connects
the antenna to the rest of the receiver. The AI is assumed to be perfectly-matched to
the receiver over the bandwidth of interest, and exhibits input impedance
$Z_R = R_R + jX_R$ which may or may not vary significantly with frequency. In
implementation terms, the AI may be any combination of impedance transforming
devices, baluns, and associated losses, or simply a “null” two-port representing direct
connection between the antenna and receiver. In any event the AI may be characterized
in terms of transducer power gain, $G_T$, which in the present problem may be defined as
the ratio of the PSD $S_{AI}$, delivered to the receiver, to the PSD $kT_A$ “available” from
the antenna. Neglecting any noise that might originate from the AI,

$$S_{AI} = k T_A G_T, \tag{1}$$

where

$$G_T = \frac{4 R_A R_R}{\left| Z_A + Z_R \right|^2}, \tag{2}$$

as may be verified using elementary circuit analysis. The PSD subsequently delivered
to the digitizer is

$$S_R = k T_A G_T G_R + k T_R G_R, \tag{3}$$

where $G_R$ is the gain of the receiver and the second term accounts for the noise
contribution from the receiver expressed as the input-referred equivalent noise
temperature $T_R$.
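
As a minimal sketch of Eqs. (2) and (3), the following functions evaluate the transducer power gain and the PSD at the digitizer. The impedance, temperature and gain values in the usage lines are hypothetical and chosen only to show the effect of mismatch.

```python
K_BOLTZMANN = 1.380649e-23  # J/K


def transducer_gain(z_a: complex, z_r: complex) -> float:
    """Eq. (2): G_T = 4 R_A R_R / |Z_A + Z_R|^2."""
    return 4.0 * z_a.real * z_r.real / abs(z_a + z_r) ** 2


def psd_at_digitizer(t_a, t_r, g_t, g_r):
    """Eq. (3): S_R = k T_A G_T G_R + k T_R G_R (W/Hz)."""
    return K_BOLTZMANN * t_a * g_t * g_r + K_BOLTZMANN * t_r * g_r


if __name__ == "__main__":
    z_r = complex(100.0, 0.0)  # assumed receiver input impedance, ohms
    # Hypothetical antenna impedances: near resonance, and far below
    # resonance where the reactance is large and capacitive.
    for z_a in (complex(70.0, 0.0), complex(10.0, -500.0)):
        g_t = transducer_gain(z_a, z_r)
        print(z_a, f"G_T = {g_t:.4f}",
              f"S_R = {psd_at_digitizer(5000.0, 300.0, g_t, 1e4):.3e} W/Hz")
```

A conjugate match ($Z_R = Z_A^*$) gives $G_T = 1$, while the strongly reactive case yields a $G_T$ of only a few percent, which is the regime of "normal" mismatch discussed in the text.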

The antenna temperature $T_A$ comprises naturally-occurring contributions
$T_{\mathrm{sky}}$ from the sky and $T_{\mathrm{gnd}}$ from the ground, plus noise associated with
interference. Interfering noise originates from human activity and from intermittent natural
events including lightning and solar bursts. Man-made noise below 300 MHz is
nominally negligible compared to naturally-occurring noise in the rural locations
where radio telescopes are typically built, and naturally-occurring interference is
intermittent and thus negligible most of the time. The resulting antenna temperature
can be expressed as

$$T_A = \eta \left( T_{\mathrm{sky}} + T_{\mathrm{gnd}} \right), \tag{4}$$

where $\eta$ is the ground loss efficiency. This efficiency accounts for near-field coupling
between the antenna and the ground that results in loss in the ground manifesting
as loss in the antenna itself. Typical values of $\eta$ range from about 0.5 (typical at
38 MHz above untreated earth without a conducting ground screen⁴²)
to 1 (typical above 100 MHz, or with a sufficiently large ground screen).

$T_{\mathrm{sky}}$ in this frequency regime is dominated by the very bright Galactic
synchrotron background. The brightness temperature of this background is somewhat
higher in the Galactic plane and somewhat lower in the Galactic polar regions;
however, the total contribution to the antenna temperature of a low-gain antenna
isolated from ground loss is found to be very close to

$$T_{\mathrm{sky}} \approx (9120~\mathrm{K}) \left( \frac{\nu}{39~\mathrm{MHz}} \right)^{-2.55}, \tag{5}$$

where $\nu$ is frequency. (This expression is obtained by a fit to the model described
in Appendix I of Ref. 6.) This is shown in Fig. 5. Note that a diurnal variation of
about ±20% is present in this contribution, with the maximum corresponding to
the transit of the Galactic center and plane.
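
Eq. (5) and the quoted ±20% diurnal variation are simple to evaluate numerically. The sketch below is only an illustration of the fit; the function names and the choice of sample frequencies are assumptions made for the example.

```python
def t_sky_kelvin(freq_mhz):
    """Eq. (5): daily-mean Galactic contribution to antenna temperature (K)."""
    return 9120.0 * (freq_mhz / 39.0) ** -2.55


def t_sky_diurnal_range(freq_mhz, fraction=0.20):
    """Approximate (min, max) over a day, using the ~20% variation quoted in the text."""
    mean = t_sky_kelvin(freq_mhz)
    return (1.0 - fraction) * mean, (1.0 + fraction) * mean


if __name__ == "__main__":
    for f in (10.0, 20.0, 39.0, 60.0, 78.0, 88.0):
        low, high = t_sky_diurnal_range(f)
        print(f"{f:5.1f} MHz: {t_sky_kelvin(f):10.0f} K  ({low:.0f} to {high:.0f} K)")
```

Doubling the frequency from 39 MHz to 78 MHz reduces the sky contribution by a factor of $2^{2.55} \approx 5.9$.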

$T_{\mathrm{gnd}}$ is typically a small fraction of $T_{\mathrm{sky}}$ since the antenna pattern has low
response below the horizon, so any sky power reflected by a ground screen makes a
relatively small contribution, and also because any ground not covered by a ground
screen exhibits brightness of “only” a few hundred K. In practice, typical values for
$T_{\mathrm{gnd}}$ are on the order of a few hundred K.

Fig. 5. Contribution of the Galactic synchrotron background to the antenna temperature of a
dipole-type antenna isolated from ground loss. The center curve is the daily mean value whereas
the upper and lower curves represent the typical limits due to diurnal variation.

Combining results, the PSD delivered to the digitizer is

$$S_R = \eta k T_{\mathrm{sky}} G_T G_R + \eta k T_{\mathrm{gnd}} G_T G_R + k T_R G_R. \tag{6}$$

To convey a sense of magnitudes, the left panel of Fig. 6 shows these contributions
individually and together for a simple dipole 2 m long by 2 cm in diameter, resonant
at ≈ 70 MHz. In this result, $Z_A$ is obtained using a well-known equivalent
circuit model,⁴³ $Z_R$ = 100 Ω, $T_{\mathrm{sky}}$ is from Eq. (5), $T_{\mathrm{gnd}}$ = 150 K, $\eta$ = 1, and
$T_R$ = 300 K. The contributions are shown in the form of input-referred equivalent
noise temperature, which is essentially Eq. (6) divided by $k \eta G_T G_R$, so that the sum
may be interpreted as a system temperature $T_{\mathrm{sys}}$.

It is apparent that the noise delivered to the digitizer is dominated by $T_{\mathrm{sky}}$
over a large fraction of the frequency range considered. This is desirable since the
Galactic noise contribution is irreducible and so any further improvements in other
aspects of the design cannot significantly improve sensitivity. From this perspective
the following “stopping criterion” for design may be obtained from Eq. (6):

$$\frac{T_R}{G_T} \ll \eta \left( T_{\mathrm{sky}} + T_{\mathrm{gnd}} \right). \tag{7}$$

In other words, peak sensitivity is essentially optimized by making $T_R$ sufficiently
small relative to $G_T$ (i.e. small relative to the effect of impedance mismatch), and
further optimization of $T_R$ or $G_T$ then offers little additional benefit.
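
The stopping criterion of Eq. (7) can be checked numerically by forming the ratio of its right-hand side to its left-hand side. In this sketch the default receiver and ground temperatures follow the example values given in the text, while the $G_T$ values are hypothetical stand-ins for a well-matched and a poorly matched antenna.

```python
def t_sky_kelvin(freq_mhz):
    """Eq. (5): Galactic contribution to antenna temperature (K)."""
    return 9120.0 * (freq_mhz / 39.0) ** -2.55


def sky_dominance(freq_mhz, g_t, t_r=300.0, eta=1.0, t_gnd=150.0):
    """Ratio eta*(T_sky + T_gnd) / (T_R / G_T); Eq. (7) requires this to be >> 1."""
    return eta * (t_sky_kelvin(freq_mhz) + t_gnd) * g_t / t_r


def t_sys_input_referred(freq_mhz, g_t, t_r=300.0, eta=1.0, t_gnd=150.0):
    """Eq. (6) divided by k*eta*G_T*G_R: T_sky + T_gnd + T_R / (eta * G_T)."""
    return t_sky_kelvin(freq_mhz) + t_gnd + t_r / (eta * g_t)


if __name__ == "__main__":
    for g_t in (1.0, 0.1, 0.01):  # hypothetical mismatch levels
        print(f"G_T = {g_t:5.2f}: dominance at 70 MHz = {sky_dominance(70.0, g_t):6.2f}, "
              f"T_sys = {t_sys_input_referred(70.0, g_t):8.0f} K")
```

Once the ratio is well above unity, lowering $T_R$ or raising $G_T$ changes the input-referred system temperature only marginally, which is the sense in which the criterion tells the designer when to stop.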

While peak sensitivity is limited by $T_R/G_T$, it is apparent from Fig. 6 that
the bandwidth over which this “optimal” sensitivity is achieved may be improved
by either further reducing $T_R$, or by expanding the bandwidth associated with $G_T$.
Further reduction in $T_R$ is normally not attractive, since this typically entails a
corresponding reduction in linearity.⁴ Alternatively, the antenna may be
“broadbanded” such that $G_T$ varies less over the bandwidth of interest, or (less obvious
as a solution) $Z_R$ might be increased. The broadbanding strategy leads to “fat
dipoles”, prime examples being the antenna elements used by LWA1, MWA, and the
LOFAR high-band array. In the second strategy, $Z_R$ is increased from ≈ 100 Ω by a
factor of up to 4 or so by impedance transformation within the AI, resulting in lower
peak sensitivity but a minimum sensitivity achieved over a larger bandwidth.⁶ This
is demonstrated in the right panel of Fig. 6, where it is apparent that increasing
$Z_R$ decreases peak sensitivity but increases the bandwidth over which a threshold
level of sensitivity can be achieved. Note that the vertical scale of this plot could
be interpreted as a measure of the integration time required to achieve a specified
signal-to-noise ratio.


3.2. Beamforming 


Fig. 6. Left: Contributions to the apparent system temperature for a single dipole in a
low-frequency array ($Z_R$ = 100 Ω). Right: $T_{\mathrm{sky}}$ relative to instrumental noise for various
receiver input impedances $Z_R$. See text for scenario details.

In modern parlance, a set of co-located elements which are processed using a
common central facility is referred to as a station. Station-level beamforming consists
of combining signals associated with the elements such that radio signals arriving
from the same direction add in phase. Generally, the required phase shift varies by
too much over the bandwidth of interest for this to be implemented as a
frequency-independent phase shift. Solutions include delay-and-sum beamforming (e.g. used
in LWA1) and frequency channelization followed by phase-and-sum beamforming
per channel (e.g. used in LOFAR). The number of elements that must be combined
by beamforming can be reduced by hierarchical beamforming, in which beams are
formed first for subarrays (subsets of the array) and then the subarray beams are
combined in a separate step. MWA and the HBA component of LOFAR both employ
hierarchical beamforming in which the first stage is analog beamforming on
subarrays of 16 dipoles, each referred to as a tile.
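
The idea behind delay-and-sum beamforming can be sketched with a toy model: a monochromatic plane wave, ideal isotropic elements, and no mutual coupling or noise. The array layout, frequencies and directions below are hypothetical, and the code is an illustration of the principle rather than a description of how any particular instrument implements it.

```python
import cmath
import math
import random

C = 299_792_458.0  # speed of light, m/s


def direction_vector(az_rad, el_rad):
    """Unit vector (east, north, up) toward a given azimuth and elevation."""
    return (math.cos(el_rad) * math.sin(az_rad),
            math.cos(el_rad) * math.cos(az_rad),
            math.sin(el_rad))


def geometric_delays(positions_m, az_rad, el_rad):
    """Arrival-time advance (s) of a plane wave at each element relative to the origin."""
    u = direction_vector(az_rad, el_rad)
    return [(x * u[0] + y * u[1] + z * u[2]) / C for x, y, z in positions_m]


def beam_response(positions_m, freq_hz, point, source):
    """Magnitude of the summed element signals after delay compensation toward
    `point`, for a unit-amplitude monochromatic plane wave arriving from `source`."""
    tau_point = geometric_delays(positions_m, *point)
    tau_source = geometric_delays(positions_m, *source)
    total = sum(cmath.exp(2j * math.pi * freq_hz * (ts - tp))
                for ts, tp in zip(tau_source, tau_point))
    return abs(total)


# Hypothetical 16-element pseudo-random station within a 100 m square.
rng = random.Random(0)
positions = [(rng.uniform(-50, 50), rng.uniform(-50, 50), 0.0) for _ in range(16)]
zenith = (0.0, math.pi / 2)
off_axis = (0.0, math.radians(60.0))

on_beam = beam_response(positions, 60e6, zenith, zenith)     # coherent sum
off_beam = beam_response(positions, 60e6, zenith, off_axis)  # partial cancellation

# A fixed 100 m path difference corresponds to a phase that changes by many
# radians across a 40-80 MHz band, so one phase shift cannot serve the band.
phase_spread_rad = 2 * math.pi * (80e6 - 40e6) * (100.0 / C)

if __name__ == "__main__":
    print(on_beam, off_beam, phase_spread_rad)
```

Signals from the pointing direction add coherently to the number of elements, while the phase spread across the band shows why a true time delay, or a per-channel phase, is needed.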


3.3. Collecting Area, Directivity and Array Geometry 


The concept of collecting area often becomes ambiguous for low frequency arrays
because antennas may be severely mismatched (so, $G_T \ll 1$) in useful modes of
operation, as demonstrated in Sec. 3.1. However, the concept of directivityᵃ is quite
applicable. LWA1, LOFAR and MWA use elements achieving zenith directivity
in the range 3-9 dB, and manage to aggregate somewhere between 48 and 2048
dipoles per polarization per station, limited by a combination of mechanical and
cost factors. The resulting directivity for zenith-pointing beams is between 20 dB
and 40 dB. If the elements are arranged with spacing sufficiently close to avoid
spatial aliasing, then the associated beamwidths (on a half-power basis) range from
about 18° to about 2°, decreasing with increasing frequency.

ᵃ For the uninitiated, directivity is most easily understood by imagining that the array is
transmitting as opposed to receiving; in this case, directivity is the ratio of power density
(W m⁻²) in the intended direction to the power density averaged over all 4π sr.
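
The station directivity range quoted above is roughly what one obtains by adding 10 log₁₀ N to the element directivity. The sketch below makes that idealized estimate; it assumes identical elements, no mutual coupling, and spacings wide enough that element patterns do not overlap strongly, so it should be read as an order-of-magnitude check only.

```python
import math


def station_directivity_db(element_directivity_db, n_elements):
    """Idealized zenith directivity of N identical, uncoupled elements (dB)."""
    return element_directivity_db + 10.0 * math.log10(n_elements)


if __name__ == "__main__":
    # Extremes of the element directivity and element count quoted in the text.
    print(f"{station_directivity_db(3.0, 48):.1f} dB")    # smallest case
    print(f"{station_directivity_db(9.0, 2048):.1f} dB")  # largest case
```

The two extremes come out near 20 dB and 42 dB, consistent with the approximate 20-40 dB range given in the text.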

The positions of elements within a station determine the shape of the main
lobe and the levels of sidelobes. Classical array theory mandates element spacings no
greater than 0.5λ in order to mitigate spatial aliasing (additional spurious “grating
lobes”) from horizon to horizon. This spacing may be relaxed if the pointing range
relative to the zenith is suitably restricted. Also, randomizing spacings tends to
inhibit grating lobes, at the cost of increasing sidelobe levels. Within these limits
it is generally useful to maximize spacings, for two reasons. First, larger spacings
increase the overall aperture of the station, which decreases beamwidth. The second
issue pertains to the possibility of external noise dominance, which makes the noise
measured at the output of antenna-receiver channels significantly correlated.⁴⁴ This
correlation significantly desensitizes the beams formed by the array, and can be
mitigated only by increasing the spacing between the antennas.ᵇ Further, the
sensitivity of the array does not increase linearly with the number of antennas under
these conditions.

A related concern is electromagnetic coupling between array elements. The 
effect of coupling is both significant and notoriously difficult to quantify. The prin- 
cipal effect is to perturb the patterns of individual elements from their nominal 
“standalone” patterns. The nature of the perturbation depends on spacing and the 
geometry of the array. The overall effect may be either positive or negative, and is 
difficult to summarize in a concise way. For methods of analysis and an example of
how spatial correlation and mutual coupling affect modern low frequency arrays,
see Ref. 44.

ᵇ It is ironic that, while this correlation is certainly bad for beamforming, these are precisely the
data of interest if the array itself is to be used for imaging (see science topics described in
Secs. 2.1, 2.6 and 2.7). If station-level imaging is a priority, it is best not to increase antenna
spacings too much.

Finally, we circle back to the issue of collecting area and sensitivity. It is common
to characterize the sensitivity of large dishes and higher frequency arrays using the
ratio $A_e/T_{\mathrm{sys}}$ where $A_e$ is collecting area (formally, effective aperture) and $T_{\mathrm{sys}}$ is
system temperature. Calculation of $A_e/T_{\mathrm{sys}}$ from $A_e$ and $T_{\mathrm{sys}}$ is somewhat awkward
for low frequency arrays because, as noted above, $A_e$ may be ambiguous due to
(1) large “normal” impedance mismatch; (2) spatial correlation of external noise,
which means sensitivity does not increase linearly with number of elements; and (3)
mutual coupling, which makes it difficult to know $A_e$ for single elements, much less
for the entire array. However, the ratio $A_e/T_{\mathrm{sys}}$ can be computed from the system
equivalent flux density (SEFD). SEFD is the flux density (i.e. W m⁻² Hz⁻¹) of
an unresolved source that results in a doubling of the measured power. SEFD has
the advantage that it may be calculated directly (but not necessarily easily) from
astronomical measurements.
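
For a single polarization observing an unpolarized source, the usual relation is SEFD = 2k T_sys / A_e, so the ratio A_e/T_sys follows directly from a measured SEFD. The sketch below assumes that single-polarization convention; the 10 kJy input is an illustrative order-of-magnitude value, not a measurement.

```python
K_BOLTZMANN = 1.380649e-23  # J/K
JANSKY = 1e-26              # W m^-2 Hz^-1


def ae_over_tsys(sefd_jy):
    """A_e / T_sys (m^2 / K) from SEFD, assuming SEFD = 2 k T_sys / A_e."""
    return 2.0 * K_BOLTZMANN / (sefd_jy * JANSKY)


def sefd_jy(a_e_m2, t_sys_k):
    """Inverse relation: SEFD (Jy) from effective aperture and system temperature."""
    return 2.0 * K_BOLTZMANN * t_sys_k / a_e_m2 / JANSKY


if __name__ == "__main__":
    print(f"{ae_over_tsys(1.0e4):.3f} m^2/K")  # for an assumed SEFD of 10 kJy
```

An SEFD of 10 kJy corresponds to A_e/T_sys of roughly 0.28 m² K⁻¹ under these assumptions.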


4. Modern Low Frequency Arrays 


In this section, we review some instruments that represent the latest generation 
of low frequency arrays. Before doing so, it is worth noting that several low fre- 
quency arrays developed prior to the current generation continue to operate. These 
instruments include the previously-mentioned UTR-2 as well as GEETEE, the 
35-70 MHz array located in Gauribidanur, India, and the Nançay Decametric Array
(10-100 MHz) located in France, among others.


4.1. Deuterium Array 


The Deuterium Array was an array of low-gain elements that successfully observed 
the extraordinarily weak and previously undetected 327 MHz spectral line of 
deuterium.⁴⁵ The array consisted of 24 “stations”, each consisting of 24
dual-polarization two-element (main+director) Yagi antennas positioned according to
a 0.8-wavelength rectilinear grid above a wire mesh ground plane. Each polariza- 
tion of every antenna was separately digitized and beams were formed entirely in 
software. Although actually a short-lived (2004-2006) experiment, the instrument 
was significant as a demonstration of the efficacy of digital beamforming for high- 
sensitivity radio astronomy, and no comparable instrument currently exists with 
the ability to observe in this frequency range. 


4.2. Long Wavelength Array Station 1 


Fig. 7. LWA1. Image credit: LWA.

Long Wavelength Array Station 1 (LWA1) is a large modern operational
general-purpose beamforming array operating at 10-88 MHz.⁴⁶,⁴⁷ The instrument, shown
in Fig. 7, is collocated with the VLA in central New Mexico. LWA1 consists of
about 256 dual-polarized dipole-type antennas. The output from each antenna is
individually digitized, and beams are formed using a digital-domain delay-and-sum
technique. The dipoles are distributed within a 100 m × 110 m elliptical aperture,
resulting in a beamwidth of roughly 2° for zenith pointing at the highest
frequencies, increasing to tens of degrees at the lowest frequencies and for pointing at low
elevations. The major axis of the aperture is aligned in the north-south direction in
order to improve the symmetry of the beam for Galactic center observations, which
require pointing to low elevations toward the south. The distribution of antennas is
pseudo-random, with 5 m minimum spacing to facilitate maintenance. The
pseudo-random distribution ameliorates grating lobes, which would otherwise become a
problem for frequencies greater than about 30 MHz.
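
Two of the numbers in this paragraph can be checked with back-of-the-envelope relations: the beamwidth is of order λ/D for an aperture of extent D, and a spacing d equals half a wavelength at the frequency c/(2d). The sketch below uses those rough relations; the exact beamwidth depends on the aperture weighting, so the λ/D factor is an approximation.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def beamwidth_deg(freq_hz, aperture_m):
    """Rough half-power beamwidth (degrees) from the lambda/D approximation."""
    return math.degrees((C / freq_hz) / aperture_m)


def half_wavelength_limit_mhz(spacing_m):
    """Frequency (MHz) at which the given spacing equals half a wavelength."""
    return C / (2.0 * spacing_m) / 1e6


if __name__ == "__main__":
    print(f"{beamwidth_deg(88e6, 100.0):.2f} deg at 88 MHz")  # roughly 2 degrees
    print(f"{beamwidth_deg(10e6, 100.0):.1f} deg at 10 MHz")
    print(f"{half_wavelength_limit_mhz(5.0):.1f} MHz for 5 m spacing")
```

The 5 m minimum spacing corresponds to half a wavelength at about 30 MHz, which is why a regular grid with that spacing would show grating lobes above that frequency.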

The LWA1 antenna element is a wire-grid bowtie-type dipole, having dipole
arms that are angled downward into an inverted “V” configuration to improve
uniformity of gain over the sky.⁴¹ The broad impedance bandwidth combined with
receiver temperature ≈ 300 K results in a system temperature dominated by the
Galactic synchrotron background by a factor of 4:1 from 24-87 MHz. For
comparison, see the thin dipole results shown in Fig. 6 (right; for
LWA1, $Z_R$ = 100 Ω).

LWA1 achieves SEFD on the order of 10 kJy at the zenith. This sensitivity is
only weakly dependent on frequency, because the collecting area of the elements
increases in proportion to $\lambda^2$ where $\lambda$ is wavelength, whereas $T_{\mathrm{sky}}$ increases as
$\lambda^{2.55}$ (per Eq. (5)), and sensitivity is proportional to the ratio.
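
The weak frequency dependence can be made explicit with a short scaling argument. This is a sketch that assumes the system temperature is entirely sky-dominated and that the effective aperture scales exactly as $\lambda^2$:

$$\mathrm{SEFD} \propto \frac{T_{\mathrm{sys}}}{A_e} \approx \frac{T_{\mathrm{sky}}}{A_e} \propto \frac{\lambda^{2.55}}{\lambda^{2}} = \lambda^{0.55} \propto \nu^{-0.55}.$$

Under these assumptions, the SEFD changes by only a factor of $(87/24)^{0.55} \approx 2$ across 24-87 MHz, even though $T_{\mathrm{sky}}$ itself changes by a factor of $(87/24)^{2.55} \approx 27$.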

As noted in some of the examples presented in Sec. 2, LWA1 is also able to 
operate as an interferometer. The field of view is equal to the element pattern — 
i.e. essentially the entire sky — whereas the spatial resolution is not significantly 
better than the beamwidth, since both modes are limited by the physical aperture. 
Recently a second LWA station has been constructed at the Owens Valley Radio 
Observatory in California (USA) which has been designed primarily for interferom- 
etry. The Owens Valley LWA station array is essentially identical to that of LWA1, 
except the array geometry has been scaled up by a factor of about 2.3, with a 
corresponding improvement in resolution while retaining essentially full-sky FOV. 


4.3. LOw Frequency ARray 


LOw Frequency ARray (LOFAR) is an operational large low frequency interferometer
consisting of tens of stations (48 planned) distributed over Northern Europe.⁴⁸
Figure 8 shows several of these stations. Each station consists of a 10-90 MHz
“low-band array” (LBA) component and 110-240 MHz “high-band array” (HBA)
component. These components use different antenna elements and array geometries.
LBA arrays consist of 48 or 96 dual-polarized thin-dipole inverted V-shaped
antennas over ground screens arranged in a pseudo-random geometry within a
circular aperture about 80 m in diameter. HBA arrays consist of 24, 48 or 96 “tiles”
(subarrays), with each tile consisting of a 4 × 4 array of dual-polarized “fat” wire-grid
dipoles in a compact rectilinear geometry, with the tiles themselves being arranged
in a compact geometry. The HBA tiles use analog beamforming; the LBA dipole
outputs and HBA tile outputs are digitized and all subsequent processing is digital.
The stations are distributed in a pseudo-random manner with baselines ranging
from 68 m to 1158 km, facilitating imaging with resolution ranging from 0.5° to
sub-arcsecond scales, depending on frequency and ionospheric conditions. Although
intended primarily as an aperture synthesis imaging instrument, LOFAR may also
be operated as a beamforming array for observations of unresolvable objects such
as pulsars.

Fig. 8. An aerial view of the LOFAR “superterp”, a region within the 2-km-diameter “core”
region, within which 24 of the stations are located. Nine stations are visible here: 6 within the
circular island in the center of the image, one at the lower left, and two more on the upper right.
The square dark-colored panels are HBA tiles, organized here into 24-tile subarrays. The smaller
dark-colored patches are the ground screens of individual LBA antennas. Credit: van Haarlem et
al., Astron. Astrophys., 556, A2 (2013), reproduced with permission © ESO.


4.4. Murchison Widefield Array 


Murchison Widefield Array (MWA) is an 80-300 MHz array located in Western
Australia, intended primarily for angular power spectrum detection of the Epoch
of Reionization (more about this in Sec. 5), but well-suited to a variety of other
science in this frequency band.⁴⁹ A portion of the array is shown in Fig. 9. MWA
comprises 128 tiles, each consisting of a 4 × 4 rectilinear array of dual-polarization
elements, similar to the LOFAR HBA tiles. The tiles are distributed in a
pseudo-random fashion with baselines up to 3 km. This results in a relatively large FOV
(due to the small size of the beamforming tiles) simultaneously with useful angular
resolution.

Fig. 9. A portion of the MWA. Each square patch is a 4 × 4 tile of “fat” wire-grid dipoles.
Thirty-three of 128 tiles are visible in this view. Image credit: The MWA Project.


5. 21cm Cosmology 


An important science driver for the development of low frequency arrays has been
cosmology using the redshifted 21cm line of neutral hydrogen (HI). These
observations provide a unique avenue for constraining the radiative properties of the first
luminous objects.⁵⁰ An advantage of this approach is that the HI is optically thin,
allowing the gas distribution to be measured throughout the universe. This
technique complements others such as the kinetic Sunyaev-Zel’dovich effect from cosmic
microwave background maps⁵¹ and quasar absorption lines.⁵² Cosmological epochs
potentially accessible via low frequency observations include the so-called Dark
Ages, corresponding to redshift z greater than about 40 (~ 35 MHz); the Cosmic
Dawn, corresponding to redshifts z between ~ 40 and ~ 17 (roughly 35-80 MHz);
and the Epoch of Reionization (EoR), beginning around z ~ 11 (~ 120 MHz) and
believed to be completed by z ~ 6 (~ 200 MHz). Detecting the 21cm signal from
the Dark Ages is currently considered to be beyond the capabilities of ground-based
instruments (thus, low frequency arrays) due in particular to ionospheric distortion
and radio frequency interference. We provide a brief review of Cosmic Dawn and
EoR science below and summarize the use of low frequency arrays in the next
section.
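
The correspondence between redshift and observing frequency quoted above follows from ν_obs = ν_rest / (1 + z), with the HI rest frequency near 1420.4 MHz. A minimal sketch (function names are illustrative):

```python
HI_REST_MHZ = 1420.405751768  # rest frequency of the 21cm HI line


def observed_frequency_mhz(z):
    """Observed frequency (MHz) of the 21cm line emitted at redshift z."""
    return HI_REST_MHZ / (1.0 + z)


def redshift_from_frequency(freq_mhz):
    """Redshift at which the 21cm line is observed at the given frequency."""
    return HI_REST_MHZ / freq_mhz - 1.0


if __name__ == "__main__":
    for label, z in (("Dark Ages boundary", 40), ("Cosmic Dawn turnover", 17),
                     ("EoR begins", 11), ("EoR ends", 6)):
        print(f"{label:22s} z = {z:2d} -> {observed_frequency_mhz(z):6.1f} MHz")
```

The four redshifts map to roughly 35, 79, 118 and 203 MHz, matching the approximate values given in the text.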

Cosmic Dawn encompasses the formation of the first galaxies and black holes. 
These luminous sources produce ultraviolet and X-ray radiation that radically alters 
the properties of the diffuse neutral hydrogen gas that fills the majority of space at 
these times. Coupling with the gas temperature results in the HI line being seen in 
absorption at high redshifts at the beginning of this epoch, when the gas is cooler 
than the CMB due to adiabatic expansion. The HI line is eventually seen in emission 
at lower redshifts (z ~ 8), after the gas becomes heated by X-rays from the first 
stars and black holes. According to theoretical models, the turnover between these 
regimes occurs at z ~ 17 (~ 80 MHz),⁵³,⁵⁴ and the maximum absorption occurs
between 50 MHz and 100 MHz.⁵⁵ With a sufficiently well-calibrated instrument it
may be possible to detect the “global” spectral signature of the absorption. This is 
challenging since one is looking for a 100 mK (possibly less) signal in the presence 
of a ~3000 K sky. Alternatively one can look for the characteristic angular power 
spectrum of the emission. This offers additional insights about the distribution of 
the first stars and active galaxies, but requires greater sensitivity and some way to 
filter out Galactic and intergalactic foregrounds. 

During the EoR, emission from the first stars and galaxies fully ionizes the neu- 
tral hydrogen gas that pervades the universe. Detection and subsequent observations 
of redshifted hydrogen from this epoch are expected to yield valuable insights into 
the energetics of the early universe and the processes of initial galaxy formation. A 
detection would place strong constraints on the spin temperature of the hydrogen 
gas, the neutral fraction, and the timeline of the reionization.⁵⁶,⁵⁷ Besides probing
the EoR, the tomography provided by imaging the HI line will tell us about how
massive galaxies formed in the early universe and will provide strong tests of ΛCDM
cosmologies.

As is the case for the Cosmic Dawn, there are two principal observation strate- 
gies for the EoR: (1) search for a sky-averaged (“global”) spectral signature of 
reionization; or (2) search for the angular power spectrum signature of reionization 
produced by variations in the neutral density and temperature on the sky. Detection 
of the global spectral signature of the EoR can, in principle, be achieved using a single 
dipole; whereas a power spectrum detection requires an array. Both strategies are 
currently considered to be viable for detection, although the spatial power spec- 
trum approach will eventually be required for a complete exploration of this epoch. 
Detection remains a formidable challenge due to sensitivity, RFI, and ionospheric 
effects. Due to the extreme weakness of the EoR signal, a detection requires months 
to years of observation using the instruments described in the next section. 


6. 21cm Cosmology Experiments Using Low Frequency Arrays 


The Large-aperture Experiment to Detect the Dark Ages (LEDA) project is attempting 
a spectral detection of the absorption trough associated with Cosmic Dawn using the 
LWA station at Owens Valley.⁵⁸ Here the array is being used to precisely map the 
bright and diffuse sky foreground so that the associated bias may be subsequently 
removed from the spectrum measured by “outrigger” dipoles that are isolated from 
the array so as to mitigate mutual coupling. 

Detection of the spatial power spectrum signature of redshifted hydrogen from 
the EoR was a principal science driver in the design of LOFAR and MWA, and EoR 
detection projects are underway using these instruments. 

A number of low frequency arrays purpose-built for the detection of the EoR 
have also emerged. Prominent among these is the Precision Array to Probe the Epoch 
of Reionization (PAPER).⁵⁹ PAPER elements are dual-polarized sleeve dipoles over 
shaped ground planes. This instrument is transportable, reconfigurable (various 
array geometries have been used), and scalable, with deployments from 4 elements 
to 128 elements so far. Current limits from PAPER are beginning to rule out very 
low spin temperatures, and to constrain the neutral fraction of the hydrogen. 
Already the scenario where there is no heating of the gas by stars and galaxies 
has been ruled out. However, the limits are still about a factor of 10 above the 
predictions for “standard” EoR models. 

A much larger interferometer will be required to productively observe the 
EoR signal. One such instrument now in development is the Hydrogen Epoch of 
Reionization Array (HERA), which will consist of hundreds of 14-m dishes, all fixed 
to point to zenith. 

Figuring prominently in the future of EoR science is the Square Kilometre Array 
(SKA).° Current plans call for the construction of “SKA Phase 1 Low,” a large 
meter-wavelength array, in Western Australia. 

In this chapter, I discuss radio telescope feed systems, which typically consist 
of a feed horn that couples the radio optics to the backend electronics. Horn 
designs suitable for different radio frequency regimes in the cm and mm-wave 
ranges are discussed, emphasizing the importance of edge tapers in maximizing 
horn efficiency. Examples of modern feed horns from several telescopes are pre- 
sented. Focal plane cameras, including multi-pixel and phased array designs, are 
discussed. I conclude with consideration of RF receivers and digital beam formers. 


1. Centimeter and Millimeter Wavelength Feeds 


In radio astronomical applications at centimeter and millimeter wavelengths, a feed 
system illuminates a suitable reflector optic that provides a large collecting area at 
the frequency of interest. Since losses must be kept to a minimum for good sensitivity, 
these systems tend to have few components: typically a feed horn, an orthomode 
transducer (OMT) to separate polarizations, a low noise amplifier (LNA), a 
frequency down-conversion mixer, and an analog-to-digital (A/D) conversion and 
post-detection stage. If LNA technology is not readily available at the frequency of 
interest, the OMT output may be fed directly into the frequency mixer. A typical 
schematic of these subsystems is shown in Fig. 1. 

A noise injection coupler system is also included for noise temperature calibration, 
as well as for characterization of the LNA paths. The noise coupling is typically 
−30 dB. All these components, in particular the LNAs, are cryogenically cooled in 
most centimeter wave receivers and certainly in all millimeter wave systems in order 
to achieve very low system noise temperatures, as indicated by the dashed line in 
the figure. Depending on its size, it may be possible to cool only part of the feed (this 
is usually the case for L-band, S-band, and C-band receivers). 

Fig. 1. Front-end receiver: feed system components. 


In the following sections, we will review briefly each component in order to gain 
a better understanding of the trade-off parameters involved for optimum system 
performance. 


1.1. Telescope Illumination and Feeds 


The feed illuminates the radio telescope optics, providing control over the effective 
aperture and cross-polarization characteristics of the radio telescope. At centimeter 
and millimeter wavelengths corrugated feed horns are most commonly used, as they 
provide narrower beams than dipoles or dipole arrays used as the feed element, 
as well as wider operational bandwidths (in excess of 2:1 for specially designed 
wideband corrugated horns) and minimum losses. At higher frequencies, 
manufacturing limitations impose the use of alternatives such as smooth-walled 
feed horns, Pickett-Potter horns, and diagonal horns. 

One important initial parameter needed for the feed design is the illumination 
angle $\theta_e$ of the telescope optics. Let us assume, for clarity, that the radio telescope 
optics can be reduced to a dual reflector system consisting of a main 
reflector and a small sub-reflector. The feed directly illuminates the sub-reflector, 
whose edge subtends the angle $\theta_e$, as seen from the final focal point of the optics, 
$f_0$, shown in Fig. 2. The phase center of the feed at the frequency of interest should 
also be located at the focal point of the optical system. 

Narrow beam patterns of corrugated feed horns can be considered Gaussian 
to a very good approximation, with a far-field power pattern of the form 


$$P_f(\theta) = P_0\, e^{-2\alpha(\theta/\theta_e)^2}, \qquad (1)$$


where $\theta$ is the subtended angle from the optical axis of the horn, and the edge taper, 
$\alpha$, is defined in dB by 


QdB = (2) 

Fig. 2. Typical radio telescope optics used at centimeter and millimeter wavelengths. 


[Figure 3 plot: efficiency (%) versus edge taper (dB), with curves for antenna efficiency, 
illumination efficiency, and spillover efficiency; Taper-A and Taper-B are marked.] 


Fig. 3. Illumination efficiency, spillover efficiency and realized antenna efficiency as a function of 
the illumination edge taper value. 


The edge taper controls the aperture illumination, the spillover power, and the overall 
antenna efficiency. An edge taper of zero produces a uniform aperture illumination, 
but not necessarily the best overall antenna efficiency, as most of the power will be 
wasted past the sub-reflector edge. The antenna efficiency is a function of the product 
of the illumination and spillover efficiencies, as shown in Fig. 3. By increasing 
the edge taper, the spillover efficiency will increase as more power will reach the 
aperture of the telescope. For the example shown in Fig. 3, the optimum edge taper 
value is close to −11 dB. A stronger Edge Taper A (~ −19 dB) produces better 
spillover efficiency but reduced illumination efficiency, while a weaker Edge Taper B 
(i.e. ~ −6 dB) has much better illumination efficiency but worse spillover efficiency 
(see Fig. 4). 

Fig. 4. Aperture illumination distribution for illumination Edge Taper A (left) and B (right). 
The lighter color line shows the actual geometrical edge of the aperture. 

Fig. 5. Two-dimensional maps of the far-field co-polar radiation patterns for illumination cases 
Edge Taper A (left) and B (right). 
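
The trade-off between illumination and spillover can be sketched numerically with the standard closed-form expressions for a Gaussian feed pattern illuminating an unblocked circular aperture. This is an idealized model and an assumption on our part, not necessarily the calculation behind Fig. 3: it ignores blockage, phase errors and cross-polarization. The function names are illustrative, and the taper is converted with the convention that the edge power is exp(−2α) of the on-axis power.

```python
import math

# Assumed convention: edge power = exp(-2*alpha) * on-axis power, so a taper
# of T dB corresponds to alpha = T / (20*log10(e)).
DB_PER_ALPHA = 20.0 * math.log10(math.e)  # about 8.686 dB


def alpha_from_taper_db(taper_db: float) -> float:
    """Convert an edge taper in dB (sign ignored) to the Gaussian parameter."""
    return abs(taper_db) / DB_PER_ALPHA


def spillover_efficiency(taper_db: float) -> float:
    """Fraction of the feed power intercepted by the aperture."""
    alpha = alpha_from_taper_db(taper_db)
    return 1.0 - math.exp(-2.0 * alpha)


def illumination_efficiency(taper_db: float) -> float:
    """Taper efficiency relative to uniform illumination."""
    alpha = alpha_from_taper_db(taper_db)
    if alpha == 0.0:
        return 1.0  # uniform illumination
    num = 2.0 * (1.0 - math.exp(-alpha)) ** 2
    return num / (alpha * (1.0 - math.exp(-2.0 * alpha)))


def antenna_efficiency(taper_db: float) -> float:
    """Product of the illumination and spillover efficiencies."""
    return illumination_efficiency(taper_db) * spillover_efficiency(taper_db)


def optimum_taper_db(step_db: float = 0.01, max_db: float = 30.0) -> float:
    """Brute-force search for the taper (in dB, positive) of peak efficiency."""
    n_steps = int(round(max_db / step_db))
    candidates = [i * step_db for i in range(1, n_steps + 1)]
    return max(candidates, key=antenna_efficiency)


if __name__ == "__main__":
    for taper in (-6.0, -11.0, -19.0):
        print(f"{taper:6.1f} dB: illumination {illumination_efficiency(taper):.3f}, "
              f"spillover {spillover_efficiency(taper):.3f}, "
              f"product {antenna_efficiency(taper):.3f}")
    print(f"optimum taper: -{optimum_taper_db():.2f} dB")
```

In this idealized model the product peaks close to −11 dB, in line with the optimum quoted above, while the −19 dB and −6 dB cases trade spillover against illumination efficiency in opposite directions.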

Another consequence of edge taper variations is the effect on the telescope beam 
size, as it is controlled by the actual illuminated area in the aperture. Figure 5 shows 
the calculated two-dimensional (2D) co-polar far-field radiation pattern for edge 
taper cases A and B. Edge Taper A, in the left panel, has an increased beam size 
compared with the nominal case with −11 dB edge taper, but also has a reduced 
first sidelobe level (SLL) of about −27 dB. In the right panel, Edge Taper B has 
effectively reduced the beam size but has increased the first SLL to −19 dB. 

Fig. 6. Variations in beam size and beam efficiency (at the −10 dB level) as a function of the 
illumination edge taper value. 


Figure 6 shows the variation (in percent) of the half-power beam width (HPBW) 
with edge taper. Strong tapers reduce the effective aperture, increasing the beam 
size on the sky, whereas weak tapers increase the illuminated aperture, reducing 
the beam size (Fig. 4). The minimum beam size is attained with uniform 
illumination. The beam efficiency (at the −10 dB level), shown on the right 
vertical axis in Fig. 6, varies between 80% and 90% as a function of edge taper; 
the highest beam efficiencies are obtained for edge taper values between −19 
and −10 dB. 


1.2. Feed horns 


One of the first front-end components of modern cm- and mm-wave radio astro- 
nomical receiver systems is the feed horn. Unlike direct detection systems such as 
bolometers, feed horns in cm-wave and mm-wave radio astronomical receivers are 
single mode waveguide systems, although they could be single or dual polarized 
(single mode each). The corrugated feed horn is considered the most prevalent feed 
type for radio astronomical applications; the inside walls of the horn are manufac- 
tured with corrugations that support the propagation of hybrid modes within the 
horn, which provide the excellent beam symmetry, low sidelobes, and very low cross- 
polarization over large frequency bandwidths up to 2:1, that we mentioned earlier. 

There are many publications related to corrugated feed theory and design; 
Refs. 1 (parts I and II) through 9 are among the most relevant. 

Fig. 8. Corrugated circular waveguide geometry. 


The corrugated horn consists of two main sections, shown in Fig. 7: a mode-conversion 
or launcher section, and a flare section. The input of the launcher section 
is a circular waveguide with radius $a_i$, and the launcher has a length of $L_{MC}$. The 
flare section, of length $L_F$, ends in the horn aperture with radius $a_o$. In the case of a 
conical flare, the shape is defined by the half angle $\theta$ and length $L_F$. The total length 
of the horn is $L = L_{MC} + L_F$. The figure also shows the nominal location of the phase 
center inside the horn throat. The mode launcher section converts the fundamental 
mode TE$_{11}$ of the input circular waveguide into the hybrid mode HE$_{11}$. Hybrid 
modes are neither pure TE nor pure TM modes, as they have both $E_z$ and $H_z$ components. 
The dominant mode in a corrugated waveguide is the HE$_{11}$ mode. The walls of 
a corrugated circular waveguide have ridges or teeth characterized by a period or 
pitch $p$, and slots between the teeth of width $w$ and depth $d_j$, as shown in Fig. 8. 
Through the depth, width and pitch of the teeth, the corrugated walls behave as an 
impedance surface, transforming the input circular waveguide mode TE$_{11}$ 
into the highly polarized hybrid mode HE$_{11}$. A comparison between the TE$_{11}$ and 
the HE$_{11}$ modes is shown in Fig. 9. Notice that the field power distribution of the 
hybrid HE$_{11}$ mode is azimuthally symmetric, vanishing at the walls (in contrast 
with the TE$_{11}$ mode) and providing very good field symmetry between the E-plane 
and H-plane and very low cross-polarization in the radiated horn pattern. 


Feeds and Feed Systems 77 


Fig. 9. Left: Mode TE$_{11}$ field at the input of the circular waveguide. Right: Hybrid mode HE$_{11}$ 
at the output of the horn. 

The mathematical form of the transverse field component of the HE$_{11}$ mode 
can be expressed as (Ref. 9) 

$$\mathbf{E}_t = \hat{y}\left[C_1 J_0(k_c\rho) - C_2\,\frac{\chi-\zeta}{k a_1}\,J_2(k_c\rho)\cos 2\phi\right] + \hat{x}\,C_2\,\frac{\chi-\zeta}{k a_1}\,J_2(k_c\rho)\sin 2\phi, \qquad (3)$$

where $J_0(k_c\rho)$ and $J_2(k_c\rho)$ are Bessel functions of the first kind, $k_c^2 = k^2 - \beta^2$, $k$ 
is the free-space wave number, and $\beta$ is the wave propagation constant. The symbols 
$\chi$ and $\zeta$ correspond to the wall impedance and admittance at the boundary $\rho = a_1$, 
i.e. 

$$\chi \propto \frac{E_\phi}{H_z}, \qquad \zeta \propto \frac{H_\phi}{E_z}. \qquad (4)$$

As mentioned in Ref. 9, when the factor $(\chi - \zeta) = 0$, the propagating field is linearly 
y-polarized, with no azimuthal dependence. In fact, both $\chi$ and $\zeta$ can be 
made zero independently: first, $E_\phi$ will vanish if the waveguide is made with several 
corrugations per wavelength, and second, no axial current will flow (i.e. open circuit 
at $\rho = a_1$) if the depth of the slots is close to $\lambda/4$, and hence $H_\phi$ will also be zero. 
This is referred to as the balanced hybrid condition.¹ 


1.2.1. Horn Design 


Table 1 shows the main design parameters of a corrugated horn.!? 


The three main types of mode launchers are shown in Fig. 10. The first is the 
variable depth launcher, which starts with an initial slot depth of $\lambda/2$ at the input of 
the circular waveguide, tapering down to a slot depth of $\lambda/4$ at the end of the launcher 
section. There is also the variable width-to-pitch launcher, which starts with very 
thin slots, converging to the final slot width at the end of the mode launcher, all with 
the same pitch value. For very wide bandwidths (more than 2:1), the ring-loaded 
launcher is used: it requires few slots to produce the transition from TE$_{11}$ to HE$_{11}$ 
with very wide input match.* Initial dimensions for the design of each of these 
launcher sections are given in Ref. 10, but optimization is required to obtain a good 
input match over the desired bandwidth. 


78 G. Cortes-Medellin 


Table 1. Design parameters for a corrugated horn. 

Parameter                              Symbol 
Input circular waveguide radius        $a_i$ 
Output feed horn radius                $a_o$ 
Horn total length                      $L$ 
Total number of slots                  $N$ 
Length of mode conversion section      $L_{MC}$ 
Number of slots in mode converter      $N_{MC}$ 
Slot pitch                             $p$ 
Slot width                             $w$ 
Slot width-to-pitch ratio              $w/p$ 
Depth of the jth slot                  $d_j$, $j = 1, \ldots, N$ 


Fig. 10. Mode launchers: Left: Variable depth. Center: Variable width-to-pitch. Right: 
Ring-loaded. 
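
As a rough illustration of the variable depth launcher, the sketch below generates starting slot depths that go from $\lambda/2$ at the waveguide input to $\lambda/4$ at the end of the launcher. The linear interpolation between the two end values, the use of the free-space wavelength at a single design frequency, and the function names are all assumptions made here for illustration; as noted above, such initial values still need to be optimized for input match.

```python
# Speed of light in mm/s.
C_MM_PER_S = 299_792_458.0 * 1000.0


def wavelength_mm(freq_ghz: float) -> float:
    """Free-space wavelength in mm at freq_ghz."""
    return C_MM_PER_S / (freq_ghz * 1e9)


def launcher_slot_depths(freq_ghz: float, n_slots: int) -> list:
    """Initial slot depths (mm) for a variable depth mode launcher.

    Depths run from lambda/2 at the first slot to lambda/4 at the last one.
    The linear taper is an illustrative assumption, not a design rule.
    """
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    lam = wavelength_mm(freq_ghz)
    start, end = lam / 2.0, lam / 4.0
    if n_slots == 1:
        return [end]
    step = (end - start) / (n_slots - 1)
    return [start + j * step for j in range(n_slots)]


if __name__ == "__main__":
    # Hypothetical five-slot launcher at 3.8 GHz.
    for j, depth in enumerate(launcher_slot_depths(3.8, 5), start=1):
        print(f"slot {j}: depth = {depth:.2f} mm")
```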

Techniques based on mode matching? are commonly used for this input match 
optimization because they are very fast compared with full-wave commercial 
software. Figure 11 shows the calculated E-field inside a corrugated horn using 
commercial software¹¹ to visualize the mode conversion from the TE$_{11}$ mode at the 
horn input to the HE$_{11}$ hybrid mode in a variable width-to-pitch launcher section. 

Figure 12 shows the calculated far-field radiation pattern of this corrugated 
S-band horn design at 3.8 GHz. The calculated antenna gain is 22.6 dBi with an 
HPBW of 14°. The pattern has very good symmetry between the E and H planes and 
the very low cross-polarization typical of this type of corrugated horn. 
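
The quoted gain and beamwidth can be cross-checked with a simple estimate. The sketch below assumes a single symmetric Gaussian main beam with negligible sidelobes and losses, so that the beam solid angle is $\pi\,\theta_{HPBW}^2/(4\ln 2)$ and the directivity is $4\pi$ divided by that solid angle. This is a back-of-the-envelope check under those assumptions, not the full-wave calculation used for Fig. 12.

```python
import math


def gaussian_beam_directivity_dbi(hpbw_deg: float) -> float:
    """Directivity (dBi) of an ideal symmetric Gaussian beam of given HPBW."""
    theta = math.radians(hpbw_deg)
    # Solid angle of a 2D Gaussian beam in the small-angle approximation.
    beam_solid_angle = math.pi / (4.0 * math.log(2.0)) * theta ** 2
    return 10.0 * math.log10(4.0 * math.pi / beam_solid_angle)


if __name__ == "__main__":
    print(f"HPBW 14 deg -> {gaussian_beam_directivity_dbi(14.0):.1f} dBi")
```

The estimate comes out within about 0.1 dB of the 22.6 dBi quoted for the S-band horn, which is consistent with the near-Gaussian beam of a corrugated horn.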

Figures 13 and 14 show examples of corrugated feed horn designs used in the 
Sardinia Radio Telescope. Figure 13 shows the C-band (6.7 GHz) feed horn for the 
beam waveguide focus of the Sardinia Radio Telescope.¹² The feeds are designed 
for an edge taper of −12 dB at 20°. Figure 14 shows the K-band (18-26 GHz) 
multi-feed receiver system for the Gregorian focus of the Sardinia Radio Telescope.¹² 
The edge taper angle is 12°. 

Fig. 11. Calculated E-field inside a corrugated horn using commercial software (CST). The TE$_{11}$ 
mode at the input of the circular waveguide is converted into the hybrid mode HE$_{11}$ at the end 
of the launcher section and propagated to the output of the horn. 

Fig. 12. Calculated far-field pattern of a corrugated horn using commercial software (CST), 
indicating the co-polar and cross-polar components at 3.8 GHz. 

Fig. 13. C-band corrugated conical feed horn (6.7 GHz) for the Sardinia Radio Telescope.¹² Used 
by permission. 


Fig. 14. K-band multi-feed receiver for the Gregorian focus of the Sardinia Radio 
Telescope.¹² Used by permission. 


The corrugations do not necessarily need to be radial; axial corrugations will 
also work, and in general they are easier to manufacture. Figure 15 shows a 
1.0-2.0 GHz wide-angle corrugated feed horn with axial corrugations and a ring-loaded 
launcher section designed at Cornell University for the L-band wide receiver at the 
Gregorian focus of the 305-m Arecibo Radio Telescope. The edge taper is −15 dB 
at 60°. Figure 16 shows the calculated far-field radiation pattern cuts for this horn 
at 1.0 GHz and 1.8 GHz. 




Fig. 15. L-band ring-loaded corrugated conical horn designed at Cornell University for the 
Arecibo Observatory, with a bandwidth of 2:1. The horn is mounted in an indoor antenna range 
to measure the antenna patterns. (Courtesy of Cornell University and the Arecibo Observatory.) 

Fig. 16. Calculated radiation pattern cuts of the L-band ring-loaded corrugated conical horn at 
1.0 GHz (left) and 1.8 GHz (right). Pattern cuts are at 0°, 45°, 90° and 135°. (Courtesy of Cornell 
University and the Arecibo Observatory.) 


1.2.2. Profiled Corrugated Horns 


By modifying the flare profile it is possible to design more compact profiled corrugated 
horns.” !3 The phase center of a profiled horn can be designed to be at the aperture 
of the horn, allowing for a wideband response with minimal or no phase center 
variation with frequency.¹⁴ A variety of profiles can be used for the flare section in 
addition to the conical linear flare. In Table 2 we present a few of these from Ref. 10. 
The sin² profile introduced by Ref. 15 works well in combination with a Gaussian 
profile. Different profile combinations are possible,¹⁴,¹⁶ including splines.¹⁷ 

Examples of corrugated feed horns with different profiles are shown in Figs. 17 
and 18. The left panel of Fig. 17 shows the EVLA’s L-band profiled corrugated 


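Table 2. Some profile options for the flare section of a corrugated feed horn.¹⁰ 

Profile      Formulation 
Linear       $a(z) = a_i + (a_o - a_i)\,\dfrac{z}{L_F}$ 
Hyperbolic   $a(z) = \sqrt{a_i^2 + \dfrac{z^2\,(a_o^2 - a_i^2)}{L_F^2}}$ 
Sinusoid     $a(z) = a_i + (a_o - a_i)\left[(1-\chi)\,\dfrac{z}{L_F} + \chi\,\sin^2\!\left(\dfrac{\pi z}{2L_F}\right)\right],\quad 0 \le \chi \le 1$ 
Tangential   $a(z) = a_i + (a_o - a_i)\left[(1-\chi)\,\dfrac{z}{L_F} + \chi\,\tan^2\!\left(\dfrac{\pi z}{4L_F}\right)\right],\quad 0 \le \chi \le 1$ 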
2Li(do—ai) 2 ( TZ ) 
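
To make the profile formulations in Table 2 concrete, the sketch below evaluates three of them. It assumes the table entries read as follows: a linear flare, a hyperbolic flare, and a sinusoid flare that blends a linear term with a sin² term through a weight between 0 and 1 (called `chi` here). The function names and the example dimensions are hypothetical.

```python
import math


def linear_profile(z: float, a_i: float, a_o: float, length: float) -> float:
    """Radius of a linear (conical) flare at axial position z."""
    return a_i + (a_o - a_i) * z / length


def hyperbolic_profile(z: float, a_i: float, a_o: float, length: float) -> float:
    """Radius of a hyperbolic flare at axial position z."""
    return math.sqrt(a_i ** 2 + z ** 2 * (a_o ** 2 - a_i ** 2) / length ** 2)


def sinusoid_profile(z: float, a_i: float, a_o: float, length: float,
                     chi: float) -> float:
    """Radius of a sinusoid flare: linear and sin^2 terms blended by chi."""
    if not 0.0 <= chi <= 1.0:
        raise ValueError("chi must lie between 0 and 1")
    blend = ((1.0 - chi) * z / length
             + chi * math.sin(math.pi * z / (2.0 * length)) ** 2)
    return a_i + (a_o - a_i) * blend


if __name__ == "__main__":
    # Hypothetical flare: 10 mm input radius, 50 mm aperture radius, 200 mm long.
    a_i, a_o, length = 10.0, 50.0, 200.0
    for z in (0.0, 50.0, 100.0, 150.0, 200.0):
        print(f"z = {z:5.1f} mm: linear {linear_profile(z, a_i, a_o, length):5.1f}, "
              f"hyperbolic {hyperbolic_profile(z, a_i, a_o, length):5.1f}, "
              f"sinusoid {sinusoid_profile(z, a_i, a_o, length, 0.8):5.1f}")
```

All three profiles start at the input radius and end at the aperture radius; they differ only in how the radius grows along the flare.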

Fig. 17. Left: EVLA L-band profiled corrugated feed horn mounted on one of the VLA antennas. 
Right: K-band profiled horn of the 18-27.5 GHz seven pixel focal plane array receiver of the Green 
Bank Telescope. (Courtesy of the National Radio Astronomy Observatory.) 

Fig. 18. Profiled corrugated feed horns used in Planck’s High Frequency Instrument (HFI).¹⁹ 


feed horn mounted on one of the VLA antennas. This is a large feed horn when 
compared with the crew on top of the dish in the photo, and also when compared 
with the profiled corrugated feed horn in the right panel, part of the seven pixel K-band 
focal plane array receiver of the Green Bank Telescope, covering a frequency band 
from 18 to 27.5 GHz. Figure 18 shows some of the profiled corrugated feed horns 
used in Planck’s High Frequency Instrument (HFI).¹⁸ 


1.2.3. Manufacture 


Fabrication of corrugated feed horns at cm and mm wavelengths is realized using 
conventional CNC machines. Special care has to be taken to ensure that the horn is 
axially symmetric during the machining process; otherwise, high cross-polarization 
values could result. This corrugated horn manufacture technique could be extended 
to sub-mm wavelengths by using electroforming, in which a mandrel or negative 
of the feed horn is manufactured on an aluminum block using a lathe or a CNC 
machine. Once the mandrel is machined, it is submerged inside an electroplating 
bath to deposit copper over the mandrel. After the process is completed, the alu- 
minum mandrel is dissolved, releasing the final corrugated horn. Figure 19 shows 
the initial mandrel used to electroform a 350-GHz profiled corrugated feed horn. 
Another technique suitable, in particular, for horn arrays at mm wavelengths, is 
the use of platelets²⁰: a stack of thin metal plates perforated with the cross-section 
holes of a corrugated feed array. Examples of this are the 91-element and 19-element 
corrugated feed horn arrays at W-band and Q-band, respectively, for the QUIET 
experiment,²¹ and the 100-GHz platelet feed horn array for the BLAST experiment.²² 
At sub-mm wavelengths, this technique has been extended using silicon platelets.²³ 
In essence the method starts with a stack of thin silicon wafers. Using a photolithographic 
process based on Deep Reactive Ion Etching (DRIE) of silicon, the axial 

Fig. 19. Machined mandrel for a 350-GHz profiled corrugated feed horn (courtesy of Rutherford 
Appleton Laboratory, UK). 

Fig. 20. Silicon micro-machined feed horn array platelets for the Cosmic Microwave Background 
Polarimetry experiment.²⁴ Used by permission. 


cross-section cut-out holes of the array’s corrugated horns are carved on each indi- 
vidual wafer. After the Si DRIE process is completed, each platelet is individually 
metalized. The platelets are then aligned and bonded together. Finally, the whole 
stack is electroplated. Figure 20 shows the platelet concept used to fabricate an 
array of 84 feed horns at 150 GHz for the Cosmic Microwave Background 
Polarimetry experiment.²⁴ 


1.2.4. Smooth-Walled Feed Horns 


An alternative to corrugated feed horns is the smooth-walled horn, which does not 
have corrugations (see, for example, Ref. 25). In this type of horn design, an iterative 
mode matching technique is used to produce a profile that matches the desired horn 
radiation pattern, cross-polarization level, beam Gaussianity, and input match over 
a particular frequency band. 

A comparison between a corrugated design and a splined profile smooth-walled 
design for an 80-120 GHz feed horn is presented in Ref. 26. The authors show 
that although the smooth-walled design has higher cross-polarization (still less than 
−25 dB) at the edges of the band, it does show a more invariant phase center 
location with frequency and has slightly better pattern symmetry. Zeng et al.²⁷ 
demonstrated that a 25% bandwidth could be achieved with further optimization. 

Figure 21 shows an example of the calculated 2D E-field cut inside a spline 
smooth-walled feed horn geometry scaled from 100 to 450 GHz.²⁶ Figure 22 shows 
the calculated far-field pattern cuts of this horn at 450 GHz. The −10 dB half beam 
width is 26° at this frequency. 

Figure 23 shows an example of fabrication of a spline feed horn design for 
the ALMA Band-7 feed (275-373 GHz) by E. Lauria of the University of Arizona, 
based on Ref. 26, with the electroform manufacture done by the National Radio 
Astronomy Observatory. The left panel in the figure shows the aluminum mandrel 

Fig. 21. Calculated 2D E-field inside a smooth-walled splined feed horn scaled to 450 GHz, from 
Ref. 26. 

Fig. 22. Calculated far-field radiation pattern cuts of the splined smooth-walled feed horn at 
450 GHz. 




Fig. 23. ALMA Band-7 (275-373 GHz) spline horn manufacture. Left: Aluminum mandrel with 
initial gold plating (middle), copper electroformed over gold (initial coat, top), and final gold plat- 
ing (bottom). Right: Five completed spline feed horns next to a quarter. (Design by E. Lauria, 
University of Arizona; fabrication by the National Radio Astronomy Observatory. Used by 
permission.) 

Fig. 24. Calculated 2D E-field inside a smooth-walled sin² + Gaussian flare feed horn at 
450 GHz.²⁸ 


in several stages of fabrication; it is first gold-plated to ensure a uniform gold-plating 
of the horn interior (middle of the panel) and then copper electroformed. After the 
electroforming is completed and the aluminum mandrel is removed, the outside 
copper is machined to add an M6 x 0.5 mm thread (shown in the picture) to attach 
the flange, and then electroplated with a final gold coat. The right panel of Fig. 23 
shows five finished horns placed alongside a quarter for size comparison. 

Figure 24 shows the calculated 2D E-field cut inside a smooth-walled feed horn 
geometry that uses an optimized profile based on a sin² + Gaussian flare section²⁸ 
at 450 GHz. The horn dimensions are 46 mm long x 12 mm aperture. Figure 25 
shows the corresponding calculated far-field pattern cuts of this horn. The −10 dB 
half beam width is 3.5° at this frequency, with excellent cross-polarization (less than 
−30 dB inside the beam). 

All these feed horns are characterized by profiles with smooth curves and there- 
fore they require lengthy machining processes with sub-mm precision. In the case 
of sub-mm feed horn arrays it becomes an even more expensive process. 

Fig. 25. Calculated far-field radiation pattern cuts of the sin² + Gaussian smooth-walled feed 
horn at 450 GHz.²⁸ 

Fig. 26. Calculated 2D E-field inside a smooth-walled three section multiple flare angle feed horn 
at 470 GHz. 


An alternative is to use a cutting drill bit with a few straight edges to drill 
out a multi-section flare angle horn with the flare angles and transition points 
carefully optimized.²⁹ Figure 26 shows an example of the calculated 2D E-field map 
at 470 GHz inside such a feed horn with three flare angle sections. Figure 27 shows 
the corresponding far-field pattern cuts of this multiple flare angle horn at the same 
frequency. The pattern has a −10 dB half beam width of 13°, very low sidelobe 
levels and excellent cross-polarization. 

Fig. 27. Calculated far-field radiation pattern cuts of the three section multiple flare angle feed 
horn design at 470 GHz. 

Pickett-Potter feed horns³⁰,³¹ are a very good option for narrowband applications, 
in particular at sub-mm wavelengths, in which they provide several GHz 
of instantaneous bandwidth.³² Based on the correct balance between the circular 
waveguide modes TE$_{11}$ and TM$_{11}$ at the aperture of the horn, the horn produces 
a pattern with very low sidelobe levels and a good match to a Gaussian beam. 
Finally, we have to include the diagonal horn. Introduced first by Ref. 33, it is 
characterized by symmetric E-plane and H-plane patterns and is relatively easy to 
machine, but suffers from very high cross-polarization.³⁴ It has been used at mm 
and sub-mm wavelengths.³⁵ A comparison between corrugated, smooth-walled and 
diagonal horns is made by Ref. 36. Figure 28 presents an example of a diagonal 
horn design (by the Author) at 470 GHz. The top panel shows the characteristic 
field distribution at the aperture of the diagonal horn, the middle panel shows the 
calculated field distribution in the interior of the feed horn at 470 GHz and the 
bottom panel shows the calculated far-field pattern cuts at 470 GHz. The high 
cross-polarization values at 45° angle are characteristic of this type of horn. 


Fig. 28. These figures show a 470-GHz diagonal feed horn design (by the Author). The top panel 
shows the field distribution characteristic of a diagonal horn, the middle panel shows the E-field 
distribution in the interior of the horn, and the lower panel shows the calculated far-field radiation 
pattern cuts (Co-Polar and Cross-Polar) at 470 GHz. 


2. Coherent Focal Plane Cameras: Multi-Pixel and Phased Array 
Feed Cameras 


For a long time radio telescopes were single pixel instruments, requiring hundreds 
of hours of patient, methodical telescope pointings and post processing 
in order to produce a radio map of a very small patch of the sky, at a 
time when optical (and later infrared) telescopes were capable of producing large 
multi-spectral maps of large swaths of the sky. This was enabled at optical and 
IR wavelengths by large format CCD cameras with initially hundreds of thousands, 
and later millions, of pixels. Currently, cameras with thousands of bolometer pixels 
are becoming more commonplace at sub-millimeter wavelengths. 

These cameras are direct detection instruments, in contrast with the coherent 
detection instruments used more commonly in radio astronomical telescopes, which 
record not only the amplitude but also the phase of the incoming astronomical 
electromagnetic field. This in itself imposes additional technological constraints on the 
development of large format coherent multi-pixel cameras and, more recently, 
cryogenic phased array feed (PAF) cameras. The pursuit of phased array technologies, 
which enable instantaneous access to the field of view (FoV) of a radio telescope’s 
optics, is what drives this effort within the radio astronomical community to improve 
what is expressed ultimately as survey speed. 


2.1. Survey Speed 


Survey speed is a figure of merit whose definition depends on how a particular survey 
is conducted (i.e. constant integration time vs. transient surveys).³⁷⁻³⁹ For constant 
integration time, and assuming a total number $N_b$ of instantaneous non-overlapping 
formed beams by an array of detectors (either from a number of detectors in a single 
telescope or multiple telescopes), the survey speed is proportional to 

$$\mathrm{SVS} \propto N_b\,\Omega_{HPBW}\,BW\left(\frac{A_e}{T_{sys}}\right)^2, \qquad (5)$$

where $\Omega_{HPBW}$ is the half-power beam solid angle of a single detector beam on the sky, 
$BW$ is the system bandwidth (assuming continuum observation), $A_e$ is the effective 
aperture area of the telescope, and $T_{sys}$ is the system temperature. The product 
$N_b\,\Omega_{HPBW}$ is the effective field of view (FoV) of the telescope. Equation (5) indicates 
that in order to increase the survey speed we need to increase either the 
number of beams, the bandwidth, or the telescope effective aperture, or to decrease 
the overall system temperature. It is important to notice that the survey speed 
depends quadratically on the telescope sensitivity, $A_e/T_{sys}$. Therefore, high survey 
speeds are achieved with large aperture telescopes equipped with very low system 
temperature detectors. 
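
The scaling in Eq. (5) can be illustrated with a few lines of code. The sketch below assumes the survey speed is proportional to the number of beams, the beam solid angle, the bandwidth, and the square of the ratio of effective area to system temperature. The function name and the numerical values are hypothetical, and only ratios between configurations are meaningful.

```python
def survey_speed(n_beams: int, beam_solid_angle: float, bandwidth: float,
                 a_eff: float, t_sys: float) -> float:
    """Relative survey speed figure of merit (arbitrary units)."""
    return n_beams * beam_solid_angle * bandwidth * (a_eff / t_sys) ** 2


if __name__ == "__main__":
    # Hypothetical single-pixel receiver used as the reference.
    single = survey_speed(1, 1.0, 1.0, a_eff=100.0, t_sys=25.0)
    # Seven beams with an otherwise identical system.
    seven = survey_speed(7, 1.0, 1.0, a_eff=100.0, t_sys=25.0)
    # Single beam, but with half the system temperature.
    cooler = survey_speed(1, 1.0, 1.0, a_eff=100.0, t_sys=12.5)
    print(f"7 beams:      x{seven / single:.1f}")
    print(f"half T_sys:   x{cooler / single:.1f}")
```

Adding beams improves the survey speed linearly, whereas halving the system temperature improves it fourfold, which is why multi-pixel cameras are only worthwhile if each pixel keeps a competitive system temperature.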


2.2. Multi-pixel Array Camera vs. Phased Array Camera 


There are two approaches to increase the effective FoV of the telescope with coherent 
detectors: one is the use of a multi-pixel array camera, and the second is a phased 
array feed (PAF) camera. 

Examples of early-generation cryogenic array cameras in radio astronomy 
include QUARRY,⁴⁰,⁴¹ a 15-element cryogenic array receiver developed at the 
Five College Radio Astronomy Observatory (FCRAO). This camera allows for 
spectral line observations of the 115.3 GHz CO line transition using a 14-m radio 
telescope over the frequency range 86-115 GHz. Low frequency examples include the 
Parkes multibeam feed at the Parkes Observatory in Australia and, more recently, 
the seven pixel ALFA camera⁴² at the Arecibo Observatory (see Fig. 29). Both 
instruments work in the L-band, centered at the 1.4 GHz HI line transition. 

Fig. 29. Left: Arecibo L-band array camera (ALFA) (image courtesy of NAIC/Arecibo Observatory).
Right: Parkes multibeam feed (image courtesy of Parkes Observatory, ATNF/CSIRO).

The operation of these cameras depends on the plate scale or image scale of
the telescope, which relates the location of the telescope beam on the sky to the
location of the feed or pixel in the focal plane of the telescope. The telescope image
scale is given by

$$\mathrm{IMG}_s = \frac{1}{D\,F_\#} \quad \left[\frac{\mathrm{rad}}{\mathrm{mm}}\right], \qquad (6)$$

where $D$ is the aperture diameter of the radio telescope (in mm) and $F_\#$ is the
telescope system focal ratio. Therefore, the beam separation on the sky between two
adjacent feeds in the focal plane separated by a distance $\Delta s$ is given by

$$\Delta_{\mathrm{sky}} = \mathrm{IMG}_s \times \Delta s. \qquad (7)$$

The telescope system focal ratio can be approximated by the following expression,
using the equivalent parabola of the optics:

$$\frac{f}{D} = F_\# \approx \frac{1}{4\tan(\theta_e/2)} \approx \frac{1}{2\theta_e}, \qquad (8)$$


where $\theta_e$ is the edge illumination angle of the optics, as in Eq. (1). This angle
also determines the feed horn beamwidth, $\theta_F$, that properly matches the optics of
the telescope (maximizing aperture efficiency and minimizing spillover). For a
circular-aperture feed of diameter $d_F$, $\theta_F$ is given by

$$\theta_F = 2\theta_e = K_{\mathrm{Taper}}\,\frac{\lambda}{d_F}, \qquad (9)$$

where $K_{\mathrm{Taper}}$ is a constant factor that depends on the intended edge taper illumination
at $\theta_e$. As a comparison, for a $-3$ dB edge taper, $K_{\mathrm{Taper}} = 1.02$, and for a more
typical Gaussian edge taper of $-15$ dB, which also provides near-optimum spillover
efficiency, $K_{\mathrm{Taper}} = 2.5$. We will use this last value to calculate the beam separation
on the sky for two adjacent feed horns.

Let us assume that the separation of the feed horns in the focal plane is
$\Delta s = d_F + t_{\mathrm{wall}}$. Neglecting the feed horn's wall thickness ($t_{\mathrm{wall}} \approx 0$), we obtain

$$\Delta s = d_F = K_{\mathrm{Taper}}\,\frac{\lambda}{\theta_F} = K_{\mathrm{Taper}}\,\lambda F_\#. \qquad (10)$$

Therefore,

$$\Delta_{\mathrm{sky}} = \mathrm{IMG}_s \times \Delta s = 2.5\,\frac{\lambda}{D}. \qquad (11)$$

For Nyquist sampling, we would like the beams to be separated by

$$\Delta_{\mathrm{Rayleigh}} = 1.22\,\frac{\lambda}{D}, \qquad (12)$$

and in this case, the beam separation on the sky is

$$\Delta_{\mathrm{sky}} = 2.05\,\Delta_{\mathrm{Rayleigh}}. \qquad (13)$$
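
The chain from plate scale to beam separation can be checked numerically. The sketch below is only an illustration: the aperture diameter, focal ratio and observing frequency are hypothetical values, and the function names are not taken from any library. Note that the focal ratio cancels in the final sky separation, as the equations above imply.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light

def image_scale(d_mm, f_number):
    """Plate scale of Eq. (6), in rad/mm."""
    return 1.0 / (d_mm * f_number)

def feed_spacing_mm(k_taper, wavelength_mm, f_number):
    """Feed separation of Eq. (10), neglecting the horn wall thickness."""
    return k_taper * wavelength_mm * f_number

# Hypothetical telescope and receiver values, for illustration only.
d_mm = 100_000.0                      # 100 m aperture
f_number = 0.5                        # system focal ratio
wavelength_mm = C_M_PER_S / 1.4e9 * 1e3

delta_s = feed_spacing_mm(2.5, wavelength_mm, f_number)
delta_sky = image_scale(d_mm, f_number) * delta_s   # Eqs. (7) and (11), radians
delta_rayleigh = 1.22 * wavelength_mm / d_mm         # Eq. (12), radians
ratio = delta_sky / delta_rayleigh                   # Eq. (13)

print(f"feed spacing  : {delta_s:.1f} mm")
print(f"sky separation: {math.degrees(delta_sky) * 60.0:.2f} arcmin")
print(f"ratio to Rayleigh spacing: {ratio:.2f}")
```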


Figure 30 shows a typical example of this separation effect for Arecibo's 7-pixel
ALFA camera. The left panel shows the layout, separation and diameter of the seven
L-band circular waveguide feed horns. The right panel shows the corresponding sky
footprint of ALFA's beams, with a HPBW separation of $2.5\lambda/D$ at 1.375 GHz.
The sky footprint of a multi-pixel array camera using coherent “feed horn”
detectors always presents gaps relative to a Nyquist-sampled map, which need to be filled
in subsequent telescope pointing passes. The reason, of course, is that the feed
horns have finite aperture diameter and cannot be placed physically closer than
that diameter.

Fig. 30. Left: Focal plane geometry of Arecibo's ALFA camera 7-feed horn layout, showing feed
horn separation and diameter. Right: Sky footprint of ALFA's beams (sky area 25′ × 25′) with a
HPBW separation of $2.5\lambda/D$ at 1.375 GHz. The Arecibo telescope optics has a spherical main
reflector and an asymmetric Gregorian corrector. While the corrector compensates for the spherical
aberration at the single focal point, it has residual field distortion for other points in the focal
plane that results in a beam with an elongated sky footprint.

A possible alternative is to use re-imaging optics to create an intermediate
virtual focal point at which the images of the feed horn aperture fields overlap, and
to place it at the telescope focal point. This approach has been implemented
successfully in Ref. 43.

It is fair to say that we have gaps because we chose the larger taper factor
of 2.5. The reason for doing so was to maximize the sensitivity
(signal-to-noise ratio) of the receiver system by optimizing the illumination of each
individual pixel, thereby minimizing spillover. If we choose a smaller taper factor, e.g.
the one corresponding to a $-3$ dB edge taper, then the increased spillover will raise the
thermal noise entering the receiver, substantially reducing the telescope sensitivity.

On the other hand, a larger element spillover implies that the beams of adjacent 
array elements will observe, in part, the same background noise, and therefore are no 
longer uncorrelated. By a smart combination of the signals of these array elements 
into a phased array, we can minimize these noise contributions, hence recovering 
the sensitivity. This brings us to the phased-array feed (PAF) radio camera. 


2.3. The PAF: Theory of Operation 


The PAF camera consists of an array of antenna elements packed closely together
($\Delta s \approx 0.5\lambda$ to $0.7\lambda$ typically) in the focal plane of the telescope optics. The array
elements sample the incident electromagnetic (EM) wavefront that is focused by the
telescope optics onto the focal plane. This field originates from a source in the sky
in the direction $\theta$, as shown in Fig. 31. The vector of PAF output signals, $\mathbf{X}(\omega)$, is
expressed in the frequency domain as

$$\mathbf{X}(\omega) = \mathbf{a}_\theta\, s(\omega) + \mathbf{n}(\omega), \qquad (14)$$

where $\mathbf{a}_\theta$ is the array steering vector associated with a source of flux density $s(\omega)$ in
the direction $\theta$ in the sky, and $\mathbf{n}(\omega)$ is the background array noise (vector), also captured
by the array elements.

Fig. 31. Theory of operation of a PAF radio camera. (Diagram labels: steering vector, source flux
and noise vector in $\mathbf{X}(\omega) = \mathbf{a}_\theta s(\omega) + \mathbf{n}(\omega)$; formed beam $Y_k(\omega) = \mathbf{w}_k^H(\omega) \cdot \mathbf{X}(\omega)$.)

The PAF linearly combines these array input signals with a vector of complex
weighting factors, $\mathbf{w}_k$, to form the $k$th beam, i.e.

$$Y_k(\omega) = \mathbf{w}_k^H(\omega) \cdot \mathbf{X}(\omega). \qquad (15)$$

By adjusting these complex weights we can steer and shape
the beam in the sky; this is called beam forming. Figure 32 shows an example of the
sky coverage for a 5 × 4 array of single feeds compared to the coverage produced by
a phased array covering a similar sky area with the same single-feed beam size.

Fig. 32. Survey map comparison of sky footprint with focal plane multi-pixel (coherent) array
vs. a Phased Array Feed (PAF) camera covering a field of similar size (background map from 408
MHz all Sky Survey of ).

Ultimately we are interested in the power spectral density function (PSD), $S_{Y_k}(\omega)$,
of each formed beam $k$:

$$S_{Y_k}(\omega) = E\{Y_k(\omega)\, Y_k^{*}(\omega)\}, \qquad (16)$$

$$S_{Y_k}(\omega) = \mathbf{w}_k^H(\omega)\, \mathbf{R}_{s+n}(\omega)\, \mathbf{w}_k(\omega), \qquad (17)$$

where the matrix $\mathbf{R}_{s+n}(\omega)$ is given by

$$\mathbf{R}_{s+n}(\omega) = E\{\mathbf{X}(\omega)\, \mathbf{X}^H(\omega)\}. \qquad (18)$$


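An important characteristic of the PAF is its noise correlation matrix response
in the absence of a source signal, i.e.

$$\mathbf{R}_n(\omega) = E\{\mathbf{X}(\omega)\, \mathbf{X}^H(\omega)\}\big|_{s(\omega)=0} = E\{\mathbf{n}(\omega)\, \mathbf{n}^H(\omega)\}. \qquad (19)$$

The noise power spectral density function is therefore

$$S_{N_k}(\omega) = \mathbf{w}_k^H(\omega)\, \mathbf{R}_n(\omega)\, \mathbf{w}_k(\omega). \qquad (20)$$

In general

$$\mathbf{R}_n = \mathbf{R}_{\mathrm{spl}} + \mathbf{R}_{\mathrm{sky}} + \mathbf{R}_{\mathrm{loss}} + \mathbf{R}_{\mathrm{rec}}, \qquad (21)$$

where the different noise contributions include spillover, $\mathbf{R}_{\mathrm{spl}}$, sky noise, $\mathbf{R}_{\mathrm{sky}}$, radio
telescope antenna ohmic losses, $\mathbf{R}_{\mathrm{loss}}$, and receiver noise, $\mathbf{R}_{\mathrm{rec}}$.
Likewise, for a source-dominated field ($\mathbf{n}(\omega) \approx 0$) we define

$$S_{S_k}(\omega) = \mathbf{w}_k^H(\omega)\, \mathbf{R}_s(\omega)\, \mathbf{w}_k(\omega), \qquad (22)$$

where

$$\mathbf{R}_s(\omega) = E\{\mathbf{X}(\omega)\, \mathbf{X}^H(\omega)\}\big|_{\mathbf{n}=0}. \qquad (23)$$
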
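The signal-to-noise ratio (SNR) associated with the $k$th formed beam is given by

$$\mathrm{SNR}_k = \frac{\mathbf{w}_k^H\, \mathbf{R}_s\, \mathbf{w}_k}{\mathbf{w}_k^H\, \mathbf{R}_n\, \mathbf{w}_k}, \qquad (24)$$

where we have dropped the $\omega$ argument to make the expression more readable.
Therefore, the telescope sensitivity with a PAF camera for the $k$th formed beam is

$$\left.\frac{A_e}{T_{\mathrm{sys}}}\right|_k = \frac{k_B\,\mathrm{BW}}{S}\; \frac{\mathbf{w}_k^H\, \mathbf{R}_s\, \mathbf{w}_k}{\mathbf{w}_k^H\, \mathbf{R}_n\, \mathbf{w}_k}, \qquad (25)$$

where $k_B$ is Boltzmann's constant, BW is the bandwidth, and $S$ is the target
sky source flux.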

Naturally, it is of great interest to obtain the optimum set of complex weights,
$\mathbf{w}$, that maximize the SNR. Originally discussed in Refs. 45 and 46, these are given by
Ref. 47 as

$$\mathbf{w}(\omega) = \mathbf{R}_n^{-1}(\omega)\, \mathbf{a}_\theta. \qquad (26)$$

It is possible to apply additional constraints to the formed beams during the
SNR optimization, producing symmetric beams, sidelobe level control, and
cross-polarization control, among other characteristics, with minimum impact on the
SNR performance.⁴⁸,⁴⁹
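
To make the role of Eq. (26) concrete, the sketch below computes max-SNR weights for a small array and compares them with conventional weights ($\mathbf{w} = \mathbf{a}_\theta$). It is a minimal illustration, not a beamformer implementation: the three-element steering vector and noise correlation matrix are made-up values, the source is assumed to give $\mathbf{R}_s = \sigma_s\, \mathbf{a}_\theta \mathbf{a}_\theta^H$ (which follows from Eqs. (14) and (23) for a single point source), and the helper names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve the complex linear system matrix @ x = rhs by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def quad_form(w, r):
    """Real part of w^H R w."""
    n = len(w)
    return sum(w[i].conjugate() * r[i][j] * w[j] for i in range(n) for j in range(n)).real

def snr(w, a, r_n, source_power=1.0):
    """Eq. (24) with R_s = source_power * a a^H (single point source assumption)."""
    signal = source_power * abs(sum(wi.conjugate() * ai for wi, ai in zip(w, a))) ** 2
    return signal / quad_form(w, r_n)

# Made-up steering vector and Hermitian, positive-definite noise correlation matrix.
a_theta = [1.0 + 0.0j, 0.8 + 0.3j, 0.5 - 0.4j]
r_n = [[2.0 + 0.0j, 0.6 + 0.2j, 0.1 + 0.0j],
       [0.6 - 0.2j, 1.5 + 0.0j, 0.4 - 0.1j],
       [0.1 + 0.0j, 0.4 + 0.1j, 1.0 + 0.0j]]

w_opt = solve(r_n, a_theta)             # Eq. (26): w = R_n^{-1} a
snr_opt = snr(w_opt, a_theta, r_n)
snr_conv = snr(a_theta, a_theta, r_n)   # conventional weights w = a

print(f"max-SNR weights     : SNR = {snr_opt:.4f}")
print(f"conventional weights: SNR = {snr_conv:.4f}")
```

With correlated noise between elements, the max-SNR weights give an SNR at least as large as any other weight vector; for this assumed signal model the optimum equals $\mathbf{a}_\theta^H \mathbf{R}_n^{-1} \mathbf{a}_\theta$ times the source power.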


2.4. PAF Calibration 


Before the PAF camera can be used it is necessary to calibrate it. A complete
PAF calibration is done on-the-reflector in order to include all the on- and off-axis
signal-to-noise characteristics of the telescope focal plane optics. This can
also be complemented by on-the-ground PAF calibration, using aperture array test
facilities.

Figure 33 (Left) shows the commonly used on-the-reflector method for PAF
calibration. It requires creating a scan grid of telescope pointings (with the PAF
camera) surrounding a bright point source of known flux, typically a quasar, which
provides the on-source correlation matrix, $\mathbf{R}_s$. The scan grid should completely
cover the PAF's available FoV. The off-source, or noise, correlation matrix, $\mathbf{R}_n$, is
obtained by pointing the telescope at a quiet, dark sky patch away from the point
source.

Fig. 33. Left: PAF signal-to-noise ratio calibration. Right: FoV of 30 × 30 arcmin² around M87
captured with nine single pointings with the Arecibo AO19 Cryo-PAF prototype calibrated with
an available FoV of 5 × 5 arcmin². See electronic edition for a color version of this figure.

This calibration grid is done onceᵃ; therefore, it is important to avoid any gaps
in the grid. The calibration may take an hour or two, depending on the telescope
sensitivity and source flux (which is why it is important to have a large collecting
aperture telescope that provides good sensitivity). Once the PAF radio camera is
calibrated, however, it requires only a single pointing to capture the entire available
PAF FoV at once, with no gaps. The right panel of Fig. 33 shows a FoV of 30 × 30 arcmin²
around M87, requiring only nine telescope pointings with the cryogenic PAF prototype AO19
Cryo-PAF, calibrated with an available FoV of just 5 × 5 arcmin².⁵⁰

It is possible to combine the previous procedure with an on-reflector noise
calibration source⁵¹ placed in front of the PAF, which works best for prime focus
PAF locations looking into the main reflector. In this case, the calibration noise
source is located at the reflector vertex, at a sufficient distance to produce
an incident plane wave illuminating the PAF. A first step is to form a diagonal
calibration matrix, $\mathbf{D}$, with diagonal elements

$$d_j = s_{\mathrm{cal},j} / s_{\mathrm{cal},*}, \qquad (27)$$

where $s_{\mathrm{cal},j}$ is the power received by the PAF port $j$, and $s_{\mathrm{cal},*}$ is the calibration
signal received by a particular port in the PAF. Then max SNR weights are calculated
using the procedure outlined above, and the signal and noise correlation matrices
are pre- and post-multiplied by $(\mathbf{D}^H)^{-1}$ and $\mathbf{D}^{-1}$, respectively,
in order to refer the weights to the calibration noise source on
the reflector.⁵¹

Measuring the PAF noise performance on an aperture array test facility is
achieved by using the hot/cold power method, with the PAF first looking at a sufficiently
large absorber load at a physical temperature $T_{\mathrm{hot}}$, and then looking directly at the
sky. A broadband calibration source is normally placed on the absorber platform at
a sufficiently large distance from the array to form, again, a plane wave to calibrate
the weights in order to steer the formed beam to the zenith during calibration.⁵²
This process allows one to obtain the PAF aperture array noise correlation matrix.ᵇ
From the aperture array noise correlation matrix just obtained, we
can nevertheless approximate the complex weights that configure the PAF for max SNR when
installed in the focal plane of a reflector, as the relation between the PAF configured
as an aperture array, receiving a plane wave, and a PAF configured for an incoming
focal field on the telescope optics can be expressed as


PAF — oe, Saas (28) 


where $\mathbf{C}$ is a diagonal matrix. Then we pre- and post-multiply the aperture array
hot and cold correlation matrices by $(\mathbf{C}^H)^{-1}$ and $\mathbf{C}^{-1}$, respectively, before they are
used to obtain the on-the-telescope max SNR weights.

ᵃThe calibration factors could hold from a few hours to a few days depending on the electronic
drift of the telescope instruments.

ᵇUnder these conditions, antenna spillover, ohmic losses or aberrations from the telescope optics
are not included in the on-the-ground aperture array noise correlation matrix.
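
The hot/cold power method mentioned above is commonly reduced with the standard Y-factor relation. The sketch below is a scalar illustration only: it assumes the beam-formed output power is proportional to the load temperature plus the system noise temperature, and the load temperatures, gain and function name are hypothetical. For a PAF, the two powers would be the beam-formed quantities $\mathbf{w}^H \mathbf{R}_{\mathrm{hot}} \mathbf{w}$ and $\mathbf{w}^H \mathbf{R}_{\mathrm{cold}} \mathbf{w}$.

```python
def y_factor_temperature(p_hot, p_cold, t_hot, t_cold):
    """Noise temperature from hot/cold powers: T = (T_hot - Y * T_cold) / (Y - 1)."""
    y = p_hot / p_cold
    if y <= 1.0:
        raise ValueError("hot-load power must exceed cold-load power")
    return (t_hot - y * t_cold) / (y - 1.0)

# Hypothetical measurement: absorber at 295 K, sky at an assumed 5 K,
# unknown overall gain, and a 'true' noise temperature of 40 K to be recovered.
gain = 3.7
t_true = 40.0
p_hot = gain * (295.0 + t_true)
p_cold = gain * (5.0 + t_true)

t_measured = y_factor_temperature(p_hot, p_cold, 295.0, 5.0)
print(f"recovered noise temperature: {t_measured:.2f} K")
```

The unknown gain cancels in the ratio, which is why only the two load temperatures need to be known.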


2.5. PAF Implementation 


The implementation of a high sensitivity phased array camera for radio astronomical
applications requires the seamless integration of several key technologies,
namely, cryo-mechanics, vacuum technologies, electromagnetics, microwave and
radio receiver technologies, and digital signal processing. As a frame of reference, a
high sensitivity single pixel feed radio astronomical receiver typically operates with
system temperatures on the order of $T_{\mathrm{sys}} \approx 25$ K at L-band; therefore, PAFs are
expected to have at least comparable beam-formed system temperatures ($T_{\mathrm{sysBF}}$)
to be considered a viable alternative, since the survey speed is proportional
to $(1/T_{\mathrm{sysBF}})^2$.

The realized beam-formed system temperature, $T_{\mathrm{sysBF}}$, depends on the set of complex
weights used by the digital beam former as it combines the receiver noise
($T_{\mathrm{rx}_i}$) and the antenna temperature ($T_{A_i}$) of each individual channel ($i$) in the
array. The antenna temperature, $T_{A_i}$, is the equivalent noise power temperature at
the terminals of the $i$th array element when the array is pointing into the optics
of the telescope. $T_{A_i}$ has noise contributions from three sources: the available FoV
background noise, the spillover noise past the reflector optics of the telescope, and
the noise coupled to the $i$th channel from the adjacent elements in the array.

On the other hand, the contribution of the receiver temperature, $T_{\mathrm{rx}_i}$, to $T_{\mathrm{sysBF}}$
depends more strongly on the actual physical temperature of the LNA chain and the
ohmic losses in front of it. Hence, the highest sensitivity radio astronomy applications
require that the LNAs operate at cryogenic temperatures, typically at around
15 K, producing an equivalent receiver noise temperature of the order of 5 K at
L-band. Further reduction of system noise is obtained when the front-end array elements
are cooled down to cryogenic temperatures, thus minimizing their self-emitted
thermal noise. Alternatively, if only the LNAs are cooled to cryogenic temperatures,
a very careful cryogenic design of the vacuum RF feed-through connection with the
array elements is needed to limit the additional noise temperature produced by
the room temperature ohmic losses of the array elements and connections in front
of the LNA stage.

In the radio camera PAF implementation we can identify three major compo- 
nents: the front end, the RF receiver system and the back end, as shown in Fig. 34. 


2.5.1. Front End 


The PAF radio camera front end consists of the phased array antenna elements; 
cryogenically cooled low-noise amplifiers (LNAs), one for each antenna element and 
polarization state; and a post-amplifier section, normally at room temperature. 

Fig. 34. System architecture of PAF radio camera.

Fig. 35. Front-end architecture of a single array element of the cryogenically cooled 19-element
phased array feed (AO19) developed at the Cornell Center for Astrophysics and Planetary Sciences
(CCAPS), Cornell University, for the Arecibo Radio Telescope.


Figure 35 shows the front-end architecture for a single array element of the dual 
polarized 19-element phased array feed (AO19) camera developed at the Cornell 
Center for Astrophysics and Planetary Sciences (CCAPS) for the Arecibo Radio 
Telescope. Each dipole has two cryogenically cooled (15 K) LNAs, one for each polar- 
ization, for a total of 38 channels, inside of a cryostat with a two-stage refrigerator. 
Outside the cryostat, we have a filter/post amplifier section at room temperature, 
followed by the receiver system. In this configuration, the PAF array elements are 
also cryogenically cooled (~18 K) to minimize the receiver noise temperature.

The entire front end (phased array elements and LNAs) must be placed inside
a cryostat and cryogenically cooled under vacuum for maximum sensitivity. This
creates an engineering challenge: as the cryostat is under vacuum, the top cover
over the array elements must be able to support 14 psi (the equivalent of about 9.8 t/m²)
of atmospheric pressure, yet must be transparent to radio waves (at L-band) but
opaque at IR wavelengths. This problem was solved⁴⁸ by the use of a radio-transparent
(closed-cell) foam material called ROHACELL as the vacuum window support, with a
yield strength of more than 250 psi and a precise combination of structural, thermal
and electromagnetic characteristics, covered by a top-hat high-density polyethylene
(HDPE) dome that serves as an RF-transparent vacuum window (see Fig. 36).
A photograph of the ROHACELL foam layer is shown in Fig. 37.


2.5.2. Phased Array Elements 


Phased array feed elements for radio astronomical applications are typically low-gain
antenna elements, e.g. dual polarization sleeve dipoles, with an optimized bandwidth
ratio of 1.5:1, or single polarization elements (Vivaldi elements or checkerboard
elements) capable of operating within a 2:1 frequency band. Figure 37
shows a wideband dual-polarized sleeve-dipole based phased array feed used in the
19-element cryo-PAF (AO19) developed at CCAPS for the Arecibo Radio Telescope,
with the array elements cryogenically cooled.⁵⁰ The picture on the left shows the
open cryostat with the hexagonal layout dipole array nestled in the bottom layer
of ROHACELL foam. The picture on the right shows the combined dipole/LNA
field-replaceable package developed for the AO19. It includes a fully detachable
dipole/LNA cylinder thermally connected to the 15 K second stage through a cold
finger receptacle. A nylon brace (not shown) is used to compress the cold-finger base
around the LNA cylinder at cryo temperatures, ensuring a very reliable thermal
contact.

Fig. 36. Left: Cryogenically cooled 19-element phased array feed (AO19) camera with a top-hat
high-density polyethylene (HDPE) dome developed at the CCAPS for the Arecibo Radio Telescope.
Right: Cross-section of the AO19 cryo-PAF camera indicating the main components (labels include
filter/foam, dipoles at 18 K, ground plane at 80 K, and lower shield).

Fig. 37. Left: Bottom ROHACELL foam layer of the dual polarized, cryogenically cooled 19-element
PAF (AO19) developed at the CCAPS for the Arecibo Radio Telescope. (Courtesy of
CCAPS). Right: Dipole/LNA field replaceable package used in the Cryo-PAF AO19.

Fig. 38. Left: NRAO's 19-element phased array feed (FLAG) developed by the NRAO CDL
in collaboration with Brigham Young University; dipoles at room temperature (courtesy of
NRAO/AUI/NSF). Center: Mark-II 188 checkerboard-element room temperature PAF, developed
at CSIRO for ASKAP (courtesy of CSIRO).⁵² Right: The Advanced Focal Array Demonstrator
(AFAD), a PAF with 41 room temperature single polarization elements based on thick Vivaldi
elements developed at the National Research Council (NRC)'s Dominion Radio Astronomy
Observatory (credit: NRC Herzberg, Crown Copyright). All figures used by permission.

The left panel of Fig. 38 shows NRAO's 19-element phased array feed (FLAG)
developed for the Green Bank Telescope by the NRAO Central Development Laboratory
(CDL) in collaboration with Brigham Young University,⁵³ which has room
temperature dipoles that are shape-optimized to minimize noise coupling between
elements.⁵⁴,⁵⁵ The center panel shows the Mark-II room temperature PAF with 188
checkerboard-based elements, developed at the Australian Commonwealth Scientific
and Industrial Research Organisation (CSIRO) for the Australian Square Kilometre
Array Pathfinder (ASKAP). The right-hand panel shows the Advanced Focal
Array Demonstrator (AFAD), a room temperature PAF with 41 single polarization
elements based on thick Vivaldi elements, developed at the Canadian National
Research Council (NRC)'s Dominion Radio Astronomy Observatory for the Square
Kilometre Array (SKA).⁵⁶


2.5.3. RF Receiver System 


The receiver systems of modern PAFs have two possible architectures: an Intermediate
Frequency (IF) downconversion section with subsequent A/D conversion, or
a direct Software Defined Radio Receiver (SDR) with high-rate A/D conversion.
The IF mixing requires a stable local oscillator and quadrature (I/Q) mixing to
preserve the phase. In practice the RF/IF receiver followed by A/D conversion is the
most economical solution. The second (SDR) option converts the RF band directly
into the digital domain, and even makes use of fully digital I/Q mixing. Figure 39
shows these two implementations for a single channel; the SDR implementation
includes a digital polyphase filterbankᶜ for frequency decimation/channelization
(F-engine), which is already part of the initial signal processing of the back-end
digital beam former, and is implemented on the same hardware board next to the
A/D conversion section.
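
The statement that quadrature (I/Q) mixing preserves the phase can be illustrated with a few lines of digital down-conversion. This is a toy sketch under stated assumptions: the sample rate, tone frequency and phase are arbitrary, the "low-pass filter" is simply an average over the whole record, and the function name is hypothetical rather than part of any SDR library.

```python
import cmath
import math

def iq_downconvert(samples, f_lo, f_s):
    """Multiply real samples by a complex local oscillator exp(-j 2 pi f_lo n / f_s)."""
    return [x * cmath.exp(-2j * math.pi * f_lo * n / f_s) for n, x in enumerate(samples)]

f_s = 1000.0      # sample rate in Hz (assumed)
f_rf = 100.0      # tone frequency in Hz (assumed), equal to the LO frequency here
phase_in = 0.7    # phase of the input tone in radians
n_samples = 1000  # one second of data, an integer number of tone periods

signal = [math.cos(2.0 * math.pi * f_rf * n / f_s + phase_in) for n in range(n_samples)]
baseband = iq_downconvert(signal, f_rf, f_s)

# Averaging removes the term at twice the tone frequency and keeps the DC term,
# which is 0.5 * exp(j * phase_in).
mean = sum(baseband) / n_samples
phase_out = cmath.phase(mean)
print(f"amplitude {abs(mean):.3f}, recovered phase {phase_out:.3f} rad")
```

Mixing with only a real cosine would keep the amplitude information but lose the sign of the phase, which is why both the in-phase and quadrature products are carried forward.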


Fig. 39. RF receiver section (one per channel): (Left) implemented with a heterodyne
receiver with analog IF/IQ down-conversion and digital conversion, and (Right) implemented with a
Software Defined Radio Receiver (SDR) with digital I/Q down-conversion. These functional blocks
include a polyphase filterbank (PFB) section or F-engine, whose output goes to the beam former.


ᶜSee Chapter 7 of this volume for a detailed discussion of polyphase filterbanks.


2.5.4. Back-End: Digital Beam Former


The digital receiver back end consists of two main subsystems: the sampler/frequency
channelizer (or F-engine), mentioned earlier as part of the SDR
receiver, and the correlator/beamformer (or XB-engine). Figure 40 illustrates the
functional block configuration of the XB-engine for the Cornell/BYU ALPACA
cryogenic PAF camera for the Arecibo Observatory.⁵⁷ These blocks are implemented
as massively parallel digital processing systems.

The F-engine channelizes each of the antenna element polarization signals
into $M$ channels: each complex vector, $\mathbf{x}_\mu$, contains the $\mu$-channel signals of all
the antenna element polarizations. The channelization is necessary to construct
wideband phased-array weights. Between the F-engine and XB-engine sits a very
high performance network switch array (not shown) that reorders the data packets
from antenna-input order into frequency-channel order for the
XB-engine to operate. The correlation matrix, $\mathbf{R}_\mu$, for each channel is constructed in
the XB-engine, and a real-time formed beam can now be synthesized for a particular
beam $k$ by the Hermitian inner product between the complex weight vector,
$\mathbf{w}_k$, and the signal, $\mathbf{x}_\mu$. The signal then goes to a fast integrator/coarse
spectrometer for fast transient searches (typically 300-400 kHz). Fine spectral resolution
is achieved by a second polyphase filterbank section with thousands of channels
(~25600). Finally, the signal goes to a long-term integrator/fine spectrometer with
10 kHz resolution for HI searches in external galaxies. The hardware architecture
makes use of a combination of FPGA processor boards for sampling and frequency
channelization and high performance GPU parallel processing for the XB-engine
implementation. The hybrid FPGA/GPU approach to astronomical array signal
processing has been used with great success; see, for example, PAPER,⁵⁸ CHIME,⁵⁹
the VEGAS Spectrometer,⁶⁰ and the FLAG PAF for the GBT.⁵³

Fig. 40. Digital Correlator/Beam Former (XB-engine) Functional Blocks for ALPACA. (Block labels
include real-time beamformer, coarse spectrometer, and fast-dump integrator.)
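
The two operations attributed to the XB-engine, building a per-channel correlation matrix and forming a beam by a Hermitian inner product, can be sketched for a single frequency channel as follows. This is an illustrative toy, not the ALPACA implementation: the four-element array, the random test voltages, the weights and all function names are assumptions.

```python
import random

def correlate(snapshots):
    """Time-averaged correlation matrix R = <x x^H> for one frequency channel."""
    n_el = len(snapshots[0])
    r = [[0j] * n_el for _ in range(n_el)]
    for x in snapshots:
        for i in range(n_el):
            for j in range(n_el):
                r[i][j] += x[i] * x[j].conjugate()
    return [[value / len(snapshots) for value in row] for row in r]

def beamform(w, x):
    """Formed-beam voltage y = w^H x for one snapshot."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))

def beam_power(w, r):
    """Formed-beam power w^H R w computed from the correlation matrix."""
    n = len(w)
    return sum(w[i].conjugate() * r[i][j] * w[j] for i in range(n) for j in range(n)).real

# Made-up element voltages: 200 snapshots of a 4-element array in one channel.
rng = random.Random(0)
snapshots = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]
             for _ in range(200)]
weights = [0.5 + 0.1j, 0.4 - 0.2j, 0.3 + 0.0j, 0.2 + 0.3j]

r_channel = correlate(snapshots)
power_from_r = beam_power(weights, r_channel)
power_from_beam = sum(abs(beamform(weights, x)) ** 2 for x in snapshots) / len(snapshots)
print(power_from_r, power_from_beam)
```

Both routes give the same integrated beam power, which is why beams for any weight vector can be re-formed after the fact once the correlation matrices are stored.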

This chapter describes some of the critical building blocks of common radio
astronomy receivers, such as LNAs, mixers, filters, analog-to-digital converters, 
and the signal processing algorithms that augment them. Fundamental concepts 
in receiver design are discussed as well as several new concepts that are now 
beginning to transform the way radio astrophysical instruments are built. 


1. Introduction 


The architecture of receivers used in radio astronomy is undergoing a gradual but 
significant evolution from subsystems based mainly on analog components (induc- 
tors, capacitors, resistors, diodes, transistors, etc.) to digital components such as 
those that make up the heart of a personal computer. There will always be a need 
for analog components that amplify the weak signals from the cosmos while adding 
a minimum of electronic noise and for limiting the frequency range of signals that 
must be processed in downstream electronics. However, the stability and accuracy 
of digital arithmetic and the rich toolbox of mathematical functions that digital 
components make available are not only replacing analog functions, but adding 
new signal processing functions and changing the architecture of radio astronomy 
receivers. As a result, the distinctions between front-end, receiver, and back-end of 
a radio telescope are becoming less clear and possibly less useful. 

The evolution of radio astronomy receivers would make an interesting book in
itself. For much of this history, receiver stability was a major engineering challenge.
Clever differencing techniques, such as beam switching, load switching and
self-correlation receivers, were invented to circumvent the stability problems inherent
to single-dish telescopes. These included such brute force techniques as rocking
relatively large antenna sub-reflectors to move the telescope beam on and off radio
sources faster than the receiver gain varied. Much of this has been supplanted by a
combination of more stable solid-state amplifier electronics and digital signal
processing. Straightforward “total power” receivers are becoming the norm in modern
single-dish telescopes. Interferometric telescopes have always had the advantage of
stabilizing techniques, such as phase switching, which will be covered in another
chapter of this book.ᵃ

There are close parallels between analog and digital receiver operations, and 
this chapter will draw on those parallels wherever possible, beginning with the 
conversion from analog to digital representation of a radio signal. In fact, a receiver 
can be completely implemented in digital electronics or even in code running on 
garden-variety personal computers as long as the input frequency is not too high or 
the signals need not be processed in real time. One significant difference between 
analog and digital signal processing is that the former operates on mathematically 
real quantities, while the latter represents signals as complex quantities. This enables 
a much richer signal processing toolbox. 

This chapter will also assume that the random noise signals from a radio tele- 
scope can be represented by the sum of many sinusoidal signals at closely spaced 
frequencies. The latter point can be illustrated by plotting the sum of many closely 
spaced sinusoids. An operation on one sinusoidal component of a noisy signal can 
be assumed to be valid for all other components and for their sum as long as the 
electronics operates in its linear range (output voltage is a simple multiple of the 
input voltage). 
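
The sum-of-sinusoids picture of noise can be tried out directly. The sketch below adds 100 equal-amplitude tones at closely spaced frequencies with random phases; the tone count, frequencies, sample rate and seed are arbitrary choices made so that every tone completes an integer number of cycles in the record. It is an illustration of the idea, not a noise generator for practical use.

```python
import math
import random

def noise_like_signal(n_tones=100, f_start=100.0, f_step=0.5,
                      f_s=1000.0, n_samples=2000, seed=1):
    """Sum equal-amplitude sinusoids at closely spaced frequencies with random phases."""
    rng = random.Random(seed)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_tones)]
    amplitude = math.sqrt(2.0 / n_tones)  # each tone carries a power of 1/n_tones
    return [
        sum(amplitude * math.cos(2.0 * math.pi * (f_start + k * f_step) * n / f_s + phases[k])
            for k in range(n_tones))
        for n in range(n_samples)
    ]

samples = noise_like_signal()
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
sigma = math.sqrt(variance)
fraction_within_1_sigma = sum(abs(s) <= sigma for s in samples) / len(samples)

print(f"mean = {mean:.2e}, variance = {variance:.4f}")
print(f"fraction of samples within one sigma: {fraction_within_1_sigma:.3f}")
```

The total power is the sum of the tone powers, and the fraction of samples within one standard deviation can be compared with the value of about 0.68 expected for a Gaussian distribution.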


2. Amplification 


Before radio astronomy signals can be processed by either analog or digital receiver
components they must be amplified to a useful voltage level. All signals from the
cosmos, from the environment of a radio telescope, and from radio electronics are
essentially random noise whose noise power can be described by the equation

$$P = kTB, \qquad (1)$$

where $k$ is the Boltzmann constant, $1.38 \times 10^{-23}$ m$^2$ kg s$^{-2}$ K$^{-1}$, $T$ is temperature
in kelvin, and $B$ is bandwidth in hertz. The equivalent noise temperature at the
receiver input of a modern radio telescope at centimeter wavelengths is about 20 K.
This translates to $2.76 \times 10^{-14}$ W in a 100 MHz bandwidth. The industry standard
equivalent impedance of the input and output ports of radio components is 50 Ω
so, from Ohm's Power Law ($V = \sqrt{PR}$), this power translates to a root-mean-square
(rms) voltage of $1.17 \times 10^{-6}$ volts (1.17 µV). A typical digitizer requires
an rms input voltage level of about 30 millivolts, so the net receiver voltage gain
must be about $3 \times 10^{-2}/1.17 \times 10^{-6} = 2.56 \times 10^{4}$, or about 88 decibels (dB).
Passive analog components, such as filters, frequency converters (mixers), and cables
will have insertion losses, so a total amplifier gain exceeding 110 dB is common.
This gain must be stable to a part in $10^4$ to reliably measure astronomical signals
over the time required to make a complete measurement (typically minutes). An
interesting consequence of the availability of ever faster digitizers and, hence, wider
processed bandwidths, is that the total gain required before digitization is decreasing
proportionately.

ᵃSee Chapter 2 of this volume.
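
The arithmetic in this section can be reproduced in a few lines. The sketch uses the 20 K, 100 MHz, 50 Ω and 30 mV figures quoted in the text; the variable and function names are illustrative only.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K

def noise_power_watts(temperature_k, bandwidth_hz):
    """Noise power P = k T B of Eq. (1)."""
    return K_BOLTZMANN * temperature_k * bandwidth_hz

power = noise_power_watts(20.0, 100e6)     # 20 K in a 100 MHz bandwidth
v_rms = math.sqrt(power * 50.0)            # V = sqrt(P R) into 50 ohms
voltage_gain = 30e-3 / v_rms               # to reach a 30 mV rms digitizer input
gain_db = 20.0 * math.log10(voltage_gain)  # voltage ratio expressed in dB

print(f"noise power : {power:.3e} W")
print(f"rms voltage : {v_rms * 1e6:.3f} microvolts")
print(f"voltage gain: {voltage_gain:.3e} ({gain_db:.1f} dB)")
```

The voltage gain comes out marginally below the $2.56 \times 10^4$ quoted above because the text rounds the rms voltage to 1.17 µV before dividing.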


2.1. Low Noise Amplifiers 


The foremost performance specification for all radio astronomy instrumentation is 
sensitivity. At the receiver level, this is determined almost exclusively by the input- 
referred equivalent noise power that is added by the electronics, and by the stability 
of the complex gain. The former determines the instantaneous root-mean-square 
(rms) error with which an input voltage may be detected, while the latter limits 
how well this measurement uncertainty may be reduced by averaging. The amount 
of averaging, or integration time, required to reach a given level of uncertainty is 
also dependent on the resolution bandwidth of the measurement, but this is often 
constrained by external factors, such as the presence of interferers or the type of 
observation (e.g. spectral line vs. continuum). 

As described above, a typical radio astronomy receiver will need approximately 
90 dB of net gain between the feed point of the antenna and the input of the 
digitizer(s). Because the noise power added by any given element in the signal path 
is amplified by all subsequent gain stages, it is important that the first gain stage 
be placed as close to the input as it can be, and that this stage add the least 
possible amount of noise. Further, since amplitude and phase stability is essentially 
cumulative, it is important to achieve the required gain using the minimum number 
of stages. This makes the low noise amplifier (LNA) one of the most aggressively 
optimized components in all of radio astronomy instrumentation. 
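
The argument that the first gain stage dominates the noise budget is usually quantified with the standard cascade (Friis) relation, $T = T_1 + T_2/G_1 + T_3/(G_1 G_2) + \dots$, which is not written out in the text above. The sketch below applies it to a hypothetical three-stage chain; the stage temperatures and gains are made-up values.

```python
def cascade_noise_temperature(stages):
    """Input-referred noise temperature of a chain of (T_e in K, gain in dB) stages."""
    total = 0.0
    gain_before = 1.0  # linear power gain ahead of the current stage
    for t_e, gain_db in stages:
        total += t_e / gain_before
        gain_before *= 10.0 ** (gain_db / 10.0)
    return total

# Hypothetical chain: a cryogenic LNA, a room-temperature amplifier, and a noisy back end.
lna = (5.0, 30.0)
post_amp = (300.0, 20.0)
back_end = (1000.0, 20.0)

lna_first = cascade_noise_temperature([lna, post_amp, back_end])
lna_second = cascade_noise_temperature([post_amp, lna, back_end])

print(f"LNA first : {lna_first:.2f} K")
print(f"LNA second: {lna_second:.2f} K")
```

Swapping the first two stages shows how a low-noise stage contributes little once a noisier stage precedes it.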


2.1.1. Characterization of Noise 


It is customary in the field of radio astronomy instrumentation to express the noise 
power emitted by a device or two-port network in terms of its equivalent noise 
temperature, $T_{eq}$. Conceptually, this is the physical temperature that a matched
termination must have in order to deliver the same noise power per unit bandwidth
at the output, if the network itself were noiseless. It derives from the classical
Rayleigh-Jeans approximation¹ of Planck’s law for the radiation from an ideal black
body in thermal equilibrium, which was given in Eq. (1). (A closely-related quantity
is the noise measure, which is simply the equivalent input-referred noise temperature 
that would result from a cascade of an infinite number of identical devices, taking 
into account the device’s gain.) Generally incoherent with other sources of noise 
in the instrument, this power may be added directly to contributions from other 
components, passive losses, antenna spillover, atmospheric noise, and to the noise
power spectrum of the target under observation itself (for that too, is a stochastic 
signal in this application), taking into account the cumulative system gain up to the 
point at which the total power is measured. For the purposes of most calculations, 
it is sufficient to assume that all such noise sources have a Gaussian probability 
distribution and a white power spectral density, at least over small bandwidths. 

The physical source of electronic noise varies from one type of device to another, 
but several canonical representations apply so long as the behavior of the device 
may be treated as linear (which is almost always true of both passive devices and 
low-noise amplifiers). These representations take the form of otherwise noiseless 
equivalent networks with noise sources attached to the ports. The noise sources in 
the model may be random voltage or current generators, forward- and backward- 
traveling noise waves, or combinations thereof. The common thread between them 
is that the number of distinct noise sources is equal to the number of ports in the 
network, and that they are, in general, partially correlated. Thus, for a two-port 
network such as a low-noise amplifier, the noise behavior at any given frequency 
is completely described by four numbers, called noise parameters; for example, the 
rms amplitudes of the forward- and backward-traveling noise waves along with the 
(complex) correlation coefficient between them. 

Noise matching for a two-port network is the process of providing an optimal 
input termination that reflects the backward-traveling noise wave back through the 
device with the proper amplitude and phase to minimize the noise delivered at the 
output via cancellation with the correlated portion of the forward-traveling noise. 
The residual backward-traveling noise is typically ignored in low-noise amplifier 
design for radio astronomy, as it is harmlessly transmitted back out into space. 
However, it is becoming of greater importance now with the advent of phased-array 
feeds where there exists substantial mutual coupling between the closely-packed 
antenna elements. It is also the basis for the operation of active-cold loads. 

A far more common and useful set of noise parameters is represented in the following
equation for the equivalent noise temperature of a network when terminated
by the characteristic impedance, $Z_0$,

$$T_{eq} = T_{\min}\left(1 + M\,\frac{|\Gamma_{opt}|^2}{1 - |\Gamma_{opt}|^2}\right), \tag{2a}$$

$$\Gamma_{opt} = \frac{Z_{opt} - Z_0}{Z_{opt} + Z_0}, \tag{2b}$$

where $T_{\min}$ is the minimum achievable equivalent noise temperature, $Z_{opt}$ is the
source impedance that would achieve it, $\Gamma_{opt}$ is the (complex) reflection coefficient
of that source impedance in a system with characteristic impedance $Z_0$, and $M$ is
a unitless parameter which describes how rapidly the noise temperature degrades
with sub-optimal source impedance.
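As a numerical illustration, Eq. (2) can be evaluated directly, reading it as T_eq = T_min [1 + M |Γ_opt|² / (1 − |Γ_opt|²)]. This is only a sketch; the function names and the example device values (a 4 K minimum noise temperature, M = 1.5, and a 100 Ω optimum source impedance in a 50 Ω system) are hypothetical.

```python
def gamma_opt(z_opt: complex, z0: float = 50.0) -> complex:
    """Reflection coefficient of the optimum source impedance, Eq. (2b)."""
    return (z_opt - z0) / (z_opt + z0)


def equivalent_noise_temperature(t_min_k: float, m: float,
                                 z_opt: complex, z0: float = 50.0) -> float:
    """Noise temperature (K) when the input is terminated in z0, Eq. (2a)."""
    g2 = abs(gamma_opt(z_opt, z0)) ** 2
    return t_min_k * (1.0 + m * g2 / (1.0 - g2))


# Hypothetical device: T_min = 4 K, M = 1.5, Z_opt = 100 ohm (purely real).
t_mismatched = equivalent_noise_temperature(4.0, 1.5, 100 + 0j)

# If Z_opt already equals Z0, the penalty term vanishes and T_eq = T_min.
t_matched = equivalent_noise_temperature(4.0, 1.5, 50 + 0j)

print(f"Z_opt = 100 ohm: {t_mismatched:.3f} K")
print(f"Z_opt =  50 ohm: {t_matched:.3f} K")
```

A larger M makes the same mismatch more costly, which is the sense in which M measures how quickly the noise temperature degrades away from the optimum.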
Note that most authors use either the Lange noise parameter,² $N = M T_{\min}/(4T_0)$,
or the so-called noise resistance, $R_n = N/G_{opt}$, as the fourth noise
parameter, but in many ways $M$ is most fundamental to the device as its definition
does not depend on $T_0$ (= 290 K), an arbitrary constant which need not have any
meaningful relationship to the environment in which the device is actually used.
Also, like both $N$ and $T_{\min}$, $M$ is invariant under lossless, reciprocal embedding at
both input and output, and is similarly unchanged by parallel connection of any
number of identical devices (an important property with regard to device scaling).³
$\Gamma_{opt}$, on the other hand, may be transformed by external matching networks to be
made arbitrarily close to zero at a single frequency.

We observe that $M$ is theoretically subject to the following constraints:


$$1 \le M \le 2. \tag{3}$$


The lower limit of the expression is a result of the fact that the correlation between 
noise sources cannot be larger than 100%. The upper limit is not strictly required 
of an arbitrary two-port network, but it has been proven for a broad class of net- 
works which includes the most widely used models for both field-effect and bipolar 
transistors, and it has been verified empirically for numerous real-world amplifiers.4 


2.1.2. Device Models 


For many years, the best performing device and the workhorse for low-noise ampli- 
fiers in radio astronomy has been the Indium Phosphide Heterostructure Field-Effect 
Transistor (InP HFET) operated at cryogenic temperatures, typically 15-20 K. Also 
called a High Electron Mobility Transistor (HEMT), this device achieves high mobil- 
ity by confining the charge carriers to a narrow channel between source and drain 
in a two-dimensional electron gas (2DEG) adjacent to the dopant layer where they 
may flow unimpeded by the impurities. 

The most commonly-used device model for HFETs in radio astronomy is the
Pospieszalski noise model,⁵ in which equivalent noise temperatures, $T_{gate}$ and $T_{drain}$,
are assigned to the intrinsic gate-source and drain-source resistors, respectively,
while all extrinsic resistors are assumed to be at ambient temperature, using
the thermal noise relations $\langle v^2 \rangle = 4kTR$ and $\langle i^2 \rangle = 4kT/R$ per unit bandwidth
(whose square roots are rms spectral densities in V/√Hz and A/√Hz, respectively). In
practice, it seems sufficient to assign ambient temperature to $T_{gate}$ as well, leaving
$T_{drain}$ as a non-physical free parameter that at present can only be obtained by
measurement. The precise physical mechanism that gives rise to $T_{drain}$ is not fully
understood, but in some way it accounts for the quasi-random motions of charge
carriers within the 2DEG that exists in the channel. $T_{drain}$ varies with both ambient
temperature and bias point.

Repeatability of the cryogenic noise performance of HFETs, even among devices 
from the same wafer, has been an enduring challenge for developers of radio astron- 
omy instrumentation. The best performing devices are those that have low gate 
leakage and maintain high transconductance at low quiescent current. These properties,
summarized as the “quality” of the pinchoff characteristics, can be measured
at room temperature and are therefore a critical diagnostic tool for predicting which 
devices will offer the best performance when cooled. 

Two approaches to improving the noise performance of InP HFETs have been 
pursued in recent years. The first is aggressive scaling of gate length to achieve 
higher transconductance. While enabling the devices to operate at much higher 
frequencies,⁶ the resultant increase in transconductance has been partially offset in
practice by a concurrent increase in $T_{drain}$, yielding only marginal improvement in
noise temperature at more modest frequencies (less than approximately 100 GHz).⁷
Most of the gains, therefore, occur at sub-mm-wave frequencies where the device
is starved for gain, a region where, coincidentally, their noise performance is
surpassed by that of superconducting mixers. The latter technology is thus still preferred
for ground-based radio astronomy, but the high frequency short gate-length InP 
HFETs are an attractive alternative for space-borne radiometers, where supercon- 
ducting temperatures are difficult to achieve reliably. 

The second approach to improving HFETs is to optimize directly for their 
operation at low current. These efforts have yielded devices which, at least in the 
cm-wave range, match the performance of the best InP HFETs ever made, while 
doing so with a longer gate length that lends itself well to a more repeatable process.⁸

In the lower cm-wave range, another technology has recently been demonstrated 
to have superior cryogenic performance to even InP HFETs, namely cooled Silicon 
Germanium (SiGe) Heterostructure Bipolar Transistors (HBTs). The key to their 
performance is the dramatic increase in transconductance and current gain that 
occurs when cooled to cryogenic temperatures.⁹ This enables them to provide useful
gain at low operating current, thus minimizing shot noise in the bipolar junctions.
The typical noise model for a SiGe HBT comprises thermal noise associated with
the base resistance, and shot noise in the base and collector given by $\langle i^2 \rangle = 2qI$
per unit bandwidth, where $I$ is the quiescent base or collector current, respectively.
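To give a feel for the magnitudes involved, the thermal-noise and shot-noise relations quoted in this section (4kTR and 2qI per unit bandwidth) can be evaluated as rms spectral densities. This is a minimal sketch; the resistor value, temperatures, and bias current below are hypothetical and are not taken from any particular device.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K (exact SI value)
Q_ELECTRON = 1.602176634e-19    # C (exact SI value)


def thermal_voltage_density(r_ohm: float, temp_k: float) -> float:
    """rms thermal noise voltage density sqrt(4kTR), in V/sqrt(Hz)."""
    return math.sqrt(4.0 * K_BOLTZMANN * temp_k * r_ohm)


def shot_current_density(i_amp: float) -> float:
    """rms shot noise current density sqrt(2qI), in A/sqrt(Hz)."""
    return math.sqrt(2.0 * Q_ELECTRON * i_amp)


# Hypothetical values: a 50 ohm resistance warm (290 K) and cold (15 K),
# and a 1 mA quiescent collector current.
v_warm = thermal_voltage_density(50.0, 290.0)
v_cold = thermal_voltage_density(50.0, 15.0)
i_shot = shot_current_density(1e-3)

print(f"50 ohm at 290 K: {v_warm * 1e9:.3f} nV/sqrt(Hz)")
print(f"50 ohm at  15 K: {v_cold * 1e9:.3f} nV/sqrt(Hz)")
print(f"1 mA shot noise: {i_shot * 1e12:.2f} pA/sqrt(Hz)")
```

Because shot noise scales with the square root of the bias current, a device that still has useful gain at low current has a direct noise advantage, which is the point made above for cooled SiGe HBTs.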

It was noted in the previous discussion about HFETs that the critical parameter 
for determining noise, $T_{drain}$, can only be found by cryogenic noise measurements,
which are difficult to perform and accurately calibrate. In contrast, all the key 
parameters of the HBT noise model may be obtained by comparatively simple DC 
measurements. 

As a Silicon technology compatible with industry-standard BiCMOS processing, 
SiGe HBTs offer yet another fringe benefit to their application in radio astronomy, 
namely the ability to combine them monolithically with other analog and digital 
circuitry in a large-scale integrated circuit. 


2.1.3. MIC vs. MMIC 


The vast majority of cryogenic low-noise amplifiers in use for radio astronomy
today, whether on the ground or in space, are Microwave Integrated Circuits (MIC),
otherwise known as “chip and wire” construction (e.g. Fig. 1). Discrete transistor die
are mounted in a dedicated housing along with peripheral components such as single-layer
chip capacitors, thin-film resistors, and Teflon-based microwave laminate substrates,
connected to one another by gold wire bonds. This allows the individual
active devices to be screened prior to installation and the very best performers
selected for the first stages where noise characteristics are most important. This
is especially valuable when considering the challenges alluded to earlier regarding
the repeatability of cryogenic noise performance from one wafer to the next with
most InP fabrication processes. A single wafer, comprising tens of thousands of
transistors, if the performance is good, can supply the needs of radio astronomy for
many years.

Fig. 1. Example of a microwave integrated circuit (MIC) or “chip-and-wire” low-noise amplifier.

The construction of chip-and-wire amplifiers to the exacting performance stan- 
dards of radio astronomy, however, is a time-consuming and expensive process, 
requiring extensive training and experience that is difficult to acquire. As the size 
and complexity of radio astronomy instruments and facilities continue to grow,
the drawbacks of this process are becoming prohibitive. A more mass-producible 
approach to LNA fabrication is needed. Monolithic Microwave Integrated Circuits 
(MMICs) provide the solution (see Fig. 2), wherein thousands of fully-functional 
amplifiers are constructed on-wafer simultaneously, untouched by human hands, 
using e-beam and photolithographic techniques. The success of this approach will 
depend greatly on the ability of the semiconductor process to provide repeatedly 
optimal cryogenic performance on wafer after wafer during chip development and 
production. Great strides are now being made in this area,⁸ fortuitously just as the
supply of the best-performing discrete devices from decades-old wafers is becoming
depleted.

Fig. 2. Example of a Monolithic Microwave Integrated Circuit (MMIC) low-noise amplifier.

Fig. 3. Noise temperature as a function of frequency for some of the best cryogenic low-noise
amplifier results achieved to date.

Even with repeatable transistor devices, however, the quality of on-chip match- 
ing components (inductors, transmission lines, etc.) is not always sufficient for the 
best performance, at least at low frequencies where passive losses are not insignificant
compared to the noise of the active device. In these cases, an off-chip input
matching network and transition to the feed structure may be employed to realize 
the best sensitivity. 

The best currently-achievable noise measure for cryogenic amplifiers” is a strong 
function of frequency, as shown in Fig. 3. Generally, these are cooled to 15 K ambient
temperature, achievable with two-stage refrigerators. Although they can and have
been used at colder temperatures where the application calls for it, experiments 
have shown that not much further improvement in device characteristics can be 
expected below 15-20 K. 

Over the range where transistor amplifiers are competitive for ground-based 
astronomy, including the cm-wave band and the lower portion of mm-waves, the 
trend may be approximated as roughly five times the quantum noise limit (that is,
$T_{\min} \approx 5h\nu/k$). The upward curvature becomes more severe in the mm-wave range,
owing not only to the increasing minimum noise temperature of the active device,
but also to its declining available gain, making the contributions of secondary stages
more important. The efforts to scale down gate length in InP HFETs are helping to
slow this upward trend at mm-wave frequencies, while low-current optimization
and the introduction of cooled SiGe HBTs to radio astronomy are addressing
the issues of repeatability and monolithic integration at longer wavelengths. 
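The rule of thumb quoted above, a minimum noise temperature of roughly five times the quantum limit hν/k, is easy to tabulate. The sketch below simply evaluates that approximation at a few example frequencies, which are chosen arbitrarily here; it is not a fit to the data of Fig. 3.

```python
H_PLANCK = 6.62607015e-34    # J s (exact SI value)
K_BOLTZMANN = 1.380649e-23   # J/K (exact SI value)


def quantum_limit_k(freq_hz: float) -> float:
    """Quantum noise limit h*nu/k expressed as a temperature in kelvin."""
    return H_PLANCK * freq_hz / K_BOLTZMANN


def trend_noise_temperature_k(freq_hz: float, factor: float = 5.0) -> float:
    """Approximate trend T_min ~ factor * h*nu/k (factor = 5 per the text)."""
    return factor * quantum_limit_k(freq_hz)


# Arbitrary example frequencies spanning cm-wave to lower mm-wave bands.
for freq_ghz in (1.4, 10.0, 40.0, 100.0):
    f_hz = freq_ghz * 1e9
    print(f"{freq_ghz:6.1f} GHz: h*nu/k = {quantum_limit_k(f_hz):6.3f} K, "
          f"5 h*nu/k = {trend_noise_temperature_k(f_hz):6.2f} K")
```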


3. Analog-to-Digital Conversion 


To allow parallels to be drawn between analog and digital signal processing in the 
remainder of this chapter, the first component to be described is the analog-to- 
digital converter (ADC). An analog radio signal is continuous in both time and 
amplitude, but a digital signal is discrete in both quantities. This introduces quan- 
tization error in amplitude, and, more importantly, represents the signal only at 
discrete points in time. The discontinuous time sampling introduces an ambiguity 
that is illustrated in the top plot of Fig. 4. If the sampled signal has a frequency 
that is higher than half of the sampling frequency, it will be mistaken for a lower 
frequency signal that is represented by the same sample sequence. This ambiguity is 
called “aliasing”, which can be represented in the frequency domain by the bottom 
plot of Fig. 4. One of the indispensable analog components of any receiver is an 
“anti-aliasing” filter that limits the frequency range of the sampled signal to one 
and only one of the alias zones. The lowest frequency alias zone places the least 
demand on the input bandwidth of the ADC, so this is the zone most frequently 
used. 
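The aliasing ambiguity can be made concrete with a few lines of arithmetic. The sketch below (function names are illustrative) folds an input frequency into the lowest alias zone and reports which zone it came from, using a two-megasample-per-second clock like the one in Fig. 4; the 0.8 MHz and 1.2 MHz tones are assumed example values placed symmetrically about half the sample rate.

```python
def alias_zone(freq_hz: float, sample_rate_hz: float) -> int:
    """1-based alias (Nyquist) zone: zone 1 spans 0 to fs/2, zone 2 fs/2 to fs, etc."""
    return int(freq_hz // (sample_rate_hz / 2.0)) + 1


def aliased_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency in the lowest alias zone after sampling."""
    folded = freq_hz % sample_rate_hz
    if folded > sample_rate_hz / 2.0:
        folded = sample_rate_hz - folded
    return folded


FS = 2.0e6  # sample rate, samples per second

for tone_hz in (0.8e6, 1.2e6):
    print(f"{tone_hz / 1e6:.1f} MHz tone: zone {alias_zone(tone_hz, FS)}, "
          f"appears at {aliased_frequency(tone_hz, FS) / 1e6:.1f} MHz")
```

Both tones produce the same apparent frequency, which is why the anti-aliasing filter must admit only one zone.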

The amplitude resolution of an ADC is of less importance than one might 
imagine. Unlike high-fidelity audio sampling, there is relatively little information in 
the amplitude of noise-like radio astronomy signals. In fact, early ADCs in radio 
astronomy used only one digital bit to record whether the analog signal was either 
positive or negative at each time sample. This one-bit sampling lost all amplitude 
information of the noise signal, but it retained most of the spectral information. 
Thompson et al.¹⁰ show that the relative sensitivity for samplers using 2, 3, 4 and
8 levels on noise-dominated signals is 0.636, 0.810, 0.881 and 0.962, respectively. 
A 4-bit sampler (16 amplitude resolution levels) is only 1.2% less sensitive than an 
ideal sampler. 
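The claim that one-bit sampling keeps most of the useful information can be checked with a small Monte Carlo experiment. The sketch below assumes two Gaussian noise streams with a chosen correlation coefficient, keeps only their signs, and compares the measured one-bit correlation with the classical arcsine (Van Vleck) relation, (2/π) arcsin ρ, a standard result that is not derived in this chapter. For weak correlation the one-bit value is about 2/π ≈ 0.64 of the true one, in line with the two-level figure quoted above. The sample count and seed are arbitrary.

```python
import math
import random


def one_bit(x: float) -> float:
    """Two-level (sign-only) quantizer."""
    return 1.0 if x >= 0.0 else -1.0


def simulate_one_bit_correlation(rho: float, n_samples: int = 200_000,
                                 seed: int = 1) -> float:
    """Correlation of the signs of two Gaussian streams with true correlation rho."""
    rng = random.Random(seed)
    mix = math.sqrt(1.0 - rho * rho)
    acc = 0.0
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        x = a
        y = rho * a + mix * b   # y has unit variance and correlation rho with x
        acc += one_bit(x) * one_bit(y)
    return acc / n_samples


def van_vleck(rho: float) -> float:
    """Expected one-bit correlation for true correlation rho."""
    return (2.0 / math.pi) * math.asin(rho)


rho_true = 0.3
measured = simulate_one_bit_correlation(rho_true)
print(f"true rho      : {rho_true:.3f}")
print(f"one-bit (sim) : {measured:.3f}")
print(f"(2/pi) asin   : {van_vleck(rho_true):.3f}")
```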
Fig. 4. Frequency aliasing. Two frequencies produce equivalent samples. In the top plot, the two
sine waves have the same frequency offset on either side of half the sample frequency of two mega- 
samples per second. The bottom plot shows the two sine waves in the frequency domain along 
with an illustration of the first four alias zones of the sample clock. 


In the early days of digital signal processing, one-bit and two-bit sampling had 
a big economic advantage in the spectrometers of the time, but there was little 
commercial interest in these specialized instruments. The commercial market for 
8-bit (256-level) samplers has driven their cost down enormously, and the demand 
for ever higher sampling speeds in many scientific and engineering fields continues 
to increase, so amplitude resolution in ADCs is no longer a major issue. 

An ADC typically consists of a sample-and-hold device that maintains the input
voltage at a fixed level while its amplitude is being converted to digital form, a voltage
divider network that supplies accurate reference voltages at all of the possible
digitization levels, comparators that determine whether the measured voltage is
above or below each reference voltage level, and a digital logic-gate network that
combines the comparator outputs into a binary word,¹¹ as shown in Fig. 5 for a
2-bit ADC. The resistor network defines the voltage levels that are the boundaries
between the integer output values. When the signal being sampled has a higher voltage
than the reference voltage at one of the comparator inputs, that comparator’s
output is a logic “0”, otherwise it is a logic “1”. The truth table for the different
sampled signal levels is shown in Fig. 5. B0 and B1 are the bits in a binary word,
e.g. “10” and “11” equal decimal 2 and 3, respectively. An 8-bit ADC would have
256 reference voltage levels and comparators and a more complex digital encoder
network to generate the 8-bit output word. In addition to the components shown
in Fig. 5, the ADC would have a clock and capture circuit on the digital output
to record a continuous series of sample values equally spaced in time. The circuit
could also add a sample-and-hold component on the input signal to assure that the
sampled voltage is stable while the output bits are captured.

Fig. 5. Logic diagram and truth table for a 2-bit ADC.
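The comparator-and-encoder arrangement described above can be mimicked in a few lines. The sketch below is only a behavioural model under stated assumptions: three equally spaced reference levels on a hypothetical 0 to 1 V input range, the comparator polarity given in the text (logic 0 when the input exceeds the reference), and an encoder that counts the comparators reading 0. The actual truth table of Fig. 5 is not reproduced here and may differ.

```python
def comparator_outputs(v_in: float, references):
    """Comparator bank: logic 0 if the input exceeds a reference, else logic 1."""
    return [0 if v_in > ref else 1 for ref in references]


def encode_2bit(outputs):
    """Encoder: the number of comparators reading 0 is the output level (0..3).

    Returns the bits as (B1, B0), most significant bit first.
    """
    level = outputs.count(0)
    return (level >> 1) & 1, level & 1


def flash_adc_2bit(v_in: float, v_full_scale: float = 1.0):
    """Behavioural 2-bit flash ADC on an assumed 0..v_full_scale input range."""
    references = [v_full_scale * k / 4.0 for k in (1, 2, 3)]
    return encode_2bit(comparator_outputs(v_in, references))


for volts in (0.1, 0.3, 0.6, 0.9):
    b1, b0 = flash_adc_2bit(volts)
    print(f"{volts:.1f} V -> B1 B0 = {b1}{b0}")
```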


4. Mixers 


Receiver designers do not want to be confined to using one of the sampling alias 
zones, nor should they be required to use ADCs with very wide input bandwidth to 
sample a relatively small fraction of the bandwidth at very high frequencies. Hence, 
there is a need for another component, called a mixer, to convert high frequency 
signals to frequencies that are readily sampled by an ADC. A mixer takes two
signals, the one to be converted to a lower frequency, $f_{in}$, and a local oscillator
(LO), at frequency $f_{LO}$, to produce an intermediate frequency (IF), $f_{IF}$, that is the
difference of the two input frequencies,

$$f_{IF} = \begin{cases} f_{in} - f_{LO}, & f_{in} > f_{LO}, \\ f_{LO} - f_{in}, & f_{in} < f_{LO}. \end{cases} \tag{4}$$

The first and second parts of Eq. (4) correspond to the upper and lower sidebands
of the mixing operation. Any nonlinear (current not proportional to voltage) device
can serve as a mixer. One of the primary requirements of most components in
a receiver system is that they be as linear as possible so that unintended mixing 
products are not produced where they could be confused with the signals of interest. 
The mixer’s nonlinearity will also produce harmonics of the input and LO signals as 
well as a variety of third and higher-order products. This can result in a challenging 
engineering task to avoid spurious signals in the receiver output, particularly when 
the frequency tuning range is wide. 
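To see why spurious products become an engineering problem, it helps to list the frequencies a nonlinear device can generate. The sketch below enumerates products of the form |m f_LO ± n f_in| up to a chosen order m + n; the idealized product model, the function name, and the example frequencies (a 1420 MHz input with a 1300 MHz LO) are assumptions for illustration only.

```python
def mixing_products(f_in: float, f_lo: float, max_order: int = 3):
    """List (order, m, n, sign, frequency) for products |m*f_lo + sign*n*f_in|.

    m and n are non-negative harmonic numbers with 1 <= m + n <= max_order.
    Pure harmonics (m == 0 or n == 0) are listed once, with sign = +1.
    """
    products = []
    for m in range(max_order + 1):
        for n in range(max_order + 1 - m):
            if m == 0 and n == 0:
                continue
            for sign in (1, -1):
                if sign == -1 and (m == 0 or n == 0):
                    continue
                freq = abs(m * f_lo + sign * n * f_in)
                products.append((m + n, m, n, sign, freq))
    return sorted(products, key=lambda p: (p[0], p[4]))


# Hypothetical tuning, frequencies in MHz.
F_IN, F_LO = 1420.0, 1300.0
products = mixing_products(F_IN, F_LO)

for order, m, n, sign, freq in products:
    op = "+" if sign > 0 else "-"
    print(f"order {order}: |{m}*f_LO {op} {n}*f_in| = {freq:7.1f} MHz")
```

The second-order difference term is the intended IF of Eq. (4); every other entry is a candidate spurious signal that filtering and frequency planning must keep out of the IF passband.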

An efficient and commonly used mixer is the double-balanced mixer, shown 
in Fig. 6, which has the useful property that the input, LO and output ports are 
reasonably well isolated from one another. The LO signal is strong enough to turn on 
two of the four diodes at a time, which causes the radio frequency (RF) input signal
to be alternately reversed and not reversed in phase at the IF output port. Figure 7
shows the resulting time- and frequency-domain representation of these signals.

At least one frequency filter is required to select whether the upper or lower
“sideband” ($f_{in} > f_{LO}$ or $f_{in} < f_{LO}$) is converted to lower frequencies. Another
frequency filter is usually added to the output of the mixer to suppress spurious 
frequency components, such as LO harmonics and other products generated by the 
nonlinear mixer components, and to sharpen the frequency bandpass presented to 
the digitizer. The combination is shown in Fig. 8. 

If the fractional frequency tuning range of the receiver is large, the image selec- 
tion filter must be wide, and the first intermediate frequency must be high enough 
to avoid having the image frequency appear in the first filter passband. This usu- 
ally requires more than one mixer stage and, hence, more than one intermediate 
frequency, depending on the required tuning range of the receiver and how far the 
signals must be transferred from the focal point of one or more antennas to a control 
building for further signal processing. Each mixer stage presents a new challenge 
for suppressing products and adds to the cost and complexity of the system. 


Fig. 6. Schematic of a double-balanced mixer, with LO input, RF input, and IF output ports.
Fig. 7. Time and frequency domain of double balanced mixer signals. The top panel shows the
switching waveform of the LO as a square wave, while the RF is a much smaller sinusoidal signal.
In the bottom panel, the LO signal and its harmonics are shown in black, the RF signal is in red,
and the mixing products are in blue.

Fig. 8. Filter-mixer combination to select intended image and intermediate frequencies from the
product frequencies shown in Fig. 7.


4.1. Sideband-Separating Mixer 


An alternative to using multiple mixer stages and a frequency filter before each
mixer to suppress unwanted sidebands is a sideband-separating mixer. This mixer
type has been used at millimeter wavelengths, where low-noise preamplifiers and
filters for suppressing the unwanted sideband have offered little or no performance
advantage. The schematic design of a sideband-separating mixer is shown in
Fig. 9(a).

Fig. 9. (a) Analog sideband-separating mixer. (b) Digital signal processing sideband-separating
mixer.


In this design, the relative phase of the IF output signals (called I and Q) from 
the two mixers is equal to the relative phase difference of the LO signals applied to 
the two mixers, except that the phase offset is positive for the upper sideband and 
negative for the lower sideband. If this offset is 90°, and another 90° phase shift is 
added to one of the IF signals and these IF signals are added together, the upper 
sideband will be canceled and the lower sideband will not. If the IF phase shift is 
minus 90°, the lower sideband will be canceled, and the upper sideband will not 
cancel. Using analog components to produce the 90° phase shift at all frequencies in 
the IF bandpass is necessarily an approximation, and sideband suppression is limited 
to about 20 dB. This is sufficient for many mm-wave applications where the objective
is to suppress the white noise from the unwanted sideband, but it is inadequate 
at centimeter wavelengths where the man-made interference environment can be 
severe. 
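The link between phase or amplitude errors and achievable suppression can be estimated with a simple two-path phasor model, which is an assumption of this sketch rather than something derived in the chapter: the wanted sideband adds as |1 + a e^{iδ}|² and the unwanted one as |1 − a e^{iδ}|², where a is the amplitude ratio of the two paths and δ the residual phase error. The function name is illustrative.

```python
import math


def sideband_suppression_db(phase_error_deg: float,
                            amplitude_ratio: float = 1.0) -> float:
    """Wanted-to-unwanted sideband power ratio (dB) in a two-path phasor model."""
    a = amplitude_ratio
    c = math.cos(math.radians(phase_error_deg))
    wanted = 1.0 + 2.0 * a * c + a * a
    unwanted = 1.0 - 2.0 * a * c + a * a
    if unwanted == 0.0:
        return math.inf   # perfect balance: the unwanted sideband cancels exactly
    return 10.0 * math.log10(wanted / unwanted)


for err_deg in (11.42, 3.6, 1.0, 0.36):
    print(f"phase error {err_deg:5.2f} deg -> "
          f"{sideband_suppression_db(err_deg):5.1f} dB suppression")
```

In this model a suppression of about 20 dB corresponds to a phase error of roughly 11 degrees with perfectly matched amplitudes, whereas 50 dB needs the error held to a few tenths of a degree, which suggests why per-channel digital correction is attractive.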

Recent work¹²,¹³ has improved the sideband separation performance of this
mixer type by employing the inherent accuracy of digital signal processing of the
IF signals before combining them to form the upper and lower sidebands. In the implementation
shown in Fig. 9(b), the sampled IF signals are Fourier transformed to
produce separate spectra from the two mixer outputs, and the relative phase of each
spectral channel is corrected for both LO and IF phase errors before combining the
two mixer outputs. These errors can be routinely calibrated by injecting a test sig- 
nal anywhere before the RF power divider and measuring the phase difference and 
amplitude ratio of the two copies of this test signal that appear in the IF spectra. 
This can be done with a broadband noise test source, but a more efficient and 
accurate method is to use a narrow-band CW (continuous wave) test source, step 
its frequency across the receiver passband, and interpolate the calibrated phase and 
amplitude offsets between the calibration frequencies. Sideband separation of 50 dB
or better has been demonstrated at both centimeter and millimeter wavelengths. 
One minor disadvantage from the astronomer’s point of view is that the RF power 
at the DC channel frequency of the IF spectra can be difficult to pass through the 
IF electronics so there may be a small hole in the center of the full double-sideband 
spectrum. 

To show how the calibration process works, consider a CW test signal at the
input of the RF power divider in Fig. 9(b), which can be described in complex
notation as

$$V e^{i\omega t} = V\left(\cos(\omega t) + i\sin(\omega t)\right), \tag{5}$$

where $V$ is the test signal peak amplitude, $i = \sqrt{-1}$, $\omega$ is the angular frequency, and
$t$ is time. If we take the amplitude and phase of the CW signal in the right-hand
channel in Fig. 9(b) as references, then the test signal in the left channel in Fig. 9(b)
is given by

$$X_{cal}\, e^{i[(\omega - \omega_{LO})t \pm \varphi_{LO} + \varphi_F]} = X_{cal}\, e^{i(\omega - \omega_{LO})t}\, e^{i(\varphi_F \pm \varphi_{LO})}, \tag{6}$$

where $X_{cal}$ is the ratio of amplitudes, $V_1/V_2$, at the outputs of the mixers; $\omega_{LO}$ is
the local oscillator angular frequency; $\varphi_{LO}$ is the phase difference between the local
oscillator signal at the two mixers (nominally 90°); and $\varphi_F$ is the relative phase
difference of the signal between the two signal paths from the RF input to the
inputs of the ADCs, due to differences in the delays through cable, amplifiers and
filters. The minus sign on $\varphi_{LO}$ corresponds to a signal in the mixer upper sideband,
and the plus sign is for the lower sideband.

The ADCs and FFT logic in Fig. 9(b) are synchronized with a common clock.
To cancel the upper sideband (USB, $\omega > \omega_{LO}$) and produce the lower sideband
(LSB), each spectral frequency complex value in channel 1 is multiplied by

$$\frac{1}{X_{cal}}\, e^{-i(\varphi_F - \varphi_{LO} - \pi)} = \frac{1}{X_{cal}}\left[\cos(\varphi_F - \varphi_{LO} - \pi) - i\sin(\varphi_F - \varphi_{LO} - \pi)\right] \tag{7}$$

and added to the corresponding spectral value in channel 2.

If $\varphi_{LO}$ is not exactly 90°, or the gain ratio multiplier, $X_{cal}$, is not accurate, there
will be some gain or loss in the lower sideband because its signal through the two
channels will not add exactly in phase and amplitude, and the upper sideband will
not be completely canceled.

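Similarly, the upper sideband can be produced and the lower sideband cancelled
by reversing the sign on the LO phase term, $\varphi_{LO}$, in the channel 1 multiplier:

$$\frac{1}{X_{cal}}\, e^{-i(\varphi_F + \varphi_{LO} - \pi)} = \frac{1}{X_{cal}}\left[\cos(\varphi_F + \varphi_{LO} - \pi) - i\sin(\varphi_F + \varphi_{LO} - \pi)\right]. \tag{9}$$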


The calibration task is, then, to determine the gain ratio, $X_{cal}$, and the phase
terms, $(\varphi_F - \varphi_{LO})$ and $(\varphi_F + \varphi_{LO})$, for all frequencies in the IF passband and
for the range of possible local oscillator frequencies. This uses the CW test signal
injected anywhere in the receiver system before the power divider at the input to the
sideband-separating mixer. The measured quantities are the time-averaged power,
$P$, in the spectral channel containing the CW signal minus the noise power in the
adjacent spectral channels in each of the two mixer channels shown in Fig. 9(b).
The voltage gain ratio is the square root of the quotient of these quantities, where
the subscripts refer to the two mixer channels:

$$X(\omega) = \sqrt{\frac{\langle V_1 e^{i\omega t}\, V_1 e^{-i\omega t}\rangle}{\langle V_2 e^{i\omega t}\, V_2 e^{-i\omega t}\rangle}} = \sqrt{\frac{P_1(\omega)_{CW+noise} - P_1(\omega)_{noise}}{P_2(\omega)_{CW+noise} - P_2(\omega)_{noise}}}. \tag{10}$$


The outputs of the FFT engines in Fig. 9(b) are complex spectra. In other words,
each spectral channel value is a complex number with real and imaginary terms. The
phase term, $(\varphi_F \pm \varphi_{LO})$, can be determined from the time-averaged cross-product of
$V_1$ and $V_2$. When $V_1$ and $V_2$ are expressed with their time-dependent phase terms,
they are

$$V_1 = X e^{-i\omega_{IF} t}\, e^{i(\varphi_F \pm \varphi_{LO})}, \tag{11a}$$

$$V_2 = e^{-i\omega_{IF} t}, \tag{11b}$$

$$V_1 V_2^{*} = X e^{i(\varphi_F \pm \varphi_{LO})}, \tag{11c}$$

and the sum and difference of the relative mixer channel and LO phase offsets may
be derived from the time-average of this cross-product as

$$\varphi_F \pm \varphi_{LO} = \tan^{-1}\!\left(\frac{\operatorname{Im}\langle V_1 V_2^{*}\rangle}{\operatorname{Re}\langle V_1 V_2^{*}\rangle}\right). \tag{12}$$

Since the plus sign corresponds to the lower sideband ($\omega < \omega_{LO}$), and the minus
sign to the upper sideband ($\omega > \omega_{LO}$), we can define


$$\varphi_{LSB} = \varphi_F + \varphi_{LO}, \tag{13a}$$

$$\varphi_{USB} = \varphi_F - \varphi_{LO}. \tag{13b}$$


These two quantities can be measured at each intermediate frequency with a CW
signal in the upper sideband ($\omega = \omega_{LO} + \omega_{IF}$) and then in the lower sideband
($\omega = \omega_{LO} - \omega_{IF}$). If it can be assumed that the phase offset between the channels,
in the signal paths between the signal power divider and the mixers, is small, the
LO phase and signal delay offsets can be separated for system analysis purposes
with

$$\varphi_{LO} = \frac{\varphi_{LSB} - \varphi_{USB}}{2}, \tag{14a}$$

$$\varphi_F = \frac{\varphi_{LSB} + \varphi_{USB}}{2}. \tag{14b}$$

In general, one cannot expect the pre-mixer phase offset between the two channels 
to be constant over a wide frequency range. This assumption is not necessary for 
accurate sideband separation, but it can be useful for system diagnostics. 
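The calibration and cancellation steps can be tried out numerically on a single FFT bin. The sketch below assumes the signal model read from Eqs. (6) and (11): channel 2 is the reference, and channel 1 equals channel 2 scaled by X and rotated by φ_F − φ_LO for an upper-sideband signal or φ_F + φ_LO for a lower-sideband signal. The imperfection values (X = 1.1, φ_F = 5°, φ_LO = 87°), the sky amplitudes, and all function names are hypothetical, and noise and time averaging are omitted.

```python
import cmath
import math


def channel1_value(v2: complex, x: float, phi_f: float, phi_lo: float,
                   sideband: str) -> complex:
    """Assumed model: channel 1 = X * channel 2 * exp(i(phi_F -/+ phi_LO))."""
    sign = -1.0 if sideband == "USB" else 1.0
    return x * v2 * cmath.exp(1j * (phi_f + sign * phi_lo))


def calibrate(v1: complex, v2: complex):
    """Amplitude ratio (cf. Eq. 10) and phase of the cross-product V1 V2* (cf. Eq. 12)."""
    x = math.sqrt(abs(v1) ** 2 / abs(v2) ** 2)
    phase = cmath.phase(v1 * v2.conjugate())
    return x, phase


def cancel_multiplier(x: float, phase: float) -> complex:
    """Channel 1 multiplier (1/X) exp(-i(phase - pi)) that cancels one sideband."""
    return cmath.exp(-1j * (phase - math.pi)) / x


# Hypothetical hardware imperfections.
X_TRUE, PHI_F, PHI_LO = 1.1, math.radians(5.0), math.radians(87.0)

# Calibration: a CW test tone placed first in the USB, then in the LSB.
tone = 1.0 + 0.0j
x_usb, phi_usb = calibrate(channel1_value(tone, X_TRUE, PHI_F, PHI_LO, "USB"), tone)
x_lsb, phi_lsb = calibrate(channel1_value(tone, X_TRUE, PHI_F, PHI_LO, "LSB"), tone)

# Diagnostics of Eq. (14).
phi_lo_est = (phi_lsb - phi_usb) / 2.0
phi_f_est = (phi_lsb + phi_usb) / 2.0

# Observation: one bin containing both a USB and an LSB sky component.
usb_sky, lsb_sky = 0.7 + 0.2j, -0.3 + 0.4j
v2 = usb_sky + lsb_sky
v1 = (channel1_value(usb_sky, X_TRUE, PHI_F, PHI_LO, "USB")
      + channel1_value(lsb_sky, X_TRUE, PHI_F, PHI_LO, "LSB"))

lsb_out = v1 * cancel_multiplier(x_usb, phi_usb) + v2   # USB cancelled
usb_out = v1 * cancel_multiplier(x_lsb, phi_lsb) + v2   # LSB cancelled

print(f"estimated phi_LO = {math.degrees(phi_lo_est):.2f} deg, "
      f"phi_F = {math.degrees(phi_f_est):.2f} deg")
print(f"|lsb_out| / |lsb_sky| = {abs(lsb_out) / abs(lsb_sky):.4f}")
print(f"|usb_out| / |usb_sky| = {abs(usb_out) / abs(usb_sky):.4f}")
```

In this idealized model the unwanted sideband cancels exactly once the measured phase and gain ratio are used, while the surviving sideband is scaled by a factor whose magnitude is close to, but not exactly, 2 because φ_LO differs from 90°.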

For the purposes of digital signal processing, we define the following measured
quantities: the channel 1 complex voltage in one FFT spectral point is $V_1 = A + iB$;
the channel 2 complex voltage in the same FFT spectral point is $V_2 = C + iD$. The
vector to pass the upper sideband and cancel the lower sideband is then

$$R_U + iI_U = -\frac{\cos(\varphi_{LSB})}{X} + i\,\frac{\sin(\varphi_{LSB})}{X}, \tag{15}$$

and the vector to pass the lower sideband and cancel the upper sideband is

$$R_L + iI_L = -\frac{\cos(\varphi_{USB})}{X} + i\,\frac{\sin(\varphi_{USB})}{X}. \tag{16}$$

Then the upper sideband signal voltage, without the time-dependent term, is

$$V_{USB} = (A + iB)(R_U + iI_U) + C + iD = AR_U - BI_U + C + i(AI_U + BR_U + D), \tag{17}$$

and the lower sideband signal will be

$$V_{LSB} = (A + iB)(R_L + iI_L) + C + iD = AR_L - BI_L + C + i(AI_L + BR_L + D). \tag{18}$$

All of the terms in the equations above are frequency dependent within the FFT
spectra and must be calibrated accordingly. The powers in a given FFT frequency
bin are then

$$P_{USB} = (AR_U - BI_U + C)^2 + (AI_U + BR_U + D)^2 \tag{19}$$

and

$$P_{LSB} = (AR_L - BI_L + C)^2 + (AI_L + BR_L + D)^2. \tag{20}$$
5. Reflectionless Filters


As described earlier, filters are a necessary component in almost any radio sys- 
tem to eliminate spurious mixing products, out-of-band interferers, excess sideband 
noise, and unwanted alias zones, or even just to limit the integrated power incident 
upon a sensitive electronic device, such as a low-noise amplifier. Modern designers 
have at their disposal an extensive arsenal of filter design techniques and effective 
implementation precedents thanks to almost a century of focused study in the field, 
but nearly all electronic filters in common use today have at least one feature in 
common: they block the transmission of unwanted signals by reflecting them back 
to the source. This can lead to a host of subtle undesirable effects that may be 
negligible in many systems but are becoming important in high-performance radio 
astronomy receivers with ever more stringent sensitivity and stability requirements. 
In the development of the precision calibrated, digital sideband-separating receiver
described in Sec. 4, for example, it was found that one of the dominant contributors
to calibration instability once all others were eliminated was related to out-of-band
interactions between the filters and nonlinear components such as the mixer,
especially when coupled to broadband amplifiers with residual gain that typically
extends well beyond the operative frequency range.

Fig. 10. (a) Topology and (b) simulated performance of a reflectionless low-pass filter. The
elements are labeled with normalized values, where the g values are the standard Chebyshev prototype
parameters.

This has led to the development of some novel filtering structures that, in con- 
trast to conventional filter topologies, absorb the stop-band portion of the spectrum 
rather than reflecting it.141!7 An example is shown in Fig. 10. The capacitors pro- 
vide a path for high frequency signals from the ports to be dissipated in the bottom 
two resistors, while the inductors provide a path for low frequency signals to pass 
through. This and similar structures!” have theoretically infinite return loss at all 
frequencies, and therefore provide a good impedance match to any component in 
the pass-band, stop-band, and transition-bands. 
Modern dual-polarization receivers allow a radio telescope to characterize the
full polarization state of incoming interstellar radio waves. Many astronomers 
incorrectly consider a polarimeter to be the “backend” of the telescope. We go to 
lengths to dissuade the reader of this concept: the backend is the least compli- 
cated component of the radio telescope when it comes to measuring polarization. 
The feed, telescope structure, dish surface, coaxial cables, optical fibers, and elec- 
tronics can each alter the polarization state of the received astronomical signal. 
We begin with an overview of polarized radiation, introducing Jones and Stokes 
vectors, and then discuss construction of digitized pseudo-Stokes vectors from 
the outputs of modern correlators. We describe the measurement and calibration 
process for polarization observations and illustrate how instrumental polarization 
can affect a measurement. Finally, we draw attention to the confusion generated 
by various polarization conventions and highlight the need for observers to state 
all adopted conventions when reporting polarization results. 


1. Introduction


Astronomy involves the reception of light from objects beyond the Earth. Light 
from these distant objects can arrive at a telescope with its electric field having 
some preferred orientation or rotation. This tendency is known as polarization. 
Most astronomers are happy to just measure the intensity of light from distant 
sources, but radio astronomers can easily measure the full polarization state of the 
radio waves they collect. Sadly, many astronomers consider polarimetry an esoteric
specialty that is not worth their effort. The aim of this review is to offer a clear 
description of the fundamentals of measuring polarization in radio astronomy. 

At radio wavelengths, we find a number of processes that can produce polar- 
ized radiation®*: linearly polarized blackbody emission from the solid surfaces of 
planets and moons;? linearly polarized thermal emission from dust grains aligned 
with a magnetic field; synchrotron/cyclotron radiation emitted (or absorbed) 
by relativistic/non-relativistic electrons gyrating around magnetic field lines and 
producing linearly polarized light; Zeeman splitting of spectral lines emitted or 
absorbed in a region threaded by a magnetic field, producing elliptically polarized
light; the Goldreich-Kylafis effect, producing linear polarization via scattering
of anisotropic spectral-line radiation by atoms or molecules in a magnetic field; 
Thomson scattering and gravitational waves producing linear polarization in the 
cosmic microwave background. Radio sources that show some signs of polarization 
include our Sun, planets and moons in the solar system, pulsars, gas clouds in the 
interstellar medium, circumstellar disks, masers, synchrotron emission from galax- 
ies, quasars, jets, and the cosmic microwave background. Most of these sources have 
low fractional polarization (pulsars, solid surfaces, cyclotron/synchrotron emission, 
and masers being notable exceptions, with fractional polarizations up to 100%). 

The polarization of a radio wave can be affected as it travels through interstellar 
space. Faraday rotation causes the polarization angle of a linearly polarized wave to
rotate (by an amount $\propto \lambda^2$) when the wave traverses an ionized medium threaded
by a magnetic field having a component aligned with the direction of propagation. 
The Earth’s ionosphere produces Faraday rotation that must be corrected for; this 
is a complicated task for interferometers with intercontinental baselines. 
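To make the wavelength-squared dependence concrete, here is a minimal Python sketch. It assumes the standard parametrization in which the rotation equals a rotation measure (in rad m^-2) times the wavelength squared; the rotation measure, the function name, and the example value of 10 rad m^-2 are assumptions introduced for illustration and are not defined in this chapter.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def faraday_rotation_deg(rotation_measure, freq_hz):
    """Rotation of the polarization angle, in degrees.

    rotation_measure is in rad m^-2 (an assumed input, not defined in the text).
    """
    wavelength_m = SPEED_OF_LIGHT / freq_hz
    return math.degrees(rotation_measure * wavelength_m ** 2)


# The rotation falls off steeply with observing frequency.
for freq in (0.4e9, 1.4e9, 5.0e9):
    print(f"{freq / 1e9:4.1f} GHz: {faraday_rotation_deg(10.0, freq):8.2f} deg")
```

Halving the wavelength reduces the rotation by a factor of four, which is why the correction matters most at low frequencies.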

Radio waves then interact with the antenna — typically a dish of some sort — 
where they are reflected and brought to a focus. At the focus the radio waves in free 
space are coupled to an antenna, known as the feed. The feed probes the electric 
field in an orthogonal basis, typically orthogonal linear polarizations (which we call 
X and Y) or left-hand and right-hand circular polarizations (LCP and RCP). In 
this paper, we will always use the IEEE definition of RCP and LCP (more of this 
in Sec. 6.2), for which a receiver would see the electric vector of incoming radiation 
rotate counterclockwise and clockwise, respectively, with time. 

From this point forward, the signals are amplified and encounter a large num- 
ber of electronic components that change the voltage gain (a complex number; 
Sec. 4.3). In addition, differences in cable length (e.g. from the telescope to the 
backend system) produce a differential phase change that is proportional to fre- 
quency (Sec. 4.6.1), and bandpass filters incur phase delays (Sec. 4.6.2). Finally, the 
voltages are sampled, digitized, correlated, Fourier transformed, and stored (Sec. 3). 


ᵃ We highly recommend Ref. 1 for a gentle and clear introduction to the general characteristics of
polarized light and the physical processes that produce polarized astronomical radiation. 
In this chapter, we discuss how various components of a single-dish radio telescope
system create instrumental polarization and how one corrects or copes with
this.ᵇ

There are some very comprehensive reviews of radioastronomical polarimetry in 
the literature;>-’ many of them are highly mathematical, employing elegant repre- 
sentations of polarization and invoking such tricks as Lorentz boosts. The aficionado 
should take the time to understand these papers, and those with a theoretical bent 
will really appreciate them, but the polarization newcomer is likely to be scared 
away. It is our opinion that spectropolarimetrists should be doing more to con- 
vince observers to use this tool rather than obfuscating the methods with complex 
mathematical representations. 

We begin in Sec. 2 by discussing the basic mathematical framework of polar- 
ization and how polarization is described by electric fields and, alternatively, by 
Stokes parameters. In Sec. 3, we discuss how we digitally create the self- and cross- 
products that are necessary for polarization measurement. In Sec. 4 we discuss 
how to create calibrated Stokes parameters from the digitally created products, 
including a thorough accounting of all the processes and components that change 
the polarization state of an incoming astronomical radio wave between the feed and 
the backend. The off-axis polarization response of a telescope is then considered in 
Sec. 5. Finally, in Sec. 6 we emphasize the important and necessary role played by 
polarization conventions — and the unfortunate tendency of astronomers to ignore 
those conventions. 


2. Polarization: The Basics 


2.1. The Description of Polarization by Electric Fields 


The polarization of a radio wave is defined by the motion of its electric field vector 
as a function of time within a plane perpendicular to the direction of propagation. 
That plane is known as the plane of polarization and the general shape that the 
electric field traces with time is an ellipse. We can quantify this polarization ellipse 
in terms of any orthonormal basis in the plane of polarization; in radio astronomy, 
we encounter two — the standard Cartesian linear basis and a basis of circularly 
rotating unit vectors of opposite handedness. 

The electric field vector of a monochromatic light wave travelling along the +z
direction can be written in terms of both a linear and circular set of orthonormal
bases:

$$\mathbf{E}(z,t) = \mathbf{E}_0\, e^{i(2\pi\nu t - kz)} = (E_x \hat{\mathbf{x}} + E_y \hat{\mathbf{y}})\, e^{i(2\pi\nu t - kz)} = (E_R \hat{\mathbf{R}} + E_L \hat{\mathbf{L}})\, e^{i(2\pi\nu t - kz)}, \tag{1}$$

where $\hat{\mathbf{R}} = (\hat{\mathbf{x}} - i\hat{\mathbf{y}})/\sqrt{2}$ and $\hat{\mathbf{L}} = (\hat{\mathbf{x}} + i\hat{\mathbf{y}})/\sqrt{2}$ are the unit vectors of IEEE RCP
and LCP. As seen from an observer somewhere at z > 0 and looking back towards
the origin, IEEE RCP is seen to rotate counterclockwise with time and IEEE LCP
clockwise.

ᵇ If one is interested in the details of polarization in interferometers, we refer you to Refs. 3 and 4.
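The handedness statement can be checked numerically. The sketch below, with hypothetical helper names, represents the two circular unit vectors by their (x, y) components, confirms that they are orthonormal, and evaluates the physical field Re[basis · exp(i·2πνt)] at z = 0 to see which way it turns in the xy-plane when viewed from z > 0 (x to the right, y up).

```python
import cmath
import math

INV_SQRT2 = 1.0 / math.sqrt(2.0)
R_HAT = (complex(INV_SQRT2, 0.0), complex(0.0, -INV_SQRT2))  # (x - iy)/sqrt(2)
L_HAT = (complex(INV_SQRT2, 0.0), complex(0.0, INV_SQRT2))   # (x + iy)/sqrt(2)


def inner(u, v):
    """Hermitian inner product <u, v> of two 2-component complex vectors."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def field_angle_deg(basis, phase_rad):
    """Angle in the xy-plane of the real field Re[basis * exp(i*phase)] at z = 0."""
    rot = cmath.exp(1j * phase_rad)
    ex = (basis[0] * rot).real
    ey = (basis[1] * rot).real
    return math.degrees(math.atan2(ey, ex))


# Orthonormality of the circular basis.
assert abs(inner(R_HAT, R_HAT) - 1.0) < 1e-12
assert abs(inner(R_HAT, L_HAT)) < 1e-12

# A small positive time step (phase = 2*pi*nu*t > 0) turns the field
# counterclockwise for R_HAT and clockwise for L_HAT.
assert field_angle_deg(R_HAT, 0.1) > 0.0
assert field_angle_deg(L_HAT, 0.1) < 0.0
```

The sign of the angle after a small time step is all that is needed here: a growing angle means counterclockwise rotation as seen by an observer looking back toward the origin.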

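We can write $\mathbf{E}_0$ as a Jones vector® in either of the bases:

$$\begin{pmatrix} E_x \\ E_y \end{pmatrix} = \begin{pmatrix} E_{0x}\, e^{i\phi_x} \\ E_{0y}\, e^{i\phi_y} \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} E_R \\ E_L \end{pmatrix} = \begin{pmatrix} E_{0R}\, e^{i\phi_R} \\ E_{0L}\, e^{i\phi_L} \end{pmatrix}. \tag{2}$$

At a given position z along the direction of propagation (let us take z = 0 for
simplicity), the tip of the electric field vector $\mathbf{E}$ will trace out an ellipse in time with
orthogonal components given in the linear basis by

$$E_x(t) = E_{0x}\, e^{i(2\pi\nu t + \phi_x)}, \qquad E_y(t) = E_{0y}\, e^{i(2\pi\nu t + \phi_y)}, \tag{3}$$

or in the circular basis by

$$E_R(t) = E_{0R}\, e^{i(2\pi\nu t + \phi_R)}, \qquad E_L(t) = E_{0L}\, e^{i(2\pi\nu t + \phi_L)}. \tag{4}$$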


These components define the previously mentioned polarization ellipse. Many treatments
of polarization ignore the absolute phase (which must not be ignored when
using an interferometer!) and define the relative phase as $\Delta\phi = \phi_y - \phi_x$.

The major axis of the polarization ellipse will be oriented at an angle $\chi$ with
respect to the x-axis (see Fig. 1(a)) where

$$\tan 2\chi = \frac{2 E_{0x} E_{0y} \cos(\phi_y - \phi_x)}{E_{0x}^2 - E_{0y}^2} = \tan(\phi_R - \phi_L); \qquad 0^\circ \le \chi \le 180^\circ. \tag{5}$$


2.2. The Description of Polarization by Stokes Parameters 


Astronomical radio signals are, in general, partially polarized. The polarization
ellipse and Jones matrices cannot help us quantify partially polarized radiation. For
this, we use the Stokes parameters. The Stokes parameters are most often denoted
as I, Q, U, and V in astronomical measurements and, because they are conveniently
manipulated by matrix algebra, are often written as the Stokes vector:ᶜ

$$S = \begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix} = \begin{pmatrix} I \\ Q \\ U \\ V \end{pmatrix}, \tag{6}$$


ᶜ While matrices are often represented by a bold font, here we have introduced the notation A
to represent a 4 × 1 column matrix (known as a vector in the parlance of linear algebra)
to differentiate from a physical vector A, e.g. the electric field. (The Stokes vector comprises
the Stokes parameters, which do not represent an orthonormal basis: Stokes I can be a linear
combination of Stokes Q, U, and V.) We later use the notation A to represent a square 4 × 4
matrix.
Fig. 1. (a) The polarization ellipse. For a radio wave traveling along the z-axis (out of the page),
the electric field will trace out an ellipse in the xy-plane with time at a given position z. The azimuth
of the major axis of the ellipse relative to the x-axis, $\chi$, is known as the polarization angle. IAU
convention (see Sec. 6.1) aligns the x-axis toward north on the sky. (b) and (c) Representations of the
sign for Stokes Q and U, respectively, given the polarization angle of the major axis of the ellipse.
(d) Representations of the sign of Stokes V using IEEE and IAU conventions (see Sec. 6.2).


where the Stokes parameters are defined®:!° in terms of the intensities of orthogonal 
polarization forms ($I_{0^\circ}$, $I_{90^\circ}$), ($I_{+45^\circ}$, $I_{-45^\circ}$), and ($I_{\mathrm{RCP}}$, $I_{\mathrm{LCP}}$):


(1) Stokes I is the total intensity. It is the sum of the intensities of any two orthogonal
polarization components and does not store any polarization information.
$I = I_{\mathrm{tot}} = I_{0^\circ} + I_{90^\circ} = I_{+45^\circ} + I_{-45^\circ} = I_{\mathrm{RCP}} + I_{\mathrm{LCP}}$.ᵈ

(2) Stokes Q is the difference in intensities between horizontal and vertical linearly
polarized components and is a measure of the tendency of the radio wave to
prefer the horizontal direction. If Q > 0, there is an excess of polarized radiation
along the horizontal, while for Q < 0, there is a vertical excess (Fig. 1(b)).
$Q = I_{0^\circ} - I_{90^\circ}$.

(3) Stokes U is the difference in intensities between linearly polarized components
at +45° and −45° and represents the preference of the light to be aligned at
+45°, with U < 0 meaning an excess in linear polarization at an angle −45° to
the horizontal (Fig. 1(c)). $U = I_{+45^\circ} - I_{-45^\circ}$.

(4) Stokes V is the difference between the intensities of the RCP and LCP components
and describes the preference for the light to be RCP. For positive Stokes V,
there is an excess of RCP over LCP when using the IEEE and IAU conventions
(see Sec. 6.2 and Fig. 1(d)). $V = I_{\mathrm{RCP}} - I_{\mathrm{LCP}}$.
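The four definitions above translate directly into code. The following minimal sketch, with a hypothetical function name, builds the Stokes parameters from the six intensities; the example intensities describe an idealized, fully horizontally polarized signal and are assumed for illustration.

```python
def stokes_from_intensities(i_0, i_90, i_p45, i_m45, i_rcp, i_lcp):
    """Return (I, Q, U, V) from intensities measured in three orthogonal pairs."""
    total = i_0 + i_90          # any orthogonal pair gives the same total
    q = i_0 - i_90              # horizontal minus vertical
    u = i_p45 - i_m45           # +45 deg minus -45 deg
    v = i_rcp - i_lcp           # IEEE RCP minus LCP
    return (total, q, u, v)


# Fully horizontally polarized signal of unit intensity: all of the power is in
# the 0 deg component, and it splits evenly within the other two pairs.
print(stokes_from_intensities(1.0, 0.0, 0.5, 0.5, 0.5, 0.5))
```

For real data the three pairwise sums agree only within the noise, so in practice one pair is chosen to define the total intensity.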


It is important to note that these are definitions. Stokes himself!! used the notation 
{A, B,C, D} a century before Chandrasekhar! settled on {I,Q,U,V}, the latter 


ᵈ Here we follow Ref. 10 in using each subscripted I to represent intensities of a given polarization
form. It might appear recursive to then also define the first Stokes parameter as I, but this is just a
notational convention and the reader might wish to think of Stokes I as always having an implicit 
“tot” subscript to clarify that it represents the total of intensities in any one pair of orthogonal 
polarization states. 
three letters of which were assigned with no motivation. Given Chandrasekhar’s
convention, there still remains room for ambiguity and confusion: for example, Q
could have been defined as $I_{90^\circ} - I_{0^\circ}$, and V could have been defined as $I_{\mathrm{LCP}} - I_{\mathrm{RCP}}$
(and often is! See Sec. 6).

The degree of polarization, or fractional polarization, is the ratio of the intensity
of the polarized emission to the total intensity:

$$p = \frac{I_{\mathrm{pol}}}{I_{\mathrm{tot}}} = \frac{\sqrt{Q^2 + U^2 + V^2}}{I}, \qquad 0 \le p \le 1. \tag{7}$$

We can also form fractional linear polarization

$$p_{\mathrm{lin}} = \frac{\sqrt{Q^2 + U^2}}{I}, \qquad 0 \le p_{\mathrm{lin}} \le 1, \tag{8}$$

and fractional circular polarization

$$p_{\mathrm{cir}} = \frac{V}{I}, \qquad -1 \le p_{\mathrm{cir}} \le 1. \tag{9}$$

When combining (or spatially smoothing) polarized signals, one must combine (or
smooth) Stokes parameters, not fractional polarizations, linearly polarized intensities,
or polarization angles.®
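The warning about combining signals is easy to demonstrate. The sketch below, using hypothetical helper names, averages two equal-intensity, fully linearly polarized signals at polarization angles of 10° and 150°. It uses the standard relation that the polarization angle is half the arctangent of U/Q, with the quadrant resolved by the signs of U and Q.

```python
import math


def linear_stokes(intensity, chi_deg):
    """Stokes (I, Q, U, V) of a fully linearly polarized signal at angle chi."""
    two_chi = math.radians(2.0 * chi_deg)
    return (intensity, intensity * math.cos(two_chi), intensity * math.sin(two_chi), 0.0)


def average_stokes(vectors):
    """Element-wise mean of a list of Stokes vectors."""
    count = len(vectors)
    return tuple(sum(vec[k] for vec in vectors) / count for k in range(4))


def fractional_linear(i, q, u):
    """Fractional linear polarization, Eq. (8)."""
    return math.hypot(q, u) / i


def polarization_angle_deg(q, u):
    """Polarization angle in the range [0, 180) degrees."""
    return (0.5 * math.degrees(math.atan2(u, q))) % 180.0


i, q, u, v = average_stokes([linear_stokes(1.0, 10.0), linear_stokes(1.0, 150.0)])

chi_from_stokes = polarization_angle_deg(q, u)   # about 170 deg
chi_naive = (10.0 + 150.0) / 2.0                 # 80 deg, which is wrong
p_lin = fractional_linear(i, q, u)               # about 0.77, no longer 1

print(chi_from_stokes, chi_naive, p_lin)
```

Averaging the angles directly gives 80°, nearly perpendicular to the correct 170°, because the polarization angle is an orientation that wraps at 180°. The combined signal is also partially depolarized, which a naive average of the two fractional polarizations (both unity) would miss.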


2.3. Stokes Parameters Expressed in Terms of Electric Fields 


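We can also write the Stokes parameters in terms of the time-averaged self- and
cross-products of the electric field components as

\begin{align}
I &= \langle E_x \bar{E}_x \rangle + \langle E_y \bar{E}_y \rangle = \langle E_R \bar{E}_R \rangle + \langle E_L \bar{E}_L \rangle, \tag{10a} \\
Q &= \langle E_x \bar{E}_x \rangle - \langle E_y \bar{E}_y \rangle = \langle E_R \bar{E}_L \rangle + \langle \bar{E}_R E_L \rangle, \tag{10b} \\
U &= \langle E_x \bar{E}_y \rangle + \langle \bar{E}_x E_y \rangle = -i \left( \langle E_R \bar{E}_L \rangle - \langle \bar{E}_R E_L \rangle \right), \tag{10c} \\
V &= i \left( \langle \bar{E}_x E_y \rangle - \langle E_x \bar{E}_y \rangle \right) = \langle E_R \bar{E}_R \rangle - \langle E_L \bar{E}_L \rangle, \tag{10d}
\end{align}

where the angle brackets denote a time average of the electric field, and the overbar
denotes complex conjugation.ᶠ By substituting Eqs. (3) and (4) into Eq. (10), we derive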

ᵉ This is necessary because the signal being received is being treated as quasi-monochromatic. Such
light will not trace out an ellipse with time, but the ellipse can be recovered if the products are 
averaged over a time long relative to the period of the radio wave. Even for a very fast correlator 
that could accumulate only 100 ms of data, there will be millions of wave periods per integration at 
radio frequencies, which is plenty long to uncover the polarization properties of the astronomical 
radiation. 

ᶠ Textbooks covering polarization tend to denote complex conjugation as A*. Many authors reverse
terms in some of the difference equations because they have either used the physics convention for
Stokes V as IEEE LCP − RCP or they have defined the exponential propagation argument of the
E field as the negative of the IEEE convention that we have adopted in Eq. (1). Finally, there is
an understood constant on the RHS of each equation accounting for the conversion of the square
of the E field to a temperature or flux density.
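the more commonly found representationᵍ of the Stokes parameters:

\begin{align}
I &= \langle E_{0x}^2 \rangle + \langle E_{0y}^2 \rangle = \langle E_{0R}^2 \rangle + \langle E_{0L}^2 \rangle, \tag{11a} \\
Q &= \langle E_{0x}^2 \rangle - \langle E_{0y}^2 \rangle = 2 \langle E_{0R} E_{0L} \rangle \cos(\phi_R - \phi_L), \tag{11b} \\
U &= 2 \langle E_{0x} E_{0y} \rangle \cos(\phi_y - \phi_x) = 2 \langle E_{0R} E_{0L} \rangle \sin(\phi_R - \phi_L), \tag{11c} \\
V &= -2 \langle E_{0x} E_{0y} \rangle \sin(\phi_y - \phi_x) = \langle E_{0R}^2 \rangle - \langle E_{0L}^2 \rangle. \tag{11d}
\end{align}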
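The equivalence of the linear-basis and circular-basis expressions in Eq. (10), and their reduction to Eq. (11), can be verified numerically. The sketch below is a minimal check for a single monochromatic wave, so the time average is trivial; the function names are hypothetical, and the circular components are obtained by projecting onto the IEEE unit vectors of Eq. (1), i.e. E_R = (E_x + iE_y)/√2 and E_L = (E_x − iE_y)/√2.

```python
import cmath
import math


def stokes_linear(ex, ey):
    """Stokes (I, Q, U, V) from complex linear components, per Eq. (10)."""
    i = abs(ex) ** 2 + abs(ey) ** 2
    q = abs(ex) ** 2 - abs(ey) ** 2
    u = (ex * ey.conjugate() + ex.conjugate() * ey).real
    v = (1j * (ex.conjugate() * ey - ex * ey.conjugate())).real
    return (i, q, u, v)


def linear_to_circular(ex, ey):
    """Project (E_x, E_y) onto R = (x - iy)/sqrt(2) and L = (x + iy)/sqrt(2)."""
    er = (ex + 1j * ey) / math.sqrt(2.0)
    el = (ex - 1j * ey) / math.sqrt(2.0)
    return er, el


def stokes_circular(er, el):
    """Stokes (I, Q, U, V) from complex circular components, per Eq. (10)."""
    i = abs(er) ** 2 + abs(el) ** 2
    q = (er * el.conjugate() + er.conjugate() * el).real
    u = (-1j * (er * el.conjugate() - er.conjugate() * el)).real
    v = abs(er) ** 2 - abs(el) ** 2
    return (i, q, u, v)


# Assumed example: E_0x = 1, phi_x = 0.3 rad; E_0y = 0.5, phi_y = 1.1 rad.
ex = cmath.rect(1.0, 0.3)
ey = cmath.rect(0.5, 1.1)

from_linear = stokes_linear(ex, ey)
from_circular = stokes_circular(*linear_to_circular(ex, ey))
assert all(abs(a - b) < 1e-12 for a, b in zip(from_linear, from_circular))

# Eq. (11c) and (11d) for the same wave, with phi_y - phi_x = 0.8 rad.
assert abs(from_linear[2] - 2 * 1.0 * 0.5 * math.cos(0.8)) < 1e-12
assert abs(from_linear[3] + 2 * 1.0 * 0.5 * math.sin(0.8)) < 1e-12
```

A pure IEEE RCP wave, (E_x, E_y) = (1, −i)/√2, gives V = +1 in both functions, consistent with the sign convention adopted here.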


From Eqs. (11) and (5), it can be seen that the angle that the polarization ellipse makes
with the horizontal (i.e. x-axis) can be expressed by

$$\chi = \frac{1}{2} \tan^{-1}\left(\frac{U}{Q}\right); \qquad 0^\circ \le \chi \le 180^\circ, \tag{12}$$

where $\chi$ is known as the position angle of linear polarization (or, more succinctly,
the polarization angle) and has a total range of 180, not 360, degrees. Therefore, $\chi$
has an orientation, not a direction. Line segments are commonly used to represent
the amplitude and orientation of linear polarization on the plane of the sky. The
astronomical community regularly refers to such a line segment as a polarization
vector even though a vector has a direction. We propose the adoption of the term
segtor.


3. Measuring Self- and Cross-Products with Digital Methods 


Our dual-polarized receiver system has two orthogonal polarizations, which we 
denote by A and B because the discussion applies, unchanged, whether our feed sys- 
tem is native linear, native circular, or something in between. Having both polariza- 
tions allows us to synthesize all the Stokes parameters from self- and cross-products 
of the two polarizations using the digital equivalent of Eq. (10). 

The time-averaged voltage products are derived from digital samples in one of 
two ways. Historically, the XF correlation techniqueʰ prevailed because of its simpler
hardware requirements. With XF, one uses a correlation spectrometer, which produces
time-averaged auto- and cross-correlation functions (ACFs and CCFs, respectively).
These are Fourier transformed, usually in a general-purpose computer, to
produce power spectra. Each ACF is computed for N positive lags; negative lags are 
unnecessary because autocorrelations are symmetric with respect to lag. The ACFs 
are averaged over time and the Fourier transform (FT) of the resulting average 
ACF gives the self-power spectrum. Because the ACF is symmetric with respect to 
lag, its Fourier transform is real and symmetric with frequency, so the self-product 


ᵍ Optics, radiation, and astronomy texts usually provide this set of Stokes parameters, and will often
include their representation as a function of the polarization ellipse parameters. The correlation
representation of Eq. (10) is not widely presented.

ʰ The “X” represents correlation and the “F” represents a Fourier transform.
power spectrum has N independent channels. Symbolically, for polarization A we
write

$$AA = \mathrm{FT}(\mathrm{ACF}(V_A)). \tag{13}$$


The cross-correlation of the two polarizations is not symmetric with lag, so it 
must be computed both for N positive and N negative lags. Its FT is complex with 
Hermitian symmetry, so the cross-power spectrum can be regarded as consisting 
of a real and imaginary part, each with N independent channels. Symbolically, for 
polarizations A and B we write 


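\begin{align}
AB &= \mathrm{Re}\{\mathrm{FT}(\mathrm{CCF}(V_A V_B))\}, \nonumber \\
BA &= \mathrm{Im}\{\mathrm{FT}(\mathrm{CCF}(V_A V_B))\}. \tag{14}
\end{align}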


Thus, for a native-linear feed connected to the inputs of a digital spectrometer in 
such a way that (A,B) = (X,Y), the spectrometer will produce the four spectra 
[XX, YY, XY,YX]. Similarly, for a native-circular feed with (A,B) = (R,L), the 
spectrometer will output [RR, LL, RL, LR].
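The XF route rests on the fact that the Fourier transform of an autocorrelation function is a power spectrum. The sketch below illustrates this with a plain discrete Fourier transform written out in pure Python. It is a simplified illustration, not a model of correlator hardware: it assumes a circular (wrap-around) autocorrelation over all lags of one short block so that the equality with the directly computed power spectrum is exact, and all function names are hypothetical.

```python
import cmath
import math


def dft(samples):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def circular_acf(samples):
    """Circular autocorrelation of a real sequence, normalized by its length."""
    n = len(samples)
    return [
        sum(samples[k] * samples[(k + lag) % n] for k in range(n)) / n
        for lag in range(n)
    ]


def power_spectrum_xf(samples):
    """XF order: correlate first, then Fourier transform."""
    return [value.real for value in dft(circular_acf(samples))]


def power_spectrum_fx(samples):
    """FX order: Fourier transform first, then multiply by the conjugate."""
    n = len(samples)
    return [abs(value) ** 2 / n for value in dft(samples)]


voltages = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 1.5]  # assumed sample block
xf = power_spectrum_xf(voltages)
fx = power_spectrum_fx(voltages)
assert all(abs(a - b) < 1e-9 for a, b in zip(xf, fx))
```

Because the autocorrelation is symmetric in lag, its transform has a vanishing imaginary part, which is why only the real part is kept in `power_spectrum_xf`.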

Today, the FX technique is favored because of the heavy computing ability of
FPGAs and GPUs. With FX, each polarization is sampled at rate $t_s$ over time
interval 2T, providing 2N samples. This block of data is Fourier transformed,
producing a complex transform of 2N channels with Hermitian symmetry having
N positive-frequency and N negative-frequency channels. The self-product power
spectrum is this FT times its complex conjugate, and because of the Hermitian
symmetry, it is real with the N negative- and positive-frequency portions identical.
Thus, it is a power spectrum with N independent channels. Similarly, one calculates
cross-product power spectra by multiplying the Fourier transforms of the two
polarizations with both possibilities of complex conjugate (Eq. (18)). This produces
a complex cross-power spectrum having 2N independent channels, split between
negative and positive frequencies. This cross-power spectrum does not have Hermitian
symmetry, so has a real part and an imaginary part, each with N independent
channels. Thus, we have four spectra of length N. Symbolically, for the $V_A$ and $V_B$
self-product spectra we write

$$AA = \langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_A)} \rangle, \qquad BB = \langle \mathrm{FT}(V_B)\, \overline{\mathrm{FT}(V_B)} \rangle. \tag{15}$$

The FX spectrometer will return either the complex cross-product spectrum

$$\langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_B)} \rangle \quad \text{or} \quad \langle \overline{\mathrm{FT}(V_A)}\, \mathrm{FT}(V_B) \rangle, \tag{16}$$

but not both. Since these are a complex conjugate pair, we can symbolically represent
the real and imaginary parts of these cross-product spectra as

\begin{align}
AB &= \mathrm{Re}\{\langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_B)} \rangle\} = \mathrm{Re}\{\langle \overline{\mathrm{FT}(V_A)}\, \mathrm{FT}(V_B) \rangle\}, \nonumber \\
BA &= \mathrm{Im}\{\langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_B)} \rangle\} = -\mathrm{Im}\{\langle \overline{\mathrm{FT}(V_A)}\, \mathrm{FT}(V_B) \rangle\}. \tag{17}
\end{align}
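(Note that ambiguity exists in the sign of the BA term because it will not be known
a priori which of the cross-product spectra an FX spectrometer will output; this is
determined via calibration.) The real-valued Stokes parameter spectra can then be
assembled from the self- and cross-product spectra following Eq. (10) as

\begin{align}
I &= \left[ \langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_A)} \rangle + \langle \mathrm{FT}(V_B)\, \overline{\mathrm{FT}(V_B)} \rangle \right] = AA + BB, \nonumber \\
Q &= \left[ \langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_A)} \rangle - \langle \mathrm{FT}(V_B)\, \overline{\mathrm{FT}(V_B)} \rangle \right] = AA - BB, \nonumber \\
U &= \left[ \langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_B)} \rangle + \langle \overline{\mathrm{FT}(V_A)}\, \mathrm{FT}(V_B) \rangle \right] = 2\,AB, \nonumber \\
V &= -i \left[ \langle \mathrm{FT}(V_A)\, \overline{\mathrm{FT}(V_B)} \rangle - \langle \overline{\mathrm{FT}(V_A)}\, \mathrm{FT}(V_B) \rangle \right] = 2\,BA. \tag{18}
\end{align}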


Even after these self- and cross-products have been properly amplitude-calibrated
and combined, they do not provide true Stokes parameters, because the
telescope circuitry introduces cross-coupling and phase shifts. Thus, they do not
provide a true Stokes vector as defined in Eqs. (6) and (10). Rather, they provide
a pseudo-Stokes vector with four pseudo-Stokes parameters. In this review, we
represent pseudo-Stokes vectors by the special symbol $\mathcal{S}$ (the calligraphic S).

Incorporating all of this, the pseudo-Stokes vector assembled from the correlator
output is

$$\mathcal{S}^{\mathrm{cor}} = \begin{pmatrix} AA + BB \\ AA - BB \\ 2\,AB \\ 2\,BA \end{pmatrix}. \tag{19}$$
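The FX recipe leading to Eq. (19) can be sketched in a few lines. This is a minimal illustration for a single block of samples with no time averaging, using a plain pure-Python discrete Fourier transform; the function names are hypothetical, and the cross-product is taken as FT(V_A) times the conjugate of FT(V_B), which is one of the two possibilities in Eq. (16).

```python
import cmath
import math


def dft(samples):
    """Plain O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def pseudo_stokes_fx(v_a, v_b):
    """Per-channel pseudo-Stokes values (AA+BB, AA-BB, 2AB, 2BA) for one block."""
    spectra = []
    for fa, fb in zip(dft(v_a), dft(v_b)):
        aa = (fa * fa.conjugate()).real   # self-product of A
        bb = (fb * fb.conjugate()).real   # self-product of B
        cross = fa * fb.conjugate()       # assumed conjugation choice
        ab, ba = cross.real, cross.imag
        spectra.append((aa + bb, aa - bb, 2.0 * ab, 2.0 * ba))
    return spectra


# Test tone in channel 2 of a 16-sample block; B lags A by a quarter cycle.
n, tone = 16, 2
v_a = [math.cos(2.0 * math.pi * tone * k / n) for k in range(n)]
v_b = [math.sin(2.0 * math.pi * tone * k / n) for k in range(n)]

i_cor, q_cor, u_cor, v_cor = pseudo_stokes_fx(v_a, v_b)[tone]
print(i_cor, q_cor, u_cor, v_cor)  # approximately 128, 0, 0, 128
```

With equal amplitudes and a quarter-cycle lag in B, all of the power appears in the fourth pseudo-Stokes parameter, as expected from Eq. (11d) with A and B playing the roles of x and y. Had the spectrometer returned the other conjugate product, the sign of the last element would flip, which is the ambiguity that calibration resolves.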


4. The Measurement and Calibration Process 


We have treated everything in our system — from the source’s radiation incident on 
the Earth to the digital backend output — as a black box. To convert the resulting 
pseudo-Stokes vector into a true Stokes vector for the astronomical source being 
observed, we need to undo the effects of this black box. 


4.1. Amplitude Calibration 


The digitally produced pseudo-Stokes vector is generated in terms of arbitrarily
scaled numbers derived from the correlator input voltages ($V_A$, $V_B$), which are
instrumentally generated from the incoming electric fields ($E_A$, $E_B$). We must convert
these arbitrary units to physically meaningful units (kelvins or janskys), which
is done by inserting noise of known intensity using standard radioastronomical techniques.
This is a standard process that is covered in other chapters of this book, so
in the interest of brevity we will omit further discussion of this topic. Rather, we
assume at this point that all the pseudo-Stokes vector $\mathcal{S}^{\mathrm{cor}}$ elements are properly
calibrated with respect to amplitude and are brightness temperatures in units of
kelvins.ⁱ


4.2. For Illustrative Purposes: A Linearly Polarized Source 


Measuring the polarization of a source means obtaining its four calibrated Stokes 
parameters. Here we focus the discussion for illustrative purposes by considering 
linearly polarized sources. Concentrating on linearly polarized sources is natural, 
because many polarized radioastronomical sources have only very small circular 
polarization; pulsars and masers can be exceptions. 

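Purely linearly polarized sources have V = 0 and are characterized by the
fractional linear polarization $p_{\mathrm{src,lin}}$ and the position angle $\chi_{\mathrm{src}}$; these, in turn, are
derived from Stokes (Q,U):

$$S_{\mathrm{src}} = I_{\mathrm{src}} \begin{pmatrix} 1 \\ p_{\mathrm{src,lin}} \cos 2\chi_{\mathrm{src}} \\ p_{\mathrm{src,lin}} \sin 2\chi_{\mathrm{src}} \\ 0 \end{pmatrix}. \tag{20}$$

We will consider both astronomical sources, which normally have $p_{\mathrm{src,lin}} < 1$, and
special-purpose test signals, which normally are fully linearly polarized (a fractional
linear polarization of 1).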

For the purpose of measuring polarization, the receiver system needs a noise 
diode output that is injected into both polarizations as a correlated calibration 
signal (a.k.a. “cal”). This can be accomplished either by injecting it externally — 
e.g., by a linearly polarized vertex radiator — or by splitting the noise diode output 
and using two cables to inject the signal into both polarization paths, each with a 
directional coupler located just in front of its first amplifier. The position angle of 
the vertex radiator should be 45° away from the feed probes so that
the injected noise has small, ideally zero, Stokes Q. The cable-and-splitter option is the
usual case, and it is depicted in the system block diagram in Fig. 2. 

For the case of the cable-and-splitter injected cal, the powers injected into the
two polarization channels are almost equal, so Stokes Q, which is their difference,
is small and, ideally, zero; similarly, the total polarized power fraction (Eq. (7)) is
unity. So for this ideal case, the Stokes cal vector is

$$S_{\mathrm{cal}} = \begin{pmatrix} I_{\mathrm{cal}} \\ Q_{\mathrm{cal}} \\ I_{\mathrm{cal}} \cos \Delta\phi_{\mathrm{cal}} \\ I_{\mathrm{cal}} \sin \Delta\phi_{\mathrm{cal}} \end{pmatrix}. \tag{21}$$

The angle $\Delta\phi_{\mathrm{cal}} = \phi_{\mathrm{cal},A} - \phi_{\mathrm{cal},B}$ represents the phase difference between the
injected noise diode signals. A primary contributor to this difference is the different
lengths of the two noise diode cables, which makes $\Delta\phi_{\mathrm{cal}}$ a linear function of frequency.
If the cable length difference is exactly zero, and if the directional couplers
and other devices in the circuit are perfectly matched for the two polarizations, then
this injected cal signal is 100% polarized with $\Delta\phi_{\mathrm{cal}} = 0$ and Stokes $U_{\mathrm{cal}} = I_{\mathrm{cal}}$.


ⁱ See Chapter 1 of this volume.

Fig. 2. Block diagram of dual-polarized (A and B) single-dish system (adapted from Ref. 13). The
noise diode (a.k.a. cal) output is injected through short cables and directional couplers with combined
phase delays $\phi_A$ and $\phi_B$. The total voltage gain of polarization channel A is $g_A = g_{A1} g_{A2}$;
the voltage gains are complex, with an amplitude and a phase. The total cable length for channel
A includes the run from the dish to the correlator input, so it can be very long
(more than 1 km at the Green Bank Telescope). The thick lines represent mechanical structures
or passive electronics that do not change with time; the thin lines represent active electronics and
other circuitry that do change with time and need calibration.

Consider the two quantities $Q_{\mathrm{cal}}$ and $\Delta\phi_{\mathrm{cal}}$. The cal is injected through a power
splitter, cables, and directional couplers. These are all mechanical devices and should
be stable over long periods of time; indeed, we have observationally found that to
be the case. This is fortunate, because we rely on the cal as the secondary standard
for system calibration. Therefore, it is essential to determine $Q_{\mathrm{cal}}$ and $\Delta\phi_{\mathrm{cal}}$, and
the process of determining them we call Mueller matrix calibration. We discuss this
process below in Sec. 4.5.
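To illustrate how a cable length mismatch turns into a frequency-dependent cal vector, the sketch below evaluates Eq. (21) across a band. It assumes that the phase difference comes only from a length difference between the two noise diode cables, modeled as 2πν·ΔL divided by the propagation speed in the cable. The velocity factor of 0.7, the 5 cm length difference, and the function names are all assumptions chosen for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def cal_phase_difference(freq_hz, delta_length_m, velocity_factor=0.7):
    """Phase difference (rad) from a cable length difference; linear in frequency."""
    return 2.0 * math.pi * freq_hz * delta_length_m / (velocity_factor * SPEED_OF_LIGHT)


def cal_stokes(i_cal, q_cal, delta_phi_rad):
    """Stokes cal vector of Eq. (21)."""
    return (
        i_cal,
        q_cal,
        i_cal * math.cos(delta_phi_rad),
        i_cal * math.sin(delta_phi_rad),
    )


# A 5 cm mismatch (assumed) makes U_cal and V_cal trade off across the band.
for freq in (1.30e9, 1.40e9, 1.50e9):
    dphi = cal_phase_difference(freq, 0.05)
    print(f"{freq / 1e9:.2f} GHz:", cal_stokes(1.0, 0.0, dphi))
```

With zero length difference the vector reduces to (I_cal, Q_cal, I_cal, 0), the fully polarized case with U_cal = I_cal.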


4.3. Specifying the Stokes Vector Transfer Functions by Mueller 
Matrices 


Figure 2 shows a block diagram of a typical dual-polarized radioastronomical 
receiver. The signal from the source first encounters the feed. The feed rotates 
with respect to the source: for an alt/az-mounted telescope observing an astronomical
source, it rotates by the parallactic angle, while for an equatorially mounted
telescope it doesn’t rotate at all. If it is an injected test signal, e.g. from a vertex
radiator, one intentionally rotates the feed for calibration purposes.ʲ The rotated
feed converts the incoming electromagnetic radiation to voltages. Finally, these voltages
are amplified to levels appropriate for the input to a digital spectrometer.

Each of these processes modifies the Stokes parameters. We can regard each 
process as having a transfer function for the four Stokes parameters. This transfer 
function is a 4 x 4 matrix, known as the Mueller matrix. We need the Mueller 
matrices for the above three processes, discussed here in the order in which the 
source radiation encounters them: 


(1) FEED ROTATION. For the rotation of the feed by angle $\chi$ with respect to
the source, the Mueller matrix is

$$M_{\chi} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos 2\chi & \sin 2\chi & 0 \\ 0 & -\sin 2\chi & \cos 2\chi & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{22}$$

The central 2 × 2 submatrix is, of course, nothing but a rotation matrix.ᵏ When
the telescope rotates with respect to the source, which is the operation described
by Eq. (22), it is equivalent to keeping the feed stationary and having a purely
linearly polarized source emitting with position angle ($\chi_{\mathrm{src}} - \chi$):

$$S_{\mathrm{src},\chi} = M_{\chi} \cdot S_{\mathrm{src}} = I_{\mathrm{src}} \begin{pmatrix} 1 \\ p_{\mathrm{src,lin}} \cos(2[\chi_{\mathrm{src}} - \chi]) \\ p_{\mathrm{src,lin}} \sin(2[\chi_{\mathrm{src}} - \chi]) \\ 0 \end{pmatrix}. \tag{23}$$

Note that we can regard both $S_{\mathrm{src}}$ and $S_{\mathrm{src},\chi}$ as true Stokes vectors in the sense
of Eq. (6) as long as we specify that each has its own reference coordinate
system. In terms of the source’s reference system, $S_{\mathrm{src},\chi}$ is a pseudo-Stokes
vector because its elements are not [I,Q,U,V].

(2) FEED COUPLING. Next comes the feed. Here we consider perfect dual- 
polarized feeds with native-linear or native-circular polarization, where the term 
“perfect” means that the two polarizations are orthogonal, the two polarizations 
are either purely linear or purely circular, and there are no losses. We consider 


ʲ You might think that rotating the vertex radiator is equivalent to rotating the feed. That is not
the case! When you rotate the radiator, the transmitted signal changes, and along with it the 
reflections from portions of the telescope, such as feed legs, change. However, when you instead 
rotate the feed, the reflections of the transmitted signal remain unchanged. 

ᵏ A reminder that we adopt the notation A to represent a 4 × 1 column matrix and A to represent
a square 4 × 4 matrix.
these extremes for several reasons: (1) many feeds are, in fact, close to perfection;
(2) the discussion can focus on fundamentals without the fog of excess detail;
(3) in practice, when you are sitting at the telescope and want to know how well
things are working, a quick and approximate assessment of the receiver system
is often adequate.

The feed’s Mueller matrix must be obtained from its Jones matrix. The
Jones and Mueller matrices for the general case of imperfect feeds are given by
Eqs. (10)–(11) of Ref. 14. For perfect feeds of arbitrary polarization, the matrices
depend on two angles, called $\alpha_{\mathrm{feed}}$ and $\chi_{\mathrm{feed}}$. Reference 14 uses $\tan \alpha_{\mathrm{feed}}$
to specify the voltage coupling between the input E-field and output voltages
and $\chi_{\mathrm{feed}}$ to represent the phase of that coupling (not to be confused with the
position angle $\chi$ used in the current chapter). Perfect native linear feeds have
$\alpha_{\mathrm{feed}} = 0^\circ$ and perfect native circular feeds have $\alpha_{\mathrm{feed}} = \pm 45^\circ$ and $\chi_{\mathrm{feed}} = \pm 90^\circ$.
Our two feed types are:


(a) Native-Linear Feeds. The Mueller matrix for a perfect native-linear feed
whose probes are aligned with the azimuth and elevation directions is just
the identity matrix, i.e.,

$$M_{F,\mathrm{lin}} = \mathbb{1}. \tag{24}$$

More generally, if the native-linear feed is mounted at angle $\chi_F$ with respect
to being aligned, the Mueller matrix is simply

$$M_{F,\chi} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos 2\chi_F & \sin 2\chi_F & 0 \\ 0 & -\sin 2\chi_F & \cos 2\chi_F & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{25}$$

(b) Native-Circular Feeds. The Mueller matrix for a perfect native-circular
feed is

$$M_{F,\mathrm{cir}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \pm 1 \\ 0 & 0 & 1 & 0 \\ 0 & \mp 1 & 0 & 0 \end{pmatrix}, \tag{26}$$

where the signs depend on the values of $\alpha_{\mathrm{feed}}$ and $\chi_{\mathrm{feed}}$; the case $\alpha_{\mathrm{feed}} = +45^\circ$
and $\chi_{\mathrm{feed}} = +90^\circ$ has the signs on top (i.e. +1 in the second row and −1 in the
fourth row).


AMPLIFICATION AND ELECTRONICS. The Mueller matrix for the
electronics chains deals with amplitude, so we must define our intensity units.
First, our uppercase $G$ means power gain (which has no phase), while the
lowercase $g$ means voltage gain, which is complex; $G = g g^{*}$. Following Ref. 14, we
assume that good, but not perfect, intensity calibration has been previously
applied to the two polarization channels so that the Stokes parameters have the
correct units (e.g. temperature), and, in addition, that the total intensity, Stokes
$I$, has perfect intensity calibration (to simplify the following equations). Then
we define $(G_A, G_B)$ to be the power gains for the two polarization channels.
Because of our assumptions we write $G_A = (1 + \delta G)$ and $G_B = (1 - \delta G)$,
where $\delta G$ is unitless and $|\delta G| \ll 1$. For consistency with Ref. 14, we define
$\Delta G_{AB} = 2\,\delta G$. Then, to first order in $\Delta G_{AB}$, the Mueller matrix for the
electronics chains (see Fig. 2) is

$$
\mathbf{M}_{AB} =
\begin{bmatrix}
1 & \frac{\Delta G_{AB}}{2} & 0 & 0 \\
\frac{\Delta G_{AB}}{2} & 1 & 0 & 0 \\
0 & 0 & \cos\Delta\phi_{AB} & -\sin\Delta\phi_{AB} \\
0 & 0 & \sin\Delta\phi_{AB} & \cos\Delta\phi_{AB}
\end{bmatrix}. \tag{27}
$$

The two parameters in $\mathbf{M}_{AB}$ are the relative power gain ($\Delta G_{AB}$) and the phase
delay between the two polarization channels ($\Delta\phi_{AB}$); both are associated
with the electronics and the circuitry, including cable lengths. These quantities
can change with time because they are associated with active electronics, so
they need to be measured often enough to keep up with the variability of the system
electronics, and at least once per observing session.
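The gain terms in the first two rows of the electronics matrix can be checked with a short worked step. The sketch below assumes a native-linear system in which channel A carries $(I+Q)/2$ and channel B carries $(I-Q)/2$, and in which the pseudo-Stokes $I$ and $Q$ are the sum and difference of the two auto-products (written here as $AA$ and $BB$, a labeling chosen for this illustration only):

```latex
\begin{align*}
AA &= G_A\,\tfrac{1}{2}(I+Q) = (1+\delta G)\,\tfrac{1}{2}(I+Q), \\
BB &= G_B\,\tfrac{1}{2}(I-Q) = (1-\delta G)\,\tfrac{1}{2}(I-Q), \\
AA + BB &= I + \delta G\,Q = I + \tfrac{1}{2}\Delta G_{AB}\,Q, \\
AA - BB &= \delta G\,I + Q = \tfrac{1}{2}\Delta G_{AB}\,I + Q, \\
|g_A g_B^{*}| &= \sqrt{(1+\delta G)(1-\delta G)} = 1 - \tfrac{1}{2}\,\delta G^{2} + \dots \approx 1 .
\end{align*}
```

Under these assumptions the gain imbalance mixes $I$ and $Q$ with coefficient $\Delta G_{AB}/2$, while the amplitude of the cross-product is unchanged to first order, so the only effect on the last two components is the rotation by the relative phase $\Delta\phi_{AB}$.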


4.4. The Measured Pseudo-Stokes Vector $\mathbf{S}^{\rm cor}$ for Several Cases


After being operated on by these three Mueller matrices, the original source Stokes
vector becomes the previously defined pseudo-Stokes vector, producing voltages $V_A$
and $V_B$ at the input to the correlator. The correlator generates the auto- and
cross-products as discussed above, producing the pseudo-Stokes vector output
$\mathbf{S}^{\rm cor}$.

When the system looks at “blank sky”, the input noise is mainly from the
receiver, with a small contribution from the sky and ground pickup. For purposes
of illustration, we include only the receiver contribution. In this case, the noise is
injected after the feed so the only Mueller matrix that operates is $\mathbf{M}_{AB}$. Denote the
associated pseudo-Stokes vector by $\mathbf{S}^{\rm cor}_{\rm Rx}$:

$$
\mathbf{S}^{\rm cor}_{\rm Rx} = \mathbf{M}_{AB} \cdot \mathbf{S}_{\rm Rx}. \tag{28}
$$

When on the source, with the cal off, we see

$$
\mathbf{S}^{\rm cor}_{\rm src} = \mathbf{M}_{AB} \cdot \mathbf{M}_{F} \cdot \mathbf{M}_{\chi} \cdot \mathbf{S}_{\rm src} + \mathbf{S}^{\rm cor}_{\rm Rx}. \tag{29}
$$

And when off the source with the cal on:

$$
\mathbf{S}^{\rm cor}_{\rm cal} = \mathbf{M}_{AB} \cdot \mathbf{S}_{\rm cal} + \mathbf{S}^{\rm cor}_{\rm Rx}. \tag{30}
$$

For an accurate measurement of the source or cal deflection, we must subtract the
off-source contribution $\mathbf{S}^{\rm cor}_{\rm Rx}$, as is usual for all single-dish measurements.
Denote these deflections with the prefix $\Delta$. Then for any type of feed, the
cable-injected cal response is

$$
\Delta\mathbf{S}^{\rm cor}_{\rm cal} = \mathbf{M}_{AB} \cdot \mathbf{S}_{\rm cal} = I_{\rm cal}
\begin{bmatrix}
1 \\
\frac{\Delta G_{AB}}{2} + Q_{\rm cal} \\
\cos(\Delta\phi_{AB} + \Delta\phi_{\rm cal}) \\
\sin(\Delta\phi_{AB} + \Delta\phi_{\rm cal})
\end{bmatrix}. \tag{31}
$$

We have assumed $Q_{\rm cal} \ll 1$ and kept only first-order terms.
Similarly, for the source deflection, we get

$$
\Delta\mathbf{S}^{\rm cor}_{\rm src} = \mathbf{M}_{AB} \cdot \mathbf{M}_{F} \cdot \mathbf{M}_{\chi} \cdot \mathbf{S}_{\rm src}. \tag{32}
$$


For a perfect native-linear feed with probes aligned with the azimuth and elevation
directions, $\mathbf{M}_F = \mathbf{M}_{F,\mathrm{lin}} = \mathbf{I}$ (Eq. (24)) and

$$
\Delta\mathbf{S}^{\rm cor}_{\rm src,lin} = I_{\rm src}
\begin{bmatrix}
1 + \frac{\Delta G_{AB}}{2}\, p_{\rm src,lin} \cos(2[\chi_{\rm src} - \chi]) \\
\frac{\Delta G_{AB}}{2} + p_{\rm src,lin} \cos(2[\chi_{\rm src} - \chi]) \\
p_{\rm src,lin} \cos\Delta\phi_{AB}\, \sin(2[\chi_{\rm src} - \chi]) \\
p_{\rm src,lin} \sin\Delta\phi_{AB}\, \sin(2[\chi_{\rm src} - \chi])
\end{bmatrix}. \tag{33}
$$

The $\Delta S^{\rm cor}_{{\rm src,lin},0}$ element$^{l}$ is the pseudo-Stokes $I$ and is not equal to
unity. This can be awkward for Mueller matrix calibration, when one almost always forces
$\Delta S^{\rm cor}_{{\rm src,lin},0}$ to be unity to eliminate the influence of overall system gain
changes. These can occur, for example, from pointing errors or position-dependent telescope
surface distortions and other circumstances that reduce the overall system gain. So one
must divide the other three pseudo-Stokes parameters by $\Delta S^{\rm cor}_{{\rm src,lin},0}$.
Fortunately, for the common case when an astronomical source is used for the calibration, we
almost always have $p_{\rm src,lin} \ll 1$; this makes the contribution of the non-unity portion
of $\Delta S^{\rm cor}_{{\rm src,lin},0}$ to the other three pseudo-Stokes parameters second-order, so
it can be neglected. However, for a locally generated test signal, $p_{\rm cal,lin}$ is likely to
be unity. The easiest way to deal with this is to rescale the amplitudes so that
$\Delta G_{AB}$ itself becomes second order.

For a perfect native-circular feed, $\mathbf{M}_F$ is given by Eq. (26), and

$$
\Delta\mathbf{S}^{\rm cor}_{\rm src,cir} = I_{\rm src}
\begin{bmatrix}
1 \\
\frac{\Delta G_{AB}}{2} \\
p_{\rm src,lin} \sin(\Delta\phi_{AB} + 2[\chi_{\rm src} - \chi]) \\
-p_{\rm src,lin} \cos(\Delta\phi_{AB} + 2[\chi_{\rm src} - \chi])
\end{bmatrix}. \tag{34}
$$
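The two closed-form source deflections can be checked numerically by multiplying out the matrix chain. The following Python sketch does this under stated assumptions: the sky (parallactic-angle) matrix is taken to be a rotation of $(Q, U)$ by twice the angle, with the same form as the rotated-feed matrix of Eq. (25); the circular feed uses the upper signs of Eq. (26); the source has no circular polarization; and all numerical values are arbitrary test inputs. The helper names are hypothetical.

```python
import math


def apply_mueller(matrix, stokes):
    """Multiply a 4x4 Mueller matrix by a 4-element Stokes vector."""
    return [sum(matrix[i][j] * stokes[j] for j in range(4)) for i in range(4)]


def rotation(angle):
    """Rotate (Q, U) by 2*angle (radians); assumed form of the sky matrix."""
    c, s = math.cos(2.0 * angle), math.sin(2.0 * angle)
    return [[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1]]


def electronics(dg_ab, dphi_ab):
    """First-order electronics matrix (form of Eq. 27)."""
    c, s = math.cos(dphi_ab), math.sin(dphi_ab)
    h = 0.5 * dg_ab
    return [[1, h, 0, 0], [h, 1, 0, 0], [0, 0, c, -s], [0, 0, s, c]]


FEED_LINEAR = rotation(0.0)                      # identity, Eq. 24
FEED_CIRCULAR = [[1, 0, 0, 0], [0, 0, 0, 1],     # Eq. 26, upper signs
                 [0, 0, 1, 0], [0, -1, 0, 0]]

# Arbitrary test values (not measurements).
dg_ab, dphi_ab = 0.02, math.radians(25.0)
p_lin, chi_src, chi = 0.10, math.radians(33.0), math.radians(-40.0)
source = [1.0, p_lin * math.cos(2 * chi_src), p_lin * math.sin(2 * chi_src), 0.0]


def chain(feed):
    """Electronics * feed * sky applied to the source Stokes vector."""
    on_sky = apply_mueller(rotation(chi), source)
    return apply_mueller(electronics(dg_ab, dphi_ab), apply_mueller(feed, on_sky))


theta = 2.0 * (chi_src - chi)
closed_linear = [
    1.0 + 0.5 * dg_ab * p_lin * math.cos(theta),
    0.5 * dg_ab + p_lin * math.cos(theta),
    p_lin * math.cos(dphi_ab) * math.sin(theta),
    p_lin * math.sin(dphi_ab) * math.sin(theta),
]
closed_circular = [
    1.0,
    0.5 * dg_ab,
    p_lin * math.sin(dphi_ab + theta),
    -p_lin * math.cos(dphi_ab + theta),
]

err_lin = max(abs(a - b) for a, b in zip(chain(FEED_LINEAR), closed_linear))
err_cir = max(abs(a - b) for a, b in zip(chain(FEED_CIRCULAR), closed_circular))
print("max |chain - closed form|, linear  :", err_lin)
print("max |chain - closed form|, circular:", err_cir)
```

Both differences are at the level of floating-point rounding, which confirms that the closed forms follow from the three matrices once the sky rotation is defined as above.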


$^{l}$$\Delta S^{\rm cor}_{{\rm src,lin},0}$ is the zeroth element of the $\Delta\mathbf{S}^{\rm cor}_{\rm src,lin}$ pseudo-Stokes vector; see Eq. (6).
4.5. Discussion: The Process of “Mueller Matrix Calibration”

Suppose you make a single measurement $\Delta\mathbf{S}^{\rm cor}_{\rm src}$ of the deflection of a
linearly polarized source and wish to derive the source's linear polarization fraction
$p_{\rm src,lin}$ and position angle $\chi_{\rm src}$ from the measured $\Delta\mathbf{S}^{\rm cor}_{\rm src}$.
If all of the off-diagonal terms in the three Mueller matrices were zero, this would be easy.
However, this is never the case. If you know the three Mueller matrices, then you can
calculate the inverse of their matrix product and derive $\mathbf{S}_{\rm src}$ from the measured
$\Delta\mathbf{S}^{\rm cor}_{\rm src}$ using Eq. (32); alternatively, if you know $p_{\rm src,lin}$ and
$\chi_{\rm src}$ (because it is a polarization calibration source, for example), then you can
analytically calculate $\Delta\mathbf{S}^{\rm cor}_{\rm src}$ from $\mathbf{S}_{\rm src}$ using
Eq. (32). Either way, the parameters $\Delta G_{AB}$ and $\Delta\phi_{AB}$ need to be known. To
determine them we need to use the calibration noise diode, which produces the
deflection given by Eq. (31). This deflection depends on four quantities: our two
required amplifier-chain parameters $\Delta G_{AB}$ and $\Delta\phi_{AB}$ (which change with time),
and the two cal-injection parameters $Q_{\rm cal}$ and $\Delta\phi_{\rm cal}$ (which do not change
with time).

We cannot determine $\Delta G_{AB}$ and $\Delta\phi_{AB}$ without knowing $Q_{\rm cal}$ and
$\Delta\phi_{\rm cal}$. We call the process of determining $Q_{\rm cal}$ and $\Delta\phi_{\rm cal}$ the
Mueller matrix calibration. Mueller matrix calibration is done by observing a polarization
calibrator with known intensity and polarization to obtain $\Delta\mathbf{S}^{\rm cor}_{\rm src}$ over a
range of parallactic angle $\chi$ and, in addition, obtaining the cal deflection
$\Delta\mathbf{S}^{\rm cor}_{\rm cal}$. One then plots the $\chi$-dependence of the four elements of
$\Delta\mathbf{S}^{\rm cor}_{\rm src}$. The first element, Stokes $I$, is constant by definition,
because we always deal with fractional Stokes parameters. The remaining three
elements vary periodically with $\chi$, and from the amplitudes and phases of their
variation one can use least-squares fitting of Eq. (33) or Eq. (34) to derive all of the
parameters.

Least-squares fitting is best for accuracy, but referring to that process does
not aid our phenomenological understanding. We can develop our understanding by
solving for the parameters using basic algebra. First, obtain $\Delta\phi_{AB}$ and
$\Delta\phi_{\rm cal}$ from Eqs. (33) and (31)$^{m}$:

$$
\Delta\phi_{AB} = \tan^{-1}\!\left(\frac{\Delta S^{\rm cor}_{{\rm src,lin},3}}{\Delta S^{\rm cor}_{{\rm src,lin},2}}\right), \qquad
\Delta\phi_{AB} + \Delta\phi_{\rm cal} = \tan^{-1}\!\left(\frac{\Delta S^{\rm cor}_{{\rm cal},3}}{\Delta S^{\rm cor}_{{\rm cal},2}}\right). \tag{35}
$$

Next, plot $\Delta S^{\rm cor}_{{\rm src,lin},1}$ versus $\chi$. The part that varies with $\chi$ gives
$p_{\rm src,lin}$ and the offset of this cosine wave from zero gives $\Delta G_{AB}$. Combine this
with $\Delta S^{\rm cor}_{{\rm cal},1}$ to obtain $Q_{\rm cal}$. One assumes, of course, that during the
time interval for this calibration the parameters stay fixed; in particular, that the
electronics parameters $\Delta G_{AB}$ and $\Delta\phi_{AB}$ stay fixed. Experience shows that with
modern electronics at the 305-m Arecibo telescope and the 100-m Green Bank Telescope (GBT)
this assumption is good.

$^{m}$$\Delta S^{\rm cor}_{{\rm src,lin},i}$ is the $i$th element of the $\Delta\mathbf{S}^{\rm cor}_{\rm src,lin}$ pseudo-Stokes vector; see Eq. (6).

Mueller matrix (left panel):

 1.0000  -0.0071   0.0106   0.0013
 0.0071   0.9997  -0.0001   0.0239
 0.0105  -0.0085   0.9346   0.3555
 0.0024  -0.0223   0.3556   0.9344

Mueller matrix (right panel):

 1.0000   0.0003  -0.0007  -0.0001
 0.0003   1.0000  -0.0000   0.0071
 0.0007  -0.0000   1.0000   0.0022
 0.0001  -0.0071  -0.0022   1.0000

Fig. 3. (Left) Mueller matrix calibration of the native-linear L-band GBT receiver showing
the normalized $\Delta\mathbf{S}^{\rm cor}_{\rm src,lin}$ outputs versus parallactic angle $\chi$. The
crosses (solid line) show $(\Delta S^{\rm cor}_{{\rm src,lin},1})/(\Delta S^{\rm cor}_{{\rm src,lin},0})$; the
diamonds (dashed line) show $(\Delta S^{\rm cor}_{{\rm src,lin},2})/(\Delta S^{\rm cor}_{{\rm src,lin},0})$, and
the squares (dash-dot line) show $(\Delta S^{\rm cor}_{{\rm src,lin},3})/(\Delta S^{\rm cor}_{{\rm src,lin},0})$.
Results of the least-squares fit are given below the plot (see text). (Right) The same plot
after the 3C 286 data have been corrected by the derived Mueller matrix. The same
least-squares fit process was performed on the calibrated data; the leakage of Stokes
parameters has been minimized, as can be seen from the plots and from the near-zero
off-diagonal terms in the Mueller matrix derived from these Mueller-matrix-corrected data.
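The algebraic route described above can be sketched in a few lines of Python. The data here are synthetic: they are generated from the native-linear closed form of Eq. (33) with $I_{\rm src} = 1$ and arbitrary "true" parameter values, the parallactic angles are assumed to be evenly spaced over a full 180° (rarely achievable in a real track), and the second-order normalization by the zeroth element is ignored. Variable names are hypothetical.

```python
import math

# Arbitrary "true" values used only to synthesize test data.
dg_ab, dphi_ab = 0.03, math.radians(20.0)
p_lin, chi_src = 0.08, math.radians(33.0)

n = 18
chis = [math.pi * k / n for k in range(n)]  # evenly spaced over 180 degrees


def eq33_elements(chi):
    """Elements 1, 2, 3 of the native-linear source deflection (I_src = 1)."""
    theta = 2.0 * (chi_src - chi)
    return (
        0.5 * dg_ab + p_lin * math.cos(theta),
        p_lin * math.cos(dphi_ab) * math.sin(theta),
        p_lin * math.sin(dphi_ab) * math.sin(theta),
    )


data = [eq33_elements(chi) for chi in chis]
s1 = [d[0] for d in data]
s2 = [d[1] for d in data]
s3 = [d[2] for d in data]

# Step 1: element 3 / element 2 = tan(dphi_AB) at every angle; combine all
# samples as a least-squares slope. atan() returns the principal value only.
dphi_est = math.atan(sum(a * b for a, b in zip(s3, s2)) / sum(b * b for b in s2))

# Step 2: element 1 is an offset plus a sinusoid in 2*chi. With evenly spaced
# angles the offset and the cos/sin amplitudes follow from simple averages.
offset = sum(s1) / n
a_cos = 2.0 / n * sum(v * math.cos(2.0 * c) for v, c in zip(s1, chis))
a_sin = 2.0 / n * sum(v * math.sin(2.0 * c) for v, c in zip(s1, chis))

dg_est = 2.0 * offset                       # offset is dG_AB / 2 in Eq. (33)
p_est = math.hypot(a_cos, a_sin)            # amplitude of the sinusoid
chi_src_est = 0.5 * math.atan2(a_sin, a_cos)

print("dphi_AB [deg]:", math.degrees(dphi_est))
print("dG_AB        :", dg_est)
print("p_src,lin    :", p_est)
print("chi_src [deg]:", math.degrees(chi_src_est))
```

With noiseless synthetic data the four inputs are recovered to rounding error. Real data have noise, uneven parallactic-angle coverage, and feed imperfections, which is why a full least-squares fit is preferred in practice.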

Figure 3 shows a set of 1666 MHz Mueller matrix calibration data from the
famous polarization calibrator 3C 286 for the native-linear polarization system at
the GBT. The crosses (solid line) show
$(\Delta S^{\rm cor}_{{\rm src,lin},1})/(\Delta S^{\rm cor}_{{\rm src,lin},0})$ and the diamonds
(dashed line) show $(\Delta S^{\rm cor}_{{\rm src,lin},2})/(\Delta S^{\rm cor}_{{\rm src,lin},0})$. If the
data were perfectly calibrated for polarization, these two outputs would equal $Q_{\rm src}$ and
$U_{\rm src}$ and would vary sinusoidally with twice the parallactic angle, with the two
sinusoids having equal amplitude and no offsets from zero. This is definitely not the case.
The squares (dash-dot line) in Fig. 3 represent
$(\Delta S^{\rm cor}_{{\rm src,lin},3})/(\Delta S^{\rm cor}_{{\rm src,lin},0})$ and reveal a major leakage
of linear polarization into Stokes $V$. A nonlinear least-squares fit of these data yields the
first seven parameters listed below the left plot in Fig. 3.$^{n}$ The associated Mueller matrix
is listed at the bottom of the left panel. The non-zero off-diagonal elements quantify the
leakage of one uncalibrated Stokes parameter into another. If
$\Delta\mathbf{S}^{\rm cor}_{\rm src,lin}$ is corrected by this Mueller matrix, the proper
$\chi$-dependencies of the elements of $\Delta\mathbf{S}^{\rm cor}_{\rm src,lin}$ are
recovered, as depicted in the right panel of Fig. 3.

$^{n}$In Fig. 3, the first two parameters are labeled DELTAG and PSI and correspond to our
$\Delta G_{AB}$ and $\Delta\phi_{\rm cal}$; the next three deal with feed imperfections; and the last
four are the source polarization. For a detailed description of all the listed parameters,
see Sec. 7.1 of Ref. 14.


4.6. Two Important Subtleties Regarding Relative Phase $\Delta\phi_{AB}$
4.6.1. System Cable Lengths 


Various electronics components in the signal path between the feed and the
correlator introduce complex voltage gains that can include amplification, attenuation,
and phase changes (e.g. some amplifiers introduce a phase shift of 180°). Of
particular importance: the combined lengths of the coaxial cables and optical fibers
differ between the two signal paths ($L_A$ and $L_B$ in Fig. 2). Environmental factors
can cause these lengths to change with time. A difference between the path lengths
produces a phase difference in radians of

$$
\delta\phi_{AB} = \frac{2\pi (L_A - L_B)}{\lambda}, \tag{36}
$$

and this phase difference depends on frequency as

$$
\frac{d\,\delta\phi_{AB}}{d\nu} = \frac{2\pi (L_A - L_B)}{c}. \tag{37}
$$

This phase difference, $\delta\phi_{AB}$, adds to other contributions to produce the total phase
difference $\Delta\phi_{AB}$. Measured values of the total phase gradient $d\Delta\phi_{AB}/d\nu$ at
Arecibo and the GBT are about 0.3 rad MHz$^{-1}$, corresponding to a difference in cable/fiber
length of ~20 m. This is surprisingly large, even considering the extreme distances
between the feed and correlator for these telescopes.
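As a quick order-of-magnitude check, the relation between phase gradient and path difference can be inverted directly. The Python sketch below assumes the signal travels at the vacuum speed of light, as the formula with $c$ implies; the `velocity_factor` argument is a hypothetical knob added here to show how a slower propagation speed in cable or fiber would scale the result.

```python
import math

C_M_PER_S = 299_792_458.0  # vacuum speed of light


def path_difference_m(gradient_rad_per_mhz, velocity_factor=1.0):
    """Path-length difference implied by a phase gradient d(phi)/d(nu).

    Inverts d(phi)/d(nu) = 2*pi*(L_A - L_B)/v, with v = velocity_factor * c.
    """
    gradient_rad_per_hz = gradient_rad_per_mhz / 1.0e6
    return velocity_factor * C_M_PER_S * gradient_rad_per_hz / (2.0 * math.pi)


delta_l = path_difference_m(0.3)
print(f"0.3 rad/MHz -> {delta_l:.1f} m path difference")  # prints 14.3 m
```

At the vacuum speed of light a gradient of 0.3 rad MHz$^{-1}$ corresponds to roughly 14 m, the same order as the quoted value of tens of metres.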


4.6.2. System Band-Limiting Filters and Their Induced Kramers—Kronig 
Phase Shifts 


At some point in the receiver chain, one always has a band-limiting filter. Frequency- 
dependent gains automatically introduce phase delays, which can be calculated from 
the Kramers—Kronig relations. If the filters in the two polarizations are not perfectly 
matched, a frequency-dependent phase difference between the two polarization chan- 
nels ensues. This can be particularly serious when the filters have significant gain 
changes within the usable portion of the band. 

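The exact formula for the phase shift (in radians) induced by a power gain
change in an electrical circuit is given by Eq. (2) of Ref. 15$^{o}$:

$$
\delta\phi(\nu_0) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{dG(u)}{du}\, \ln\left[\coth\left(\frac{|u|}{2}\right)\right] du, \tag{38}
$$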


$^{o}$You would miss a lot if you pass up the opportunity to read Bode's paper,$^{15}$ particularly the first
six pages. Go to http://www.alcatel-lucent.com/bstj/. N.B.: Bode’s derivation treats changes in 
logarithmic attenuation A; since we are treating changes in logarithmic power gain G, we have set 
G = —A in his equations. 
where $G(u)$ is the filter power gain in nepers, $u = \ln(\nu/\nu_0)$, $\nu$ is frequency, and
$\nu_0$ is an arbitrarily chosen frequency. The weighting function $\ln[\coth(|u|/2)]$ is
sharply peaked at $u = 0$ where $\nu = \nu_0$, so a good approximation eliminates the integral
and uses only the local derivative (Eq. (22) of Ref. 16):

$$
\delta\phi(\nu_0) \approx \frac{\pi}{2} \left.\frac{dG(u)}{du}\right|_{u=0}. \tag{39}
$$


Thus a non-flat filter produces phase shifts. 
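The link between the integral and its local-derivative approximation can be illustrated numerically. The Python sketch below assumes the standard form of Bode's gain-phase relation, a $1/\pi$ prefactor multiplying the integral of $dG/du$ weighted by $\ln\coth(|u|/2)$; the overall sign depends on whether one speaks of phase lead or phase delay and is not asserted here. For a gain with constant logarithmic slope the weighting function integrates to $\pi^2/2$, so the integral reduces exactly to $(\pi/2)\,dG/du$. The function names and the test slope are hypothetical.

```python
import math


def weight(u):
    """Bode weighting function ln coth(|u|/2); integrable log singularity at u = 0."""
    return -math.log(math.tanh(abs(u) / 2.0))


def bode_phase(dgain_du, u_max=40.0, step=1.0e-3):
    """(1/pi) * integral of dG/du * weight(u) du, by the midpoint rule.

    Midpoints never land exactly on u = 0, so the singularity is avoided.
    """
    n_steps = int(round(2.0 * u_max / step))
    total = 0.0
    for k in range(n_steps):
        u = -u_max + (k + 0.5) * step
        total += dgain_du(u) * weight(u)
    return total * step / math.pi


slope = -0.5  # nepers per unit u: a steady roll-off (arbitrary test value)
phase_integral = bode_phase(lambda u: slope)
phase_local = 0.5 * math.pi * slope

print("integral form   [rad]:", phase_integral)
print("local derivative [rad]:", phase_local)
```

The two numbers agree to about one part in $10^4$, limited by the simple quadrature near $u = 0$. For a real filter the slope varies with frequency, and the local form is a good approximation only because the weighting function is so sharply peaked.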

The left panel of Fig. 4 depicts the power gains and phase delays for Arecibo’s 
interim correlator, for which the baseband low-pass filters (cutoff frequency 6.25 
MHz) are digitally defined and are remarkably flat. We show only the positive- 
frequency half. Phase shifts occur only at the high-frequency end, where the 
responses of the filters drop precipitously. 

In contrast, the GBT Radar Backend filters (Fig. 4, right-hand panel) fall to
zero gradually, with no sharp cutoff frequency. Thus, the power gain varies rapidly
within the observing band (top panel), with correspondingly large (huge!) phase
delays (middle panel), peaking at ~600°! With such large phase delays, even small
differences between the A and B filters lead to significant frequency-dependent
relative phase delays $\delta\phi_{AB}$ (bottom panel). For native-linear polarization, these
delays interchange power between $U$ and $V$; for native-circular, they interchange $Q$
and $U$. These phase differences must be corrected.

Fig. 4. Filter shapes and their theoretical phase delays for the Arecibo Interim Correlator (left
panel) and the GBT “Radar Backend” (right panel). Both are low-pass baseband filters with
complex digital sampling, so the frequency coverage extends from $-B$ to $+B$, where $B$ is the
cutoff frequency. In the bottom panel for the GBT, the smoother curve is the measured phase
difference and the gray noisy curve is the theoretical one from Eq. (39).

For the GBT, the bottom plot shows both the theoretical (the noisy curve, from 
Eq. (39)) and measured (the smoother curve, from correlated noise injection) phase 
differences between the two signal paths. The theory and the data do not agree at 
all. The reason is inaccuracy in the filter shape resulting from uncorrected 4-bit 
quantized voltage sampling. Specifically, the calculated filter responses do not fall 
to zero at high frequencies, as they actually do. If, as a numerical experiment, we 
displace the BB curve downwards by 0.03, the theory curve becomes equal to the 
smoother measured one above 0.85 MHz. Thus, the theory curve is inaccurate and 
noisy, because it is the difference between two large numbers, neither of which is 
itself very accurate. 


5. Off-Axis Instrumental Polarization 


Thus far we have considered the polarization properties of radiation entering the 
feed along the optical axis of a telescope’s main beam. However, radio telescopes 
pick up radiation off-axis via sidelobe response. The polarization state of incoming 
radiation can be altered in these polarized sidelobes (and even inside the main 
beam!) in such a way that unpolarized astronomical radiation can be converted to 
a polarized response affecting the on-axis signal. 

Understanding the mechanisms that create this off-axis instrumental polariza- 
tion is the domain of antenna engineers whose interests lie in building efficient dual- 
polarized communication systems that carry a pure polarized signal (what they call 
the co-polarized signal) in one channel without allowing that information to leak 
into the orthogonal polarization state (what they call a cross-polarized signal).$^{p}$ In
order to accomplish this, the two E-field polarization states (horizontal and vertical 
for a dual-linear feed, RCP and LCP for a dual-circular feed) need to be perfectly 
orthogonal across the aperture plane of the telescope. This is an impossible task: 
there is always some cross-polarization inherent in the system. We investigate below 
some of the most common causes of this cross-polarization from both the engineer’s 
viewpoint of transmitting from the focus and the astronomer’s reciprocal perspective 
of receiving at the focus. 


5.1. Cross-Polarization Induced by the Feed and Dish Surface: 
Beam Squash 


Reference 18 uses multiple methods to analytically derive the cross-polarization
response of a circularly symmetric paraboloidal reflector with a feed located at the
primary focus. The resulting cross-polar pattern depends on the analysis method,
but two components are always present: a depolarization pattern caused by the
curvature of the reflector surface and a pattern from the inherent cross-polarization
of the feed. Both contributions will produce E-field aperture distributions with
nulls along the principal planes$^{q}$ and field maxima in the ±45° planes.$^{18-21}$ The
far-field E-field radiation pattern can then be produced from this E-field aperture
distribution via two-dimensional (2D) Fourier transform integration.$^{19}$

Astronomers are interested in knowing how their telescope responds to an
unpolarized source of radiation at any angle off of the optical axis. This can be measured
in practice for a single-dish telescope by mapping out the Stokes parameter response
of a strong unpolarized continuum source as the main beam is driven around an
area centered on the source. Fractional Stokes parameter beam maps are then
generated by dividing these Stokes beam maps by $I_{\rm peak}$, the peak Stokes $I$ response of
the main beam.$^{r}$ For notational efficiency, we will refer to the fractional quantities
$\{I, Q, U, V\}/I_{\rm peak}$ in the remainder of this section as simply $\{I, Q, U, V\}$.

$^{p}$Reference 17 lists the various terms that engineers and astronomers use for the singular concept of
instrumental polarization, among them: cross-polarization, feed or polarization leakage, D-terms,
cross-coupling, mutual coupling, cross-talk, barrel distortion, and beam squash.

Before inspecting a measured polarized beam pattern, we can investigate what
one might expect from a perfect telescope. Over the last few decades, the commercial
software package GRASP has developed into a sophisticated tool allowing the
far-field vector E-field response of reflector antennas to be precisely modeled using
efficient algorithms for physical optics and the physical theory of diffraction. We
follow the lead of Ref. 22 and use GRASP to model the transmitted far-field
pattern of the circularly symmetric DRAO 25.9-m diameter paraboloidal telescope
($f/D = 0.2941$) fed from the primary focus by a simulated feed pattern for a
circular-waveguide feed (with inherent cross-polarization) with four $\lambda/4$ chokes and
dual-linear probes.$^{22}$ The Stokes parameters were constructed from the simulated
far-field E-field distribution for a given orientation of the feed probes via Eq. (10).
(To simplify the modeling even further, we exclude any feed-support legs and
aperture blockage.) Then, invoking the principle of reciprocity, the feed was rotated
through 180° and each of the Stokes patterns averaged over these orientations to
simulate the transmission of unpolarized light in the far field. Figure 5 shows these
averaged fractional Stokes parameter beam patterns, which also represent the
telescope's response to unpolarized radiation. The rightmost panels show the simulated
beam patterns for a perfect linear dipole feed transmitting onto the same
reflector geometry; this feed has absolutely no inherent cross-polarization, so that any
Stokes $Q$ or $U$ response will be entirely brought about by the reflector surface.

$^{q}$For a dual-linear feed, the two principal planes are those that contain the reflector axis and the
orthogonal feed probes.

$^{r}$If one wanted to estimate the instrumental contribution to the on-axis Stokes $Q$ response from
an unpolarized source in the first sidelobe, one would multiply the source's Stokes $I$ brightness
by the fractional polarization at the appropriate location in the $Q/I_{\rm peak}$ pattern. These fractional
Stokes parameter beam maps should not be confused with point-for-point maps of fractional Stokes
parameters, e.g. the $Q$ pattern divided by the $I$ pattern. While an interferometer might be able
to measure such a pattern readily, a single-dish telescope does not have enough dynamic range or
angular resolution to quickly measure point-for-point fractional polarization in far-out sidelobes.

[Figure 5: beam-pattern panels, including "Q: Perfect Feed" and "U: Perfect Feed" (grayscale limits ±0.00012%), with the orientation of the linear feed probes indicated.]

Fig. 5. GRASP-generated far-field beam patterns at 1420 MHz for a circularly symmetric
paraboloidal reflector of diameter 25.9 m and $f/D = 0.2941$ with no feed legs or aperture blockage.
All beam patterns are normalized to the peak main-beam Stokes $I$ response and each frame
covers 3° × 3° on beam center. A simulated feed pattern for a circular-waveguide feed (with inherent
cross-polarization) with four $\lambda/4$ chokes and dual-linear probes was used to illuminate the primary,
producing beam patterns for: (top left) Stokes $I$ with grayscale covering 0-100% (white to black),
solid white contours covering (10%, 30%, 50%, 70%, 90%), and dashed black contours covering (1%,
3%, 5%, 7%, 9%); Stokes $Q$ (top middle), $U$ (bottom middle), and $V$ (bottom left) with grayscale
covering (white to black) ±0.2% of the peak Stokes $I$ (thus white is −0.2%, black +0.2%, and gray
0%), dashed black contours covering (−0.02%, −0.10%, −0.18%, −0.26%, −0.34%, −4.2%), dashed
white contours covering (0.02%, 0.10%, 0.18%, 0.26%, 0.34%, 4.2%), and 0% contour omitted. The
orientation of the dual-linear feed probes is indicated; the lobes of the Stokes $Q$ pattern align with
the probes while the $U$ pattern is oriented at 45°. There is no discernible $V$ response. The Stokes
$Q$ (top right) and $U$ (bottom right) patterns are also shown for the same primary reflector being
fed by a perfect feed with no inherent cross-polarization. Grayscale covers ±0.00012% (white to
black). These patterns show that the cross-polarization induced by the reflector alone has the
same character and orientation as that produced by the waveguide feed and reflector working in
conjunction, but the reflector-only pattern is narrower and more than 1000 times weaker.

Some important properties are immediately evident:


(1) The Stokes $Q$ and $U$ cross-polarization patterns resemble a four-lobed clover
leaf with lobes on opposite sides of the beam center having identical signs; the
signs of adjacent lobes alternate in beam azimuth. We call this pattern beam
squash. For a Stokes $Q$ pattern with its positive-response lobes aligned along
the vertical axis, this is equivalent to the beamwidth being larger in the vertical
direction than in the horizontal direction (meaning the feed pattern illuminating
the primary reflector is wider in the horizontal direction than in the vertical).

(2) The lobes of the beam squash pattern are aligned with the feed probe orientation 
for Stokes Q and are aligned at 45° for Stokes U. 

(3) The sign of the beam squash response reverses between the main beam and the 
first sidelobe. 

(4) The beam squash produced by the dish is dwarfed (by a factor of 3000 in
this instance) by the beam squash inherent in the feed response.$^{s}$ This
situation almost always obtains,$^{18,25}$ even for corrugated conical horns whose
cross-polarization response can be designed to be significantly smaller than
other types of feed.$^{18,26}$


5.2. Polarization Induced by the Feed Location: Beam Squint 


If a feed is tilted or displaced from the focus of a reflector such that the feed axis 
and the reflector axis are misaligned, an amplitude or phase slope is induced across 
the reflector’s aperture plane. In the far-field response, this translates to the RCP 
and LCP beams pointing in slightly different directions on either side of boresight; 
the displacement occurs in the plane that is orthogonal to the plane of symmetry 
of the reflector and is known as beam squint.?9?7 29 So if a feed is tilted and/or 
displaced from the reflector axis in the azimuth direction, the beam squint lobes 
will lie along the elevation direction. 

Offset paraboloidal reflectors are now commonly used in place of primary focus- 
fed circularly symmetric paraboloids in order to overcome the blockage and scat- 
tering brought about by the feed, receiver housing, and feed-support legs. In such a 
system, an elliptical section can be cut out of a circularly symmetric paraboloid in 
such a way that the primary focus is outside the main beam of the primary reflector. 
It is well known that such a system suffers a cross-polarization penalty in the form 
of beam squint. An off-axis secondary reflector can be added to the optical path 
and designed to minimize the squinting at a secondary focus.$^{29-31}$ The GBT and
the planned Square Kilometer Array dishes employ this design. 

A circularly symmetric parabolic reflector in a Cassegrain or Gregorian config- 
uration can also suffer beam squint when the feed is positioned at a secondary focus 
that is located off of the primary’s axis of symmetry. This arrangement obtains for 
multiple feeds at the Effelsberg 100-m telescope and at the NRAO Very Large Array 
(VLA), where significant beam squints have been measured.$^{32,33}$

If observing a large-scale region of emission for which the Stokes $I$ brightness
temperature varies with position, beam squint will respond to the first derivative of
Stokes $I$ with position. Measurements of 21-cm Zeeman splitting can be seriously
affected by spatial gradients in the diffuse 21-cm emission interacting with the beam
squint in such a way as to produce an artificial Stokes $V$ response that exactly
mimics a Zeeman splitting signature.$^{34-36}$

$^{s}$The reflector cross-polarization decreases with increasing $f/D$.$^{19,24}$
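The statement that beam squint responds to the first derivative of Stokes $I$ can be illustrated with a one-dimensional toy model. The Python sketch below is an illustration under stated assumptions rather than a telescope model: the RCP and LCP beams are identical unit-area Gaussians displaced by half the squint to either side of boresight, the sky is unpolarized with a linear brightness gradient, and Stokes $V$ is taken as RCP minus LCP (sign conventions differ between authors). All names and numbers are hypothetical.

```python
import math


def gaussian_beam(theta, center, fwhm):
    """Unit-area 1D Gaussian beam evaluated at angle theta."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((theta - center) / sigma) ** 2) / norm


def squint_response(sky, squint, fwhm, half_width=5.0, step=1.0e-3):
    """Return (I, V) seen by squinted RCP/LCP beams on an unpolarized sky."""
    n_steps = int(round(2.0 * half_width / step))
    rcp = lcp = 0.0
    for k in range(n_steps):
        theta = -half_width + (k + 0.5) * step
        half_power = 0.5 * sky(theta)  # unpolarized: half the power per hand
        rcp += gaussian_beam(theta, +0.5 * squint, fwhm) * half_power * step
        lcp += gaussian_beam(theta, -0.5 * squint, fwhm) * half_power * step
    return rcp + lcp, rcp - lcp


gradient = 2.0   # brightness gradient, K per beamwidth (arbitrary)
squint = 0.01    # RCP/LCP beam separation, in beamwidths (arbitrary)


def sky(theta):
    """Unpolarized brightness temperature with a linear gradient."""
    return 50.0 + gradient * theta


stokes_i, stokes_v = squint_response(sky, squint, fwhm=1.0)
print("Stokes I [K]:", stokes_i)
print("Stokes V [K]:", stokes_v, "(expected gradient * squint / 2)")
```

In this toy model the spurious $V$ equals half the product of the squint and the brightness gradient, independent of the mean brightness. For spectral-line work the gradient varies with velocity across the line, which is how squint can imitate a Zeeman signature.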


5.3. Instrumental Polarization Induced by Aperture Blockage 
and Feed-Support Legs 


Structures that block the primary aperture are also a source of polarized sidelobes;
these include feed-support legs, cables, subreflectors, and receiver cabins. These can
produce instrumental polarization in sidelobes both near-in to and far-out from the
main beam. While receiver cabins and subreflectors are complex structures whose
effect on the telescope's polarized response cannot be easily modeled, the effect of
feed-support legs is relatively easy to simulate. Figure 6 shows the measured Stokes
$V$ response within 24° of the main beam of the now-collapsed 85-ft Hat Creek
Telescope. The dashed lines trace four circular features whose Stokes $V$ polarization
response reverses sign on either side of beam center. While Refs. 34 and 21 have
correctly pointed out that these arcs are related to the scatter cones generated by
the quadrapod feed-support structure, they were at a loss to explain why the circular
polarization should display the observed pattern. Modern full-polarization
simulations of feed-leg scattering using GRASP easily reveal this exact pattern, including
the observed sign reversals.$^{t}$ Such simulations also reveal significant structure in the
Stokes $Q$ and $U$ patterns, which can affect measurements of diffuse polarized
Galactic continuum radiation.$^{u}$ Because this radiation covers the entire sky, a polarized
sidelobe sitting on the sky will pick up unpolarized radiation and alter the polarized
component of the measured signal. Even polarized sidelobes sitting on the ground
will affect the measured on-axis polarization via two possible mechanisms: (a) the
ground's thermal radio emission is linearly polarized, and (b) unpolarized off-axis
Galactic emission will reflect off the ground, becoming polarized in the process.$^{21,37}$
Spectropolarimetric studies of the 21-cm line can also be affected since the diffuse
Galactic 21-cm line emission covers the entire sky: this emission can reflect off the
ground (becoming polarized in the process) and be picked up by sidelobes sitting
on the ground.

Fig. 6. Fractional Stokes $V$ sidelobe response of the now-collapsed 85-ft Hat Creek Telescope out
to 24° from beam center (adapted from Ref. 34). Only positive Stokes $V$ is shown in grayscale;
negative values appear as blank areas. The four thick gray lines show the inner portions of the
four feed-leg scatter rings, with each ring represented by a different line style for clarity. Each ring
is a small circle in the sky centered on the direction the feed leg points, whose angular diameter is
twice the angle the leg makes with the symmetry axis of the primary reflector. (There are as many
rings as there are feed legs, and each ring passes through and draws its energy from the main
beam. Feed-leg scattering therefore reduces telescope gain.) For each ring, the sign of the Stokes
$V$ response is reversed on either side of beam center.

It might seem obvious that offset reflector telescopes with unblocked apertures 
have no (or at least much reduced) distant sidelobes, and therefore remove the 
complications just described. However, spillover is unrelated to aperture blockage, 
and if an unblocked aperture is overilluminated, producing spillover (around the 
primary or subreflectors), complications remain.$^{v}$


5.4. Putting It All Together: The Full-Stokes Off-Axis Response 
of the Arecibo Telescope 


The Arecibo telescope is a very complicated system: it has a 305-m spherical
primary reflector with shaped secondary and tertiary Gregorian reflectors located in
a focus cabin mounted on an azimuth arm. The cabin travels along a track on the
arm allowing for the beam to be pointed in zenith angle (ZA), and the arm swings
360° in azimuth. The azimuth arm and focus cabin are suspended from a large
multistory triangular platform that is itself suspended via cables from three towers
positioned around the primary's perimeter. The platform and azimuth arm block
~5-15% of the aperture. Despite these complexities, a team set out to map and
parameterize the polarized beam patterns of the telescope at 1175 MHz by
driving the main beam across the unpolarized continuum source PKS B1749+096.$^{35,36}$
Figure 7 shows the resulting fractional Stokes beam patterns; the azimuth direction
is along the horizontal and the ZA direction is along the vertical.

$^{t}$The authors have not yet gleaned the phenomenological reason for the sign flip through beam
center, but they take great comfort in seeing this empirically measured feature borne out by
electromagnetic simulations.

$^{u}$Another significant cause of polarized sidelobes involves the spillover of the feed response around
the reflector or subreflector that it illuminates; depending on the geometry and orientation of the
telescope, the spillover sidelobe can end up positioned on the ground or the sky.

$^{v}$Note that the GBT L-band feed was designed with too shallow a taper, such that a significant
20° diameter spillover sidelobe exists around the secondary with its center offset from the main
beam by 40°. At certain local sidereal times, 21-cm emission from the plane of the Milky Way can
align with this spillover lobe and cause the on-axis response to change. Reference 38 showed that
the instrumental polarization due to this spillover cannot be easily parameterized for the GBT, so
that the advantages of the unblocked aperture are completely ruined for studies of 21-cm emission
Zeeman splitting.

Fig. 7. Arecibo beam maps at 1175 MHz for all four Stokes parameters, normalized to the peak
main-beam Stokes $I$ response (adapted from Ref. 36). Azimuth direction is horizontal, zenith
angle (ZA) direction is vertical. Each map is 19′ × 19′. For Stokes $I$ (top left) the grayscale covers
0-100% of the peak Stokes $I$ (white to black), solid white contours cover (40%, 50%, ..., 90%),
solid black cover (10%, 20%, 30%), dashed black (1%, 2%, ..., 9%). For Stokes $Q$ (top right)
and $U$ (bottom right) the grayscale covers (white to black) ±2.8% of the peak Stokes $I$; thus,
black is +2.8%, white is −2.8%, and gray is 0%. For $V$ (bottom right) the total range is ±1.6%.
Contours are spaced by 0.4% for $Q$ and $U$, 0.2% for $V$ (with the 0% contour omitted for all); white
contours are negative, black positive. The feed is native linear with probes at 45° with respect to
the azimuth and ZA directions.

The far-out polarization response of this telescope — especially given the 
incredibly complicated structure of the suspended platform and the shaped sub- 
reflectors — is likely beyond the reach of accurate modeling via software such as 
GRASP. However, remarkably, some of the fundamental instrumental polarization 
features that we described for a primary focus-fed circularly symmetric paraboloid 
in Secs. 5.1-5.2 are clearly seen for Arecibo:ʷ

(1) We saw in Sec. 5.2 that a displacement of the feed from the center of symmetry
of the primary reflector will induce a beam squint in the Stokes V pattern.
At Arecibo, any feed at the tertiary focus will always be displaced along the
azimuth arm, which points along the ZA direction. The beam squint lobes
therefore ought to be aligned with the azimuth direction. This is exactly what
is seen in Fig. 7.

ʷ See Ref. 36 for a detailed discussion of these patterns.

(2) The expected beam squash cloverleaf pattern is seen in the Stokes Q and U 
response. The Stokes Q pattern shows the expected reversal of sign in the first 
sidelobe. At the time these polarized beams were measured, the “old” L-band- 
wide feed was aligned with probes at 45° to the azimuth and ZA directions 
(since then, the “new” feed has replaced the “old” one and is aligned at 0°). In 
a simple primary focus circularly symmetric paraboloidal reflector system, the 
Stokes Q squash pattern for this feed orientation would have its lobes aligned at 
45° to the (Az, ZA) directions and the Stokes U pattern would be aligned with 
(Az, ZA). Neither is quite the case, and the Q and U patterns are certainly not 
offset from one another by the expected 45°. 

(3) The Stokes I beam is highly elliptical (by design) and shows a significant coma 
lobe to the left of the main beam. The first sidelobe response is extreme on 
the coma side of the main beam and the Stokes U pattern shows significant 
instrumental linear polarization response in this coma-side sidelobe response. 


6. Polarization Conventions 


The history of polarization studies is fraught with confusion that arises because of 
conventions. As early as 1896, Pieter Zeeman, in discovering his eponymous effect, 
measured the charge of (what would turn out to be) the electron to be positive!® 
Why? Because he had used a mislabeled quarter-wave plate and therefore swapped 
his sense of circulars.⁴⁰

We'll say it now, and we'll say it again: When presenting polarization results,
you must state your conventions.


6.1. Linear Polarization 


There are two linear polarization conventions defined by the IAU:⁴¹ (1) the polarization
angle χ is zero at north; and (2) χ is measured east of north. Thus, when
represented on an image of the sky, a line segment representing polarization rotates
counterclockwise as χ increases, and χ = 0° corresponds to a vertical orientation.ˣ

ˣ IAU Commissions 25 and 40 resolved to align the horizontal and vertical axes of the Stokes
parameter reference frame along the Declination and Right Ascension axes, respectively. This
might seem somewhat paradoxical as we tend to think of Declination as the vertical equatorial
axis, but the choice sensibly retains a right-handed coordinate system for which χ = 0° and
Q/I = +1 for completely linearly polarized radiation aligned with the Declination axis.

In December 2015, the IAU sent an open letter to the astronomical community
pointing out that researchers studying the polarization of the Cosmic Microwave
Background (CMB) have been defining polarization angle to increase clockwise
on the sky. This effectively swaps the sign of Stokes U and causes confusion for
astronomers studying Galactic polarization using CMB satellite data.
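The effect of the two angle conventions can be illustrated with a small numerical sketch. It assumes the standard relation χ = ½ arctan(U/Q) between the linear Stokes parameters and the polarization angle, which is not stated in the text above; the function names are hypothetical and chosen only for this example.

```python
import math


def polarization_angle_deg(q, u):
    """Polarization angle chi = 0.5 * atan2(U, Q), folded into [0, 180) degrees."""
    return math.degrees(0.5 * math.atan2(u, q)) % 180.0


def flip_u_convention(q, u):
    """Switch between counterclockwise (IAU) and clockwise angle conventions.

    Assumption for this sketch: the two conventions differ only in the sign of U.
    """
    return q, -u


# A fully linearly polarized wave at chi = 30 deg east of north (IAU convention).
chi_iau = 30.0
q_iau = math.cos(math.radians(2 * chi_iau))
u_iau = math.sin(math.radians(2 * chi_iau))

# The same wave, with the angle defined to increase clockwise on the sky.
q_cmb, u_cmb = flip_u_convention(q_iau, u_iau)

print(polarization_angle_deg(q_iau, u_iau))  # angle read in the IAU convention
print(polarization_angle_deg(q_cmb, u_cmb))  # clockwise-convention values misread as IAU
```

Reading clockwise-convention values as if they were IAU values mirrors the angle about north, which is why the convention has to travel with the data.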


6.2. Circular Polarization 


If you are interested in studying circular polarization, there are a few things you 
really need to worry about.ʸ The most important things to be aware of are:


(1) Radio astronomers use the IEEE convention for the sense of circular polariza- 
tion*’ (which has been around since 1942) and have been doing so at least since 
Pawsey and Bracewell’s 1955 seminal textbook*® on the subject. Stick both 
your thumbs along the direction of propagation: whichever hand has its fingers 
wrapped in the direction that the electric field is rotating with time defines 
the handedness of the polarization sense. To wit, if radiation is incoming, then 
stick both your thumbs towards you. If the electric field is rotating counter- 
clockwise around the direction of propagation — your thumb — then your 
right hand describes the circular polarization state of IEEE RCP. The IEEE
logo even has a drawing of the right-hand rule, in case you ever forget which 
sense is RCP. This is opposite to the definition used by physicists and optical 
astronomers. 

(2) That last point leads to a serious problem: how should astronomers define Stokes 
V if optical and radio observers are using different definitions? A working group 
chaired by Gart Westerhout tried to tackle this problem at the 1973 IAU meeting
in Sydney⁴¹ by establishing an IAU definition for Stokes V to be IEEE
RCP minus IEEE LCP. Unfortunately, that definition just did not stick, not
even among radio astronomers. This is likely because by 1974, the opposite
convention was firmly established in many fundamental radio astronomy
references. When Cohen introduced Stokes parameters to radio astronomers in
1958,⁴⁷ he had defined V as IEEE LCP − RCP. Kraus's Radio Astronomy⁴⁸
(“the bible” for many generations of radio astronomers) had also defined V as
IEEE LCP − RCP in 1966 (and again in the 1986 2nd edition).

Seemingly all pulsar observers (as well as Heiles and his Zeeman effect col- 
laborators), unaware of the IAU definition, have used the Kraus LCP — RCP 
definition for decades. The pulsar crowd have further muddied the situation 
by acknowledging the discrepancy and — rather than adopting the IAU con- 
ventions — introducing a special pulsar Stokes V convention that is defined 
oppositely from the IAU definition;® this is implemented in their software and 
data storage definitions. 

We collected a sample of 53 radio Zeeman papers and found: 71% failed to
state whether they were using IEEE circular conventions, but we can give them
the benefit of the doubt; 57% failed to define their Stokes V convention; in the
cases where the Stokes V convention is defined or can be clearly inferred, 56%
used the IAU definition.

ʸ The immense confusion encountered in dealing with circular polarization and Stokes V definitions
has been outlined at length over the last two decades.⁴²⁻⁴⁴

(3) The sense of circular polarization reverses upon reflection. For telescopes with 
a feed at the primary or tertiary focus (e.g. Parkes, Arecibo, WSRT, GMRT), 
the Stokes V measured by the correlator will be the negative of the Stokes 
V signal incident on the primary surface. For telescopes with a feed at the 
secondary focus (e.g. L band at the GBT, Effelsberg, VLA), the sense of Stokes 
V measured by the correlator will match that of the incoming radiation. This 
subtlety was overlooked when Verschuur⁴⁹ discovered 21-cm Zeeman splitting
in the Perseus Arm absorption feature towards Cas A using the NRAO 140-ft
(a prime-focus telescope). He plotted Stokes V as IEEE RCP − LCP incident on
the feed and inferred a magnetic field pointing towards the observer; however,
in a follow-up publication,⁵⁰ he shows the exact same Stokes V spectrum and
labels it as RCP − LCP, but this time as incident on the dish, with a note
added in proof that he had previously assigned an incorrect sign for the derived 
magnetic field vector. The clear lesson here is that, in addition to stating the 
adopted definition of Stokes V, one must state what one’s Stokes V spectrum 
represents — the difference in circular polarization incident on the dish or inci- 
dent on the feed. The authors suggest that presenting Stokes V incident on the 
primary dish is the sensible choice: this represents the circular polarization state 
of the astronomical signal and removes the onus of tracking reflections from the 
reader. 

(4) The sense of circular polarization must be calibrated in order to tie the sign of
the pseudo-Stokes correlator output $S_3^{\rm corr}$ to IEEE RCP or LCP. The incoming
astronomical Stokes V signal must be positive for IEEE RCP, so if an astronomical
sourceᶻ emits a signal with net RCP and produces $S_{3,{\rm src}}^{\rm corr} < 0$, then the
sign of the correlator output must be corrected.
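The bookkeeping in points (2) and (3) can be sketched as a small helper that converts a measured Stokes V into the IAU sense (IEEE RCP − LCP) incident on the primary. This is only an illustration of the sign logic described above: the function name, the convention labels, and the idea of counting reflections as an integer are assumptions made for the example, not part of any observatory software.

```python
def stokes_v_on_sky(v_measured, convention="IAU", n_reflections=1):
    """Return Stokes V incident on the primary, as IEEE RCP - LCP (IAU sense).

    v_measured    : Stokes V formed from the signals arriving at the feed
    convention    : "IAU" if v_measured is RCP - LCP, "LCP-RCP" for the opposite
    n_reflections : reflections between the sky and the feed
                    (1 = prime focus, 2 = secondary focus, 3 = tertiary focus)
    """
    if convention not in ("IAU", "LCP-RCP"):
        raise ValueError("convention must be 'IAU' or 'LCP-RCP'")
    sign = 1 if convention == "IAU" else -1
    # Each reflection reverses the sense of circular polarization.
    if n_reflections % 2 == 1:
        sign = -sign
    return sign * v_measured


# Prime-focus feed, V reported as RCP - LCP at the feed: the sky value is negated.
print(stokes_v_on_sky(0.2, "IAU", n_reflections=1))
# Secondary-focus feed with the same convention: the sign is unchanged.
print(stokes_v_on_sky(0.2, "IAU", n_reflections=2))
```

The sketch does not replace the calibration step in point (4); it only makes explicit which two pieces of information (definition of V, and where it is referenced) a reader needs in order to recover the sign.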


6.3. Magnetic Field Direction 


There is a further conventional complication when comparing the direction of the 
line-of-sight component of magnetic fields in interstellar space that have been mea- 
sured by means of Zeeman splitting and Faraday rotation. Zeeman observers have 
always taken positive B to point away from the observer, analogous to Doppler 
velocity, but Manchester®! changed the convention in 1972 for Faraday rotation 
enthusiasts, who take positive B to point towards the observer in order to match with 
the convention that rotation measures are positive when the field points towards 
the observer. 


ᶻ A helical antenna of known circular polarization sense can also be used to broadcast a signal
directly into the feed.
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6.4. A Factor of Two in the Stokes Parameters 


Some observatories (e.g. the VLA) define Stokes I as the straight average of the
autocorrelations in orthogonal feed responses rather than their sum. So if one were
observing a continuum source producing a flux density of 30 mJy in the AA output
and 30 mJy in the BB output, the reported Stokes I value would also have a flux
density of 30 mJy. This does not conform to the convention for the Stokes parameters.
Stokes I is defined as the sum of the orthogonal outputs and should have a value
of 60 mJy in the above example. The AIPS and CASA software packages divide all
the Stokes parameters by 2. At least they are consistent: the fractional polarization
of a source should be the same whether using the AIPS/CASA convention or the
proper Stokes convention. But the intensities of the Stokes parameters themselves
will be half those of the proper convention, so if comparing fluxes between two
telescopes, one needs to know what conventions were used to create Stokes I. The
sheer momentum of this usage means that it will never be changed, so one must
keep this in mind.
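A short numerical sketch of the example above may help. It assumes a native linear feed for which Stokes Q is formed from the difference of the AA and BB outputs; the function names and the 36/24 mJy values are made up for illustration.

```python
def stokes_sum(aa, bb):
    """Stokes I and Q following the sum convention: I = AA + BB, Q = AA - BB."""
    return aa + bb, aa - bb


def stokes_average(aa, bb):
    """The same quantities divided by 2 (the averaging convention)."""
    return 0.5 * (aa + bb), 0.5 * (aa - bb)


# The example from the text: 30 mJy in each orthogonal output.
i_sum, _ = stokes_sum(30.0, 30.0)      # 60 mJy
i_avg, _ = stokes_average(30.0, 30.0)  # 30 mJy

# A partially polarized case (hypothetical values): fractional Q is unchanged.
i1, q1 = stokes_sum(36.0, 24.0)
i2, q2 = stokes_average(36.0, 24.0)
print(i_sum, i_avg, q1 / i1, q2 / i2)
```

The intensities differ by a factor of two between the conventions, while the fractional polarization Q/I comes out the same either way.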

Given the muddled history of polarization and magnetic field conventions over 
the last 50 years, there appears little chance that any single set of conventions (even 
those resolved by the IAU) will be adopted by all radio observers. The only possible 
way that we can reconcile different polarimetric observations is for you, the observer, 
to state your conventions when presenting results! 
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This review gives an introduction to spectrometers and discusses their use within 
radio astronomy. While a variety of technologies are introduced, particular empha- 
sis is given to digital systems. Three different types of digital spectrometers are 
discussed: autocorrelation spectrometers, Fourier transform spectrometers, and 
polyphase filterbank spectrometers. Given their growing ubiquity and significant 
advantages, polyphase filterbanks are detailed at length. The relative advantages 
and disadvantages of different spectrometer technologies are compared and con- 
trasted, and implementation considerations are presented. 


1. Introduction 


A spectrometer is a device used to record and measure the spectral content of 
signals, such as radio waves received from astronomical sources. Specifically, a
spectrometer measures the power spectral density (PSD, measured in units of W Hz⁻¹)
of a signal. Analysis of spectral content can reveal details of radio sources, as well 
as properties of the intervening medium. For example, spectral line emission from 
simple molecules such as neutral hydrogen gives rise to narrowband radio signals 
(Fig. 1), while continuum emission from active galactic nuclei gives rise to wideband 
signals. 

There are two main ways in which the PSD, commonly known as the power
spectrum, of a signal may be computed. The power spectrum, $S_{xx}$, of a waveform
and its autocorrelation function, $r_{xx}$, are related by the Wiener-Khinchin theorem.
This theorem states that the relationship between a stationary (mean and variance
do not change over time), ergodic (well-behaved over time) signal $x(t)$, its PSD, and
its autocorrelation is given by

$$S_{xx}(\nu) = \int_{-\infty}^{\infty} r_{xx}(\tau)\, e^{-2\pi i \nu \tau}\, d\tau, \tag{1}$$

where ν represents frequency, and τ represents a time delay or “lag”. The autocorrelation
function is

$$r_{xx}(\tau) = \langle x(t)\, x(t - \tau) \rangle, \tag{2}$$

where angled brackets refer to averaging over time.

Fig. 1. A galactic hydrogen 21-cm line emission profile, as measured using the Breakthrough
Listen digital spectrometer system on the Robert C. Byrd Green Bank Telescope in West Virginia.
Equation (1) shows that the autocorrelation function is related to the PSD by
a Fourier transform. In the discrete case, the relationship becomes

$$S_{xx}(k) = \sum_{m=-\infty}^{\infty} \langle x(n)\, x(n - m) \rangle\, e^{-2\pi i mk/N}, \tag{3}$$

which may be recognized as a discrete convolution. Here, the angled brackets average
over time sample, n, and summation is performed over time lag, m. It follows from
the convolution theorem that

$$S_{xx}(k) = \langle |X(k)|^2 \rangle, \tag{4}$$

where $X(k)$ denotes the discrete Fourier transform (DFT) of $x(n)$:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i nk/N}, \tag{5}$$

with N → ∞. There are therefore two distinct classes of spectrometers: ones that
approximate $S_{xx}(k)$ by first forming the autocorrelation, then taking a Fourier
transform à la Eq. (3), and those that first convert into the frequency domain to
form $X(k)$ before evaluating Eq. (4). These two routes are shown diagrammatically
in Fig. 2. We will refer to these as autocorrelation spectrometers (ACS, Sec. 3.1),
and Fourier transform filterbanks (FTF, Sec. 3.3), respectively. Polyphase filterbank
spectrometers (PFB, Sec. 3.4) can be thought of as an FTF with enhanced filter
response. Note that because the DFT is an approximation to the continuous Fourier
transform, ACS and FTF systems have different characteristics.
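The equivalence of the two routes can be checked numerically for a single finite block of data. The sketch below uses NumPy and a circular (wrap-around) autocorrelation, which is an assumption made so that the two routes agree exactly for finite N; a real ACS computes a limited number of non-circular lags, which is one reason the two classes of instrument behave differently in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256
x = rng.normal(size=N)  # one block of real-valued, noise-like samples

# FTF route: Fourier transform first, then square (Eq. 4, without time averaging).
X = np.fft.fft(x)
psd_ftf = np.abs(X) ** 2

# ACS route: autocorrelation first, then Fourier transform over lag (Eq. 3).
# np.roll(x, m)[n] equals x[n - m] with wrap-around, i.e. a circular lag.
r = np.array([np.sum(x * np.roll(x, m)) for m in range(N)])
psd_acs = np.fft.fft(r).real

print(np.allclose(psd_acs, psd_ftf))
```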


1.1. Analysis and Synthesis Filterbanks 


It is important to note the relationship between spectrometers, filters, and filter- 
banks. A filterbank is simply an array of band-pass filters, designed to split an 
input signal into multiple components, or similarly, to combine multiple compo- 
nents. These are referred to as analysis and synthesis filterbanks, respectively. When 
applied to streaming data, a DFT can be considered an analysis filterbank, and an 
inverse DFT to be a synthesis filterbank. From this viewpoint, a spectrometer is 
simply an analysis filterbank, where the output of each filter is squared and averaged. 


1.2. Polarimetry 


Polarization is a key measurement within radio astronomy.! Although most astro- 
physical radio emission is inherently unpolarized, a number of radio sources — such 
as pulsars and masers — do emit polarized radiation, and effects such as Faraday 
rotation by galactic magnetic fields can yield polarized signals. A spectrometer that 
also measures polarization is known as a polarimeter (or spectropolarimeter). 


1.2.1. Stokes Parameters 


The Stokes parameters are a set of four quantities which fully describe the polarization
state of an electromagnetic wave; this is what a polarimeter must measure.
The four Stokes parameters, I, Q, U and V, are related to the amplitudes of
perpendicular components of the electric field:

$$E_x = e_x(t) \cos(\omega t + \delta_x), \tag{6}$$

$$E_y = e_y(t) \cos(\omega t + \delta_y) \tag{7}$$

by time averages of the electric field parameters:

$$I = \langle E_x E_x^* + E_y E_y^* \rangle, \tag{8}$$

$$Q = \langle E_x E_x^* - E_y E_y^* \rangle, \tag{9}$$

$$U = \langle E_x E_y^* + E_y E_x^* \rangle, \tag{10}$$

$$V = i \langle E_x E_y^* - E_y E_x^* \rangle, \tag{11}$$

where * represents conjugation. The parameter I is a measure of the total power in
the wave, Q and U represent the linearly polarized components, and V represents
the circularly polarized component. The Stokes parameters have the dimensions of
flux density, and they combine additively for independent waves.

[Fig. 2 block diagram. Top path: delay & multiply, then time average. Bottom path: square, then time average.]

Fig. 2. The two methods used to compute the PSD of a signal. The top path corresponds to an
ACS system while the bottom corresponds to an FTF system. The two approaches are related by
the Wiener-Khinchin theorem.


1.2.2. Measuring Polarization Products 


In order to compute polarization products, a spectrometer must be presented with 
two voltage signals, x(n) and y(n), from a dual-polarization feed (i.e. a set of orthog- 
onal antennas). With analogy to Eq. (4), we may form 


$$S_{xx}(k) = \langle X(k) X^*(k) \rangle = \langle |X(k)|^2 \rangle, \tag{12}$$

$$S_{yy}(k) = \langle Y(k) Y^*(k) \rangle = \langle |Y(k)|^2 \rangle, \tag{13}$$

$$S_{xy}(k) = \langle X(k) Y^*(k) \rangle, \tag{14}$$

$$S_{yx}(k) = \langle Y(k) X^*(k) \rangle, \tag{15}$$

where in addition to measuring the PSD of $x(n)$ and $y(n)$, we also compute their
cross correlations. Note that while $S_{xx}$ and $S_{yy}$ are real valued, $S_{xy}$ and $S_{yx}$ are
complex valued.

The four terms $\langle E_x E_x^* \rangle$, $\langle E_y E_y^* \rangle$, $\langle E_x E_y^* \rangle$, and $\langle E_y E_x^* \rangle$ are linearly related (by
calibration factors) to the quantities in Eqs. (12)-(15) above. Combining these therefore
allows for Stokes I, Q, U and V to be determined.
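As a rough illustration of this combination, the sketch below forms the four products of Eqs. (12)-(15) from blocks of channelized voltages and combines them following Eqs. (8)-(11). It assumes unit calibration factors and a linear feed, and it makes no statement about which handedness a positive V corresponds to (see the discussion of conventions in the polarimetry chapter); the function name and array layout are hypothetical.

```python
import numpy as np


def stokes_from_spectra(X, Y):
    """Form Stokes I, Q, U, V from channelized voltages.

    X, Y : complex arrays of shape (n_spectra, n_channels), the DFT outputs
           of the two orthogonal feed signals. Averaging is over n_spectra.
    Calibration factors are assumed to be unity in this sketch.
    """
    s_xx = np.mean(np.abs(X) ** 2, axis=0)     # Eq. (12), real valued
    s_yy = np.mean(np.abs(Y) ** 2, axis=0)     # Eq. (13), real valued
    s_xy = np.mean(X * np.conj(Y), axis=0)     # Eq. (14), complex valued
    s_yx = np.conj(s_xy)                       # Eq. (15) is its conjugate

    stokes_i = s_xx + s_yy
    stokes_q = s_xx - s_yy
    stokes_u = (s_xy + s_yx).real              # equals 2 Re(S_xy)
    stokes_v = (1j * (s_xy - s_yx)).real       # equals -2 Im(S_xy)
    return stokes_i, stokes_q, stokes_u, stokes_v


# Two equal-amplitude signals with a 90 degree phase offset: purely circular.
X = np.ones((4, 8), dtype=complex)
Y = 1j * X
print(stokes_from_spectra(X, Y))
```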

In order to focus on the fundamental characteristics of spectrometers, the 
remainder of this chapter details single-polarization systems that compute only $S_{xx}$.
Nevertheless, the techniques and characterization approaches are broadly applicable 
to polarimetry systems. See Chapter 6 of this volume for a more detailed treatment 
of radio polarimetry. 
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1.3. Performance Characteristics 


Spectrometers operate over a finite bandwidth B, over which N channels with
bandwidth Δν = B/N are computed. With digital systems, channels may be evenly
spaced with identical filter shapes.


1.3.1. Spectral Leakage 


Ideally, each channel would have unit response over $\nu_c \pm \Delta\nu/2$, where $\nu_c$ is the
center frequency, with zero response outside this passband. In practice, this cannot
be achieved; each channel has a non-zero response over all frequencies. As such, a
signal will “leak” between neighboring channels, an effect known as spectral leakage.

Figure 3 compares the normalized filter response for ACS, FTF and PFB
implementations. In the presence of strong narrowband signals, such as radio interference
(RFI), spectral leakage is a major concern.


1.3.2. Scalloping Loss 


A related concern is that a channel’s non-ideal shape will cause narrowband sig- 
nals at channel edges to be attenuated, an effect known as scalloping loss (Fig. 4). 
Spectrometers are often designed such that neighboring channels overlap at their 
full-width at half-maximum points (FWHM), in which case the signal will be spread 
evenly over both channels. Wideband signals are not affected by scalloping. 
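The size of the scalloping loss can be estimated directly from a window's frequency response. The following sketch, using NumPy, compares the response to a tone midway between two DFT bins with the response at bin center, for an unwindowed DFT and for a Hann window; the function name and the choice of N = 1024 are assumptions for the example.

```python
import numpy as np


def scalloping_loss_db(window):
    """Response (dB) to a tone half a bin off-center, relative to bin center."""
    N = len(window)
    n = np.arange(N)
    on_bin = np.abs(np.sum(window))
    # Evaluate the window's transform at a frequency offset of 0.5 bins.
    half_bin = np.abs(np.sum(window * np.exp(-2j * np.pi * 0.5 * n / N)))
    return 20 * np.log10(half_bin / on_bin)


N = 1024
loss_rect = scalloping_loss_db(np.ones(N))     # roughly -3.9 dB
loss_hann = scalloping_loss_db(np.hanning(N))  # roughly -1.4 dB
print(loss_rect, loss_hann)
```

Windowing reduces the scalloping loss because it broadens each channel, at the cost of coarser frequency resolution.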


[Fig. 3 plot: Magnitude (dB) versus Frequency (normalized to bin width).]


Fig. 3. Comparison of the channel response of an ACS (dotted line), FTF (dashed line) and an 
8-tap, Hann-windowed PFB (solid line). 
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Fig. 4. Example of scalloping loss between spectrometer channels. The dashed lines show the 
response of individual channels, while the black line shows the overall response. 


1.3.3. Time Resolution 


Time resolution refers to the minimum period over which a spectrometer averages
the data. For a spectrometer with N channels over a bandwidth B, each spectrum
spans a time N/B (the reciprocal of the channel width Δν = B/N), so the time
resolution is $t_{\rm res} = RN/B$, where R is the number of spectra in the averaging
window. Detection of transient phenomena, such as fast radio bursts and pulsars,
requires $t_{\rm res}$ to be as short as a microsecond, whereas integration lengths of
several seconds, often averaged even further in post-processing, are common when
observing faint sources.


1.3.4. Dynamic Range 


Dynamic range refers to the span of input powers over which a spectrometer can 
operate nominally. The presence of RFI and the input bandwidth are the main 
drivers for dynamic range; see Sec. 2.1.4. 


2. Digital Systems 


Digital signal processing (DSP) techniques are well-suited to applications such as 
filtering and forming filterbanks. As such, a majority of current-day spectrometers 
are based on digital technology. A basic understanding of DSP is required to fully 
understand digital spectrometers; there are several excellent introductory DSP texts
available.²,³

In the diagrams and equations in this chapter, the symbol ⊗ denotes multiplication
of time samples; ⊕ denotes addition. The symbol $z^{-n}$ is used to denote a
time delay of n units, due to the relationship between time delay in a digital stream
and the so-called z-transform.
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2.1. Digital Sampling 


Digital sampling, or digitization, is the process of converting an analog signal to a 
digital one; devices known as analog-to-digital converters (ADCs) do this conversion. 
The two main characteristics of an ADC are its sample rate, $\nu_s$, and the number of
bits per sample, $n_{\rm bits}$.


2.1.1. Nyquist Sampling 


The Nyquist theorem, one of the most fundamental theorems within signal
processing, states that a band-limited signal may be fully recovered when it is
sampled at a rate that is twice the bandwidth, $\nu_s = 2B$. Sampling at the Nyquist rate
is referred to as critical sampling, under the Nyquist rate as undersampling, and
sampling over the Nyquist rate as oversampling. Sample rates may be increased by
a process known as upsampling and decreased by downsampling, by using sample
rate conversion filters. Here, we use the symbol ↓D to denote downsampling by a
factor D and ↑U for upsampling by a factor U.

Undersampling a signal causes an effect known as aliasing to occur, whereby 
different parts of a signal are indistinguishable from each other, resulting in infor- 
mation loss. Oversampling a signal does not increase the information content, but 
under certain circumstances is advantageous for reducing noise and/or distortion. 
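Aliasing is easy to reproduce numerically. In the sketch below (NumPy, with made-up values), a 7 Hz tone is sampled at 10 Hz, so the Nyquist frequency is 5 Hz; the samples are indistinguishable from those of a 3 Hz tone.

```python
import numpy as np

fs = 10.0            # sample rate in Hz (assumed for this example)
n = np.arange(100)   # sample indices
f_true = 7.0         # tone frequency, above the 5 Hz Nyquist frequency

x = np.cos(2 * np.pi * f_true * n / fs)

# Locate the strongest component in the sampled data.
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(n), d=1 / fs)
f_apparent = freqs[np.argmax(spectrum)]
print(f_apparent)

# The samples match those of a tone at fs - f_true.
x_alias = np.cos(2 * np.pi * (fs - f_true) * n / fs)
print(np.allclose(x, x_alias))
```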


2.1.2. Quadrature Sampling 


Quadrature sampling? is the process of digitizing a band-limited signal and trans- 
lating it to be centered about 0 Hz. A quadrature-sampled signal is complex val- 
ued, in contrast to real-valued Nyquist sampling. A quadrature-sampled signal has 
$\nu_s = B$; that is, each complex-valued sample is equivalent to two real-valued
samples. Quadrature-sampled signals may have negative frequency components (i.e.
below 0 Hz). 

A Nyquist-sampled signal $x(n)$ centered at $\nu_0$ can be converted into a
quadrature-sampled signal $x'(n)$ by multiplication with a complex phasor $e^{-2\pi i \nu_0 n}$:

$$x'(n) = x(n)\, e^{-2\pi i \nu_0 n}, \tag{16}$$

which is known as quadrature mixing.

The $e^{-2\pi i \nu_0 n}$ term in Eq. (16) is identical to that encountered in the DFT. Each
channel of a DFT can be seen as quadrature mixing the input signal, applying a
filter of width B, and then downsampling the signal to a rate $\nu_s = B$.
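Quadrature mixing can be sketched in a few lines of NumPy. In this example (all numerical values are assumptions), a real tone at 210 Hz is mixed with a phasor at 200 Hz, which moves it to +10 Hz. Mixing a real signal also produces an image far from 0 Hz, which a low-pass filter would remove; here the search is simply restricted to a narrow band around 0 Hz instead of applying a filter.

```python
import numpy as np

fs = 1000.0          # sample rate in Hz
N = 1000
n = np.arange(N)
nu0 = 200.0          # mixing frequency in Hz
f_tone = 210.0       # real-valued test tone in Hz

x = np.cos(2 * np.pi * f_tone * n / fs)

# Quadrature mixing as in Eq. (16), with nu0 expressed in cycles per sample.
x_mixed = x * np.exp(-2j * np.pi * (nu0 / fs) * n)

spectrum = np.abs(np.fft.fft(x_mixed))
freqs = np.fft.fftfreq(N, d=1 / fs)

# Stand-in for the low-pass filter: only look within 50 Hz of 0 Hz.
band = np.abs(freqs) < 50.0
f_baseband = freqs[band][np.argmax(spectrum[band])]
print(f_baseband)
```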


2.1.3. Quantization Efficiency 


The earliest digital correlators? used only two-level sampling (one bit), assign- 
ing a value of either +1 or —1. This scheme works remarkably well for weak, 
noise-dominated signalsᵃ: for Nyquist-sampled signals, a signal-to-noise ratio of
2/π ≈ 0.637 that of the unquantized signal is achievable.® For 2-bit data, one can
achieve 88% quantization efficiency, which rises to about 99% for 4-bit data. Given these
high values for low bit-widths, it is common to use bits sparingly within radio
astronomy applications. A listing of quantization efficiencies⁴ is given in Table 1.

Table 1. Quantization efficiencies η_Q for Nyquist sampling with different bit
depths n_bits. The value ε is the threshold between quantized values, in units
of the standard deviation of the signal. Table modified from Ref. 4.

n_bits   N_levels   ε        η_Q
2        4          0.995    0.88115
3        8          0.586    0.96256
4        16         0.335    0.98846
5        32         0.188    0.99651
6        64         0.104    0.99896
7        128        0.0573   0.99970
8        256        0.0312   0.99991

ᵃ That is, signals with probability distributions close to Gaussian.

Achieving peak quantization efficiency relies on setting the threshold between
quantized values optimally. In Table 1, the threshold ε is expressed in units of the
signal's standard deviation σ. In order to leave headroom for interfering signals, one
may deliberately set ε larger than the value that is optimal for signals with Gaussian
probabilities, to increase dynamic range.


2.1.4. Dynamic Range 


For modern-day radio environments, RFI is the main driver of sampling bitwidth. 
RFI may be orders of magnitude stronger than an astronomical signal of interest, 
requiring a large dynamic range in the digitized waveform. If the maximum input 
power to an ADC is exceeded, an effect known as clipping will occur, in which 
the waveform is distorted and spurious harmonics are introduced into the digitized 
waveform. 

The theoretical maximum dynamic range of an ADC in decibels is given by

$$\mathrm{DR} = 20 \log_{10}\left(2^{n_{\rm bits}}\right) \approx 6.02\, n_{\rm bits}. \tag{17}$$


In practice, as ADCs are imperfect analog devices, their effective number of bits 
(ENOB) is lower than the number produced by the ADC. For example, an 8-bit 
ADC may have an ENOB of 7.5, resulting in a dynamic range of 45 dB. 
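The numbers in this example follow directly from Eq. (17); a minimal sketch (the function name is hypothetical):

```python
import math


def adc_dynamic_range_db(n_bits):
    """Theoretical dynamic range of Eq. (17); n_bits may be a fractional ENOB."""
    return 20 * math.log10(2 ** n_bits)


dr_nominal = adc_dynamic_range_db(8)      # ideal 8-bit ADC
dr_effective = adc_dynamic_range_db(7.5)  # same ADC with an ENOB of 7.5
print(round(dr_nominal, 1), round(dr_effective, 1))
```

Half a bit of lost resolution therefore costs about 3 dB of dynamic range.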


2.2. Windowing Functions 


The DFT is computed over a finite number of samples, N, also known as the
window length. As the window length is not infinite, the response of the DFT is
not perfect, resulting in spectral leakage (Fig. 3). This can be understood if we
consider that the DFT

$$
\begin{aligned}
X'(k) &= \sum_{n=0}^{N-1} x(n)\, e^{-2\pi i nk/N} && (18) \\
      &= \sum_{n=-\infty}^{\infty} \Pi_N(n)\, x(n)\, e^{-2\pi i nk/N} && (19) \\
      &= \mathcal{F}\{\Pi_N(n)\} * X(k), && (20)
\end{aligned}
$$

where $\mathcal{F}$ denotes the Fourier transform, and $\Pi_N(n)$ the rectangle (or tophat) function:

$$
\Pi_N(n) =
\begin{cases}
0 & \text{if } n < 0, \\
1 & \text{if } 0 \le n \le N-1, \\
0 & \text{if } n > N-1,
\end{cases}
\tag{21}
$$

which is Fourier paired with the sinc function.ᵇ In other words, we can consider
the finite length of the DFT as effectively convolving the perfect Fourier transform
response $X(k)$ with a sinc function. The undesirable peaks of the sinc function are
referred to as sidelobes.

Windowing functions” improve the response of a DFT, by somewhat mitigating 
sidelobe response at the expense of increasing the channel width. They are applied 
by multiplying the signal $x(n)$ by a weighting function, $w(n)$:

$$
\begin{aligned}
X_w(k) &= \sum_{n=0}^{N-1} w(n)\, x(n)\, e^{-2\pi i nk/N} && (22) \\
       &= W(k) * X(k). && (23)
\end{aligned}
$$

The take-home message of all this is that DFT channels have a non-zero response 
outside their passband (Fig. 5), and that applying a windowing function can improve 
their response. 

Windowing functions are also important in the design of digital filters 
(Sec. 2.3). Some common windowing functions and their frequency-domain mag- 
nitude responses are shown in Fig. 6; their functional forms are given in Table 2. 
The most appropriate windowing function is dependent upon application; for digital 
spectrometers, the Hamming and Hann windows are commonly applied. 
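The trade-off between sidelobe level and channel width can be measured numerically. The sketch below (NumPy) zero-pads a window to sample its frequency response finely and then reports the highest sidelobe relative to the main-lobe peak; the helper name, the padding factor, and N = 64 are assumptions for this example.

```python
import numpy as np


def peak_sidelobe_db(window, pad=64):
    """Highest sidelobe of a window's magnitude response, in dB below the peak."""
    N = len(window)
    # Zero-padding samples the response at `pad` points per DFT bin.
    W = np.abs(np.fft.rfft(window, n=pad * N))
    W_db = 20 * np.log10(W / W[0] + 1e-300)  # small offset avoids log10(0)
    # The main lobe falls monotonically until its first null; the response
    # starts rising again at the first index where the difference is positive.
    first_null = np.argmax(np.diff(W_db) > 0)
    return W_db[first_null:].max()


N = 64
rect_db = peak_sidelobe_db(np.ones(N))     # unwindowed DFT, about -13 dB
hann_db = peak_sidelobe_db(np.hanning(N))  # Hann window, about -31 dB
print(rect_db, hann_db)
```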


ᵇ sinc(x) = sin(x)/x. This is the same relationship as that between light passing through a single
slit aperture and its far-field diffraction pattern.

Fig. 5. Amplitude response of the DFT (dashed line), compared to amplitude response of a
Hann-windowed DFT (solid line). Applying a windowing function lowers sidelobes while broadening the
channel response.

[Fig. 6 plots: windowing functions and their frequency-domain magnitude responses; horizontal axis: frequency bin, −16 to 16.]

Table 2. Common windowing functions used in DFT filterbanks. Coefficients have
been rounded to four significant digits.

Weighting function        w(n)
Uniform (rectangular)     1
Bartlett (triangular)     1 − |n|/(N − 1)

General form:             a0 − a1 cos(2πn/(N − 1))
  Hann                    a0 = 0.50     a1 = 0.50
  Hamming                 a0 = 0.54     a1 = 0.46

General form:             a0 − a1 cos(2πn/(N − 1)) + a2 cos(4πn/(N − 1)) − a3 cos(6πn/(N − 1))
  Nuttall                 a0 = 0.3558   a1 = 0.4874   a2 = 0.1442   a3 = 0.0126
  Blackman-Nuttall        a0 = 0.3636   a1 = 0.4892   a2 = 0.1366   a3 = 0.0106
  Blackman-Harris         a0 = 0.3588   a1 = 0.4883   a2 = 0.1413   a3 = 0.0117


Fig. 7. N-tap FIR filter block diagram. An FIR filter applies a weighted sum to the input sequence 
x(n) to compute the filtered signal y(n). 


2.3. Finite Impulse Response Filters 


A finite impulse response (FIR) filter is the windowed moving average of an input 
sequence x(n). An FIR filter computes the sum 


$$y(n) = \sum_{k=0}^{K-1} h(k)\, x(n - k), \tag{24}$$

where $y(n)$ is the output sequence, and $h(k)$ is a set of K coefficients used for
weighting. The upper summation bound, K, is called the number of taps. A streaming
implementation of an FIR filter is shown in Fig. 7.

If downsampling an FIR-filtered signal by ↓D, we only keep the outputs n = rD.
In such cases it is more efficient to only compute the terms we wish to keep:

$$y(rD) = \sum_{k=0}^{K-1} h(k)\, x(rD - k). \tag{25}$$


One way we can accomplish this is to use a polyphase decimating filter, which is 
discussed below. 
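Equations (24) and (25) can be written out directly. The sketch below is deliberately unoptimized so that the sums are visible; it treats samples before the start of the sequence as zero, which is an assumption of the example, and the function names are hypothetical.

```python
import numpy as np


def fir_filter(x, h):
    """Eq. (24): y(n) = sum_k h(k) x(n - k), with x(n) = 0 for n < 0."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        for k in range(len(h)):
            if n - k >= 0:
                y[n] += h[k] * x[n - k]
    return y


def fir_decimate(x, h, D):
    """Eq. (25): evaluate only the outputs y(rD) that survive downsampling."""
    n_out = (len(x) + D - 1) // D
    y = np.zeros(n_out)
    for r in range(n_out):
        for k in range(len(h)):
            if r * D - k >= 0:
                y[r] += h[k] * x[r * D - k]
    return y


rng = np.random.default_rng(1)
x = rng.normal(size=64)
h = np.hanning(8) / np.sum(np.hanning(8))  # an 8-tap smoothing filter

full_then_drop = fir_filter(x, h)[::4]     # compute everything, keep 1 in 4
direct = fir_decimate(x, h, 4)             # compute only what is kept
print(np.allclose(full_then_drop, direct))
```

Both routes give the same output; the second performs a factor of D fewer multiply-accumulate operations.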


2.4. Polyphase FIR Filters 


A common DSP technique is to decompose an input sequence $x(n)$ into a set of P
sub-sequences, $x_p(n')$, each of which is given by

$$x_p(n') = (\downarrow P)(z^{-p})\, x(n). \tag{26}$$


This is known as polyphase decomposition.® As a simple example, even and odd 
decomposition of the signal x(n) is achieved when P = 2: 


$$x_0(n') = \{x(0), x(2), x(4), \ldots\}, \tag{27}$$

$$x_1(n') = \{x(1), x(3), x(5), \ldots\}. \tag{28}$$


More generally, a signal may be decomposed into P different “phases”. 
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Polyphase filter structures are often more efficient than standard finite impulse 
response filters when used in sample rate conversion. A ↓P decimating FIR filter
of length K = MP can be constructed from P discrete FIR filter branches, each
acting upon a different phase. The value M is referred to as the number of polyphase
taps on each branch, such that

$$y(n') = \sum_{p=0}^{P-1} \sum_{m=0}^{M-1} h_p(m)\, x_p(n' - m). \tag{29}$$

This is known as a decimating polyphase filter.

Decimating polyphase filter structures are far more efficient than standard FIR-based
downsampling techniques. If ↓D downsampling occurs after the moving average
of Eq. (24), we are computing D sums, but only keeping 1 in D of these. This
is inefficient; in contrast, Eq. (29) only computes values of interest.
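A decimating polyphase filter can be checked against the filter-then-downsample approach with a short NumPy sketch. It assumes the branch coefficients are $h_p(m) = h(mP + p)$ and that each phase is the input delayed by p samples and then downsampled, $x_p(n') = x(n'P - p)$, with zeros before the start of the data; these definitions and the function name are choices made for this example.

```python
import numpy as np


def polyphase_decimate(x, h, P):
    """Eq. (29): sum of P branch filters, each with M = len(h) / P taps."""
    assert len(h) % P == 0, "filter length must be K = M * P"
    assert len(x) % P == 0, "input length is assumed to be a multiple of P"
    n_out = len(x) // P
    y = np.zeros(n_out)
    for p in range(P):
        h_p = h[p::P]  # h_p(m) = h(mP + p)
        # Delay by p samples (zero fill), then keep every P-th sample.
        delayed = np.concatenate([np.zeros(p), x])[:len(x)]
        x_p = delayed[::P]
        # Each branch is an M-tap FIR filter running at the low output rate.
        y += np.convolve(x_p, h_p)[:n_out]
    return y


rng = np.random.default_rng(2)
x = rng.normal(size=64)
h = np.hamming(16)  # K = 16 taps, so M = 4 taps per branch for P = 4
P = 4

reference = np.convolve(x, h)[:len(x)][::P]  # filter at full rate, then drop
polyphase = polyphase_decimate(x, h, P)
print(np.allclose(reference, polyphase))
```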


2.5. The Fast Fourier Transform 


The fast Fourier transform⁹,¹⁰ (FFT) is a highly efficient algorithm for computing
the DFT of a regularly-sampled signal. When applied over non-overlapping blocks
of length N of a time stream (as done in FTF systems), we may write the rth
output of a DFT as

$$X(k, rN) = \sum_{n=0}^{N-1} x(rN-n)\,e^{-2\pi i k n/N}. \qquad (30)$$

By comparison with Eq. (25), we recognize this as a bank of N FIR filters,
downsampled by ↓N. This is a key insight toward understanding DFT-based filterbanks:
the DFT should be thought of as more than just a transformation from time to
frequency domain.

To directly compute the DFT would require of order O(N²) operations, but the
FFT algorithm reduces this to only O(N log₂ N) operations. FFT implementations
generally exhibit best performance when N is a power of 2.

For real-valued data, only N/2 channels are unique. FFT algorithms often
exploit this for increased efficiency, by recasting the real-valued input data as
complex values under-the-hood. The O(N log₂ N) performance of the FFT is a major
driving factor for the adoption of FTF spectrometers over their ACS counterparts,
which require O(N²) operations.
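
As an illustration of these points, the sketch below (assuming NumPy; `direct_dft` is a hypothetical helper, not part of any library) compares a direct O(N²) matrix evaluation of the DFT with `np.fft.fft`, and shows the redundancy for real-valued input. Note that `np.fft.rfft` returns N/2 + 1 values, since it keeps both the DC and the Nyquist channel.

```python
import numpy as np

def direct_dft(x):
    """Direct DFT: an N x N matrix of complex exponentials applied to x, O(N^2)."""
    N = len(x)
    n = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(n, n) / N)
    return W @ x

rng = np.random.default_rng(0)
N = 256                                   # power of 2
x = rng.standard_normal(N)                # real-valued input

X_fft = np.fft.fft(x)                     # O(N log2 N)
X_direct = direct_dft(x)                  # same values, O(N^2) work
X_real = np.fft.rfft(x)                   # only the non-redundant half

agree = np.allclose(X_fft, X_direct)
# For real input, channel N - k is the complex conjugate of channel k
redundant = np.allclose(X_fft[1:][::-1], np.conj(X_fft[1:]))
```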


3. Digital Spectrometers 


As discussed in Sec. 1, there are two equivalent paths that may be used to compute 
the PSD of a signal, as shown in Fig. 2, referred to as ACS and FTF systems. As 
the DFT must be computed over a finite number of points, ACS and FTF systems 
have different characteristics. 

The first digital spectrometer used for radio astronomy was developed by
Weinreb? in 1963, two years before the FFT algorithm was introduced by Cooley
and Tukey. This 1-bit ACS was used to observe the 18-cm wavelength hydroxyl
(OH) absorption line in the spectrum of Cassiopeia A, providing the first evidence
of OH in the interstellar medium.¹¹ The first reference to FTF spectrometers for
radio astronomy can be found in Chikada et al.;¹² however, FTF spectrometers did
not enjoy widespread adoption until much later. The PFB architecture was first
introduced by Schafer¹³ in 1973 and expounded by Bellanger¹⁴ in 1976, but was
not introduced for the purposes of radio astronomy spectrometry until 1991.¹⁵,¹⁶
Bunton¹⁷ further popularized the PFB within radio astronomy in 2000, suggesting
its use in radio interferometer correlator systems. Given their growing ubiquity,
PFB systems (which are essentially enhanced FTF spectrometers) are detailed at
length in Sec. 3.4.

Spectrometers do not compute the true PSD, S_xx(k); rather, they compute an
approximation, S'_xx(k). Further, as a spectrometer has time resolution (Sec. 1.3.3),
the spectrometer output has a time dimension, i.e. S'_xx = S'_xx(k, r), where r is the
integration number.


3.1. Autocorrelation Spectrometers 


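In an ACS, the PSD is computed over a discrete range of M delays:


$$\begin{aligned}
S'_{xx}(k) &= \sum_{m=0}^{M-1} \langle x(n)\,x(n-m)\rangle\, e^{-2\pi i k m/M} & (31)\\
 &= \sum_{m=-\infty}^{\infty} \Pi_M(m)\,\langle x(n)\,x(n-m)\rangle\, e^{-2\pi i k m/M} & (32)\\
 &= \mathrm{sinc}(k) * S_{xx}(k). & (33)
\end{aligned}$$


Here Π_M(m) denotes a rectangular window spanning the M lags. That is, the finite
summation causes convolution of the true PSD with a sinc( ) function.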

The spacing of lags (i.e. delays, τ) in an ACS determines how much bandwidth
it can process without aliasing occurring. The Nyquist criterion requires two taps
per wave period at the highest frequency signal of interest, with the maximum lag
τ_max setting the spectral resolution, Δν ≈ 1/τ_max.


3.2. Fourier Transform Spectrometers 


FTF spectrometers compute the PSD of a signal by applying a DFT of length N to
an input signal, squaring the DFT output, then taking an average over time. From
Eqs. (4) and (20), an FTF spectrometer computes

$$\begin{aligned}
S'_{xx}(k) &= \langle |X'(k)|^2 \rangle & (34)\\
 &= \langle |\mathrm{sinc}(k) * X(k)|^2 \rangle & (35)\\
 &\approx \mathrm{sinc}^2(k) * S_{xx}(k). & (36)
\end{aligned}$$

That is, the finite DFT summation bounds give rise to a convolution of the true
PSD with a sinc²( ) function.

For a signal with sampling rate ν_s = 2B, each DFT channel has a bandwidth
Δν ≈ B/N, and a quadrature-sampled output rate of ν_s/2N = B/N. As mentioned
in Sec. 2.1.1, the DFT (Eq. (5)) may be thought of as the mixing of the input signal
with a bank of oscillators, followed by an averaging with a square window function.


3.3. FTF and ACS Comparison 


The FFT (Sec. 2.5) allows Eq. (5) to be evaluated in O(N log₂ N) operations, or
ν_s log₂ N operations per second when performed every N samples. In comparison, an
ACS requires O(N²) computations. For a spectrometer with a moderate 10⁴ channels,
the FFT algorithm requires approximately 0.1% as many operations as an ACS system.
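
As a rough check of the quoted figure, taking the operation counts at face value and ignoring constant factors (an assumption, since O( ) notation hides them), the ratio for N = 10⁴ works out as follows.

```latex
\frac{N\log_2 N}{N^2} = \frac{\log_2 N}{N}
  = \frac{\log_2 10^{4}}{10^{4}}
  \approx \frac{13.3}{10^{4}}
  \approx 1.3\times10^{-3} \approx 0.1\% .
```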

ACS systems are more affected by spectral leakage than FTF systems (Fig. 3),
due to the sinc( ) convolution in Eq. (33), versus the sinc²( ) convolution encountered
in FTF systems (Eq. (36)). With current digital technology there is no compelling
reason to implement an ACS. Regardless, the earliest digital spectrometers
were ACS based. Their prevalence in early systems can be explained by
two reasons: for 1-bit data they can be implemented using simple Boolean logic
circuits; further, they pre-date the FFT algorithm.


3.4. Polyphase Filterbanks 


A PFB is a computationally efficient implementation of a filterbank, constructed
from an FFT preceded by a prototype polyphase FIR filter frontend.¹³,¹⁴,¹⁸ PFB-based
spectrometers offer vastly lowered spectral leakage over both ACS and FTF
architectures, with a modest increase in computational requirements.

The PFB exploits the fact that a lowpass filter with coefficients h(k) can be
converted into a quadrature bandpass filter with central frequency ν by multiplying
the coefficients by e^{2πiνk}. Now, suppose we have implemented a decimating lowpass
polyphase filter (Eq. (29)). The output of each branch is


$$y_p(n') = \sum_{m=0}^{M-1} h_p(m)\,x_p(n'-m), \qquad (37)$$


where h_p(m) are coefficients from our prototype lowpass filter. Normally, we would
sum across the P branches (i.e. over the sub-filters y_p) to construct y(n'), as in
Eq. (29). Here is where we get tricky. If instead of just summing up the P branches,
we feed the branches (sub-filters) into a DFT with P inputs, as in Fig. 8, we then
have


$$\begin{aligned}
Y(k, n') &= \sum_{p=0}^{P-1} y_p(n')\,e^{-2\pi i k p/P} & (38)\\
 &= \sum_{p=0}^{P-1}\sum_{m=0}^{M-1} \left[h_p(m)\,e^{-2\pi i k p/P}\right] x_p(n'-m). & (39)
\end{aligned}$$



Comparing this form to Eq. (29), we recognize that the output of this structure is
equivalent to a set of ↓P decimating polyphase filters, where the central frequency
of the kth filter is shifted by an amount k/P; this is a polyphase filterbank. From here,
the output is squared and time averaged to form the PSD.

The order of summations in Eq. (39) is important; as written, it is more
computationally efficient. The overhead of Eq. (39) over a windowed FTFᶜ is an extra
(M − 1)P operations, due to the polyphase FIR frontend. Generally, the number of
taps M < P, so the increase in required operations is moderate. Extra memory is
also required for buffering of the M × P time samples and filter coefficients.

Two different representations of PFB spectrometers are given in Figs. 8 and 9.
Figure 8 shows a block diagram of a polyphase FIR frontend preceding an FFT.
The commutator splits the input into P branches, feeding a different “phase” of the
input signal to each of the polyphase sub-filters. That is, the commutator applies a
z^{-p} delay on each branch before a ↓P downsampling.


Fig. 8. Polyphase filterbank streaming implementation. A PFB is formed when a polyphase
FIR filter structure is combined with a DFT. This diagram is an alternative, but equivalent,
representation to Fig. 9. Note that in this diagram, indices m and p run from 1 instead of 0.


ᶜAs an aside: a windowed FTF can be considered to be a one-tap PFB.
Fig. 9. Graphical representation of a polyphase filterbank, with P = 64 and M = 4 polyphase
taps. Data are read in blocks of length P until M × P samples are buffered. The data and filter
coefficients are then split into M taps, multiplied together, and then summed over taps. After this,
a P-point DFT is computed and another P input samples are read.


Figure 9 shows the action of a PFB FIR frontend on a data stream in several
stages. An input signal x(n) of length M × P is first multiplied by filter coefficients
h(n). The data are then split into M blocks of length P, and summed over M taps.
After this, a DFT of length P is applied to form a filterbank, followed by squaring
and time-averaging to compute the PSD.

A simple PFB implementation in Python is given at
https://github.com/telegraphic/pfb_introduction. Provided alongside this code is an
annotated interactive notebook that provides further documentation and explanation
of the PFB technique. High performance codes for radio astronomy applications are
detailed in Refs. 19 and 20.
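
The stages described for Fig. 9 can also be sketched in a few lines of NumPy. This is an independent illustrative sketch, not the code from the repository above: the function name `pfb_spectrometer`, the Hamming-windowed sinc prototype filter, and the choice to slide forward by one block of P samples per spectrum are all assumptions made for the example.

```python
import numpy as np

def pfb_spectrometer(x, P, M, n_int):
    """Critically sampled PFB sketch for real input x.

    P: branches (DFT length), M: taps per branch, n_int: spectra averaged.
    """
    # Prototype lowpass filter of length M*P (assumed: windowed sinc)
    k = np.arange(M * P)
    h = np.sinc(k / P - M / 2) * np.hamming(M * P)
    h_taps = h.reshape(M, P)

    n_blocks = len(x) // P
    x_blocks = x[: n_blocks * P].reshape(n_blocks, P)
    n_spec = n_blocks - M + 1            # one spectrum per new block of P samples

    # FIR frontend: weight M consecutive blocks by the taps, then sum over taps
    y = np.zeros((n_spec, P))
    for m in range(M):
        y += x_blocks[m : m + n_spec] * h_taps[m]

    Y = np.fft.rfft(y, axis=1)           # P-point DFT of each summed block
    power = np.abs(Y) ** 2               # square ...
    n_out = n_spec // n_int
    # ... and time-average n_int consecutive spectra
    return power[: n_out * n_int].reshape(n_out, n_int, -1).mean(axis=1)

P, M = 64, 4
n = np.arange(P * 64)
tone = np.cos(2 * np.pi * 10 * n / P)    # tone centered on channel 10
spec = pfb_spectrometer(tone, P, M, n_int=8)
peak_channel = int(np.argmax(spec[0]))
```

Setting M = 1 reduces the frontend to a single window multiplication, i.e. a windowed FTF.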


3.5. Zoom Modes 


The output of each channel of a DFT-based filterbank is a critically-sampled quadrature
time stream in its own right. Higher spectral resolution can be achieved by
passing the output of a “coarse” first-stage filterbank channel into a secondary
DFT to apply finer channelization, after which the samples may be squared and
averaged to compute the PSD. Spectrometers that employ this approach are known
as zoom spectrometers.

An extension of the zoom spectrometer can be used to efficiently develop filter- 
banks of many millions of channels. The second-stage filterbank in a zoom spectrom- 
eter only needs to run at 1/N the rate of its first-stage filterbank. If the second-stage 
DFT (of length M) is run at the same speed as the first-stage filterbank, one can 
run the second-stage DFT on every first-stage channel, instead of just selecting one. 
The result is a filterbank with N x M channels. 

To do so requires that the output of every first-stage channel is buffered so
that there are M samples per channel, then data must be rearranged and fed to
the second-stage DFT in channel order. This reorder can be considered a matrix
transpose (also called a cornerturn), rearranging from (N, M) to (M, N) order.

As an example, a zoom-style spectrometer with N = M = 1024 has N × M/2 =
524,288 channels total. This approach is often used in the search for extraterrestrial
intelligence (SETI),²¹ in order to achieve sub-hertz resolution over many hundreds
of megahertz input bandwidth.
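
The two-stage scheme with a cornerturn can be sketched as follows, assuming NumPy and, for brevity, a plain DFT (rather than a PFB) as the first stage. The small sizes N = 64 and M = 32 and the variable names are illustrative assumptions only.

```python
import numpy as np

N, M = 64, 32                      # first- and second-stage DFT lengths
n = np.arange(N * M)
# Test tone a quarter of a coarse channel above the center of coarse channel 5
x = np.cos(2 * np.pi * (5 + 0.25) / N * n)

# First stage: M successive N-point DFTs; keep N/2 channels of the real input
coarse = np.fft.rfft(x.reshape(M, N), axis=1)[:, : N // 2]   # (time, channel)

# Cornerturn: transpose so each row holds the M time samples of one channel
cornerturn = coarse.T                                        # (channel, time)

# Second stage: M-point DFT along time for every coarse channel
fine = np.fft.fftshift(np.fft.fft(cornerturn, axis=1), axes=1)
psd = np.abs(fine) ** 2            # (N/2, M) grid of N*M/2 fine channels

coarse_idx, fine_idx = np.unravel_index(np.argmax(psd), psd.shape)
```

After the `fftshift`, fine index M/2 corresponds to the center of each coarse channel, so the tone appears in coarse channel 5 at fine index M/2 + M/4 = 24.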


4. Alternative Spectrometer Implementations 


4.1. Swept Spectrometer 


A swept spectrometer uses a variable oscillator with a heterodyne circuit and a low 
pass filter. The oscillator is typically varied, i.e. swept, through a range of desired 
frequencies. As it is swept through the desired frequency range, the power of the 
low pass filter’s output is measured and recorded. Most analog spectrum analyzers 
operate in this manner. 

An advantage of swept spectrometers is that they can operate over large RF
bandwidths. However, as only a fraction of the band is detected at any one time,
less integration time is available per frequency channel. The RMS noise per channel
in a swept spectrometer is a factor of √N higher than in an equivalent FTF spectrometer
with N channels covering the entire RF bandwidth. As such, swept spectrometers are
best suited for cases where signals of interest are strong and wideband.


4.2. Analog Filterbank 


An analog filterbank is just what its name implies: a bank (or collection) of analog 
filters. The analog filters are designed to pass through different ranges of frequencies. 
The power of each filter’s output is measured and recorded, from which spectral 
features can be discerned. 

Analog filterbanks may offer very wide bandwidths, but design of very nar- 
rowband filters is challenging. Additionally, the input signal must be split multiple 
times, and each time the signal is split, its power halves. Unlike digital systems, the 
shape and gain of each filter may differ. For these reasons, analog filterbanks are
uncommon in modern radio astronomy. 


4.3. Analog Autocorrelator Spectrometers 


An analog ACS uses analog circuitry to implement multipliers and propagation 
times through carefully constructed delay lines to implement the desired tap delays. 
An example implementation is given in Ref. 22. 

The main advantage of the analog autocorrelator over its digital counterpart 
(Sec. 3.1) is that digitization need only take place at a rate commensurate with 
the averaging period of the correlator, rather than the bandwidth of the input 
signals. For this reason, analog ACS spectrometers are usually seen in systems that 
have instantaneous bandwidths of many gigahertz. Their major disadvantage is that 
the number of physical components required scales with the number of channels, 
making analog ACS systems with many channels — readily implemented by digital 
systems — infeasible. 


5. Current Technology 


Most modern implementations of spectrometers in radio astronomy are PFB based, 
and use commercially available high-speed ADCs. Over the years, ADC input band- 
width has grown from kilohertz to the gigahertz we see today. Specifications of some 
example high-speed ADCs that are currently available are given in Table 3. 

Once analog signals are digitally sampled, a variety of signal-processing plat- 
forms are available on which to implement the algorithms described earlier in this 
chapter: 


Central Processing Units (CPUs), of the type found in widely available laptop
and desktop computers, are capable of processing only relatively small bandwidths,
but are cheap and very easy to program. Though largely superseded in modern
high-bandwidth systems, a notable example is UC Berkeley’s distributed SETI@home
project.²³


Table 3. Example high-speed ADCs that are currently commercially available.

Sample rate (GS/s)   N_bits   Manufacturer     Part
 5                   8        e2v              EV8AQ160
15                   4        Adsantec         ASNT7122
26                   3†       Analog Devices   HMCAD5831
30                   6        Micram           ADC30

†Plus overrange bit.
Graphics Processing Units (GPUs) are processors with thousands of arithmetic
cores, capable of performing many trillions of operations every second. GPU
development is driven by the computer gaming industry, but over the last decade an
increasing focus has been placed by GPU manufacturers on general-purpose GPU
(GPGPU) computing uses. GPUs have gained significant traction in spectrometers
where large FFTs are required in order to achieve high spectral resolution.²⁴


Field Programmable Gate Arrays (FPGAs) are logic chips that incorporate
many thousands of arithmetic cores in a fabric of programmable logic and interconnect.
FPGAs excel at processing large data rates and provide low-level interfacing
capabilities, allowing them to be directly connected to modern ADC chips. While an
increasing number of off-the-shelf FPGA platforms are available, the needs of radio
astronomers often motivate the design of custom boards. FPGAs are also relatively
difficult to program, requiring specialist knowledge of their underlying hardware
details to utilize them efficiently. FPGAs have smaller quantities of memory than
CPU or GPU processors, and are often used for high data-rate, coarse-resolution
spectrometers.²⁵


Application Specific Integrated Circuits (ASICs) are custom-designed chips
with underlying circuitry dedicated to performing the operations defined by the
designer. The custom nature of an ASIC makes it the most power-efficient computing
platform, though this efficiency comes at the cost of large development time and
effort. With the increasing performance and power efficiency of FPGAs and GPUs,
ASIC development is not as prevalent in radio astronomy as it once was. However,
ASICs may still be desirable for space-based spectrometers, where power efficiency
is paramount.²⁶

Hybrid systems. In many cases a spectrometer may be heterogeneous in nature,
with different stages of processing performed on different hardware platforms.
Frequently, FPGAs are used to facilitate interfacing a high speed ADC chip with a
network of CPU or GPU signal processing devices.²¹ Spectrometers such as the
VEGAS spectrometer at the Robert C. Byrd Green Bank Telescope²⁷ also utilize
FPGAs for ADC interfacing and coarse channelization, before signals are further
filtered to a fine frequency resolution using GPUs.


5.1. Common Infrastructure Development 


CPU and GPU processing platforms are developed by commercial entities, motivated
by the non-astronomy markets. However, leveraging the latest hardware
requires users to have access to flexible programming tools and software libraries.
A number of open-source projects have emerged trying to serve this need. The
GNURADIO projectᵈ provides a software environment for rapid development of
CPU-based instruments for processing radio signals. A number of radio astronomy
projects have also developed generic software pipelines for streaming data between
Ethernet networks, CPUs and GPUs (see, for example, HASHPIPE,ᵉ PSRDADAᶠ
and BIFROSTᵍ).


ᵈ gnuradio.org

FPGA platforms are expensive to design and manufacture. For this reason, 
radio astronomy groups such as the Collaboration for Astronomy Signal Processing 
and Electronics Research (CASPER) have developed a variety of general-purpose 
FPGA-based platforms that can interface with a suite of connectorized ADC cards. 
CASPER provides a variety of open-source software tools and libraries with the 
aim of simplifying FPGA programming and enabling straightforward upgrading of 
instruments when newer, more capable hardware becomes available. 
This review describes the salient features of the various devices that have been
and are currently in use to carry out pulsar observations on single-dish radio 
telescopes and phased arrays. These “pulsar backends” broadly fall into three 
main classes: searching, timing and cyclic spectroscopy. Thanks to improvements 
in digital signal processing, and in the availability of high-speed computing power, 
many of these devices are close to being optimal in terms of sensitivity per unit 
bandwidth. Future improvements in sensitivity will depend upon the increased 
bandwidth of receivers and collecting area of telescopes currently planned, and 
coming online. 


1. Introduction 


The wealth of applications possible from studies of pulsars typically exploits their
clock-like rotational stability. Excellent examples include tests of strong-field gravity
from observations of binary pulsars,¹ the detection of extra-solar planetary systems²
as well as on-going efforts to detect low-frequency gravitational waves by timing an
array of pulsars distributed over the sky.³ In addition to the high temporal resolution
necessary to perform these experiments, pulsar data acquisition systems must be
able to remove the dispersive delay caused by the propagation of the pulsar signal
through the interstellar medium (ISM).

As discussed by numerous authors (see, e.g. Ref. 4), this dispersion is due to 
the presence of free electrons in the ISM that causes the propagation of a signal 
over some distance D to be delayed compared to one with infinite frequency by an 
amount


$$t = \frac{1}{c}\left(\int_0^D \frac{dl}{\mu} - D\right), \qquad (1)$$


where c is the speed of light and μ is the index of refraction. For a cold plasma
with electron number density n_e where the plasma frequency is much less than the
electromagnetic wave frequency, f, it can be shown (again see, e.g. Ref. 4) that


$$\mu = 1 - \frac{e^2 n_e}{2\pi m_e f^2}, \qquad (2)$$


where e and m_e are the electron charge and mass. Integrating Eq. (1) for this form
of the refractive index, the delay is


$$t = \frac{e^2}{2\pi m_e c}\,\frac{\mathrm{DM}}{f^2}, \qquad (3)$$


where the relevant astrophysical quantity, the dispersion measure, 


$$\mathrm{DM} = \int_0^D n_e\,dl, \qquad (4)$$


is the integrated column density along the line of sight and is usually expressed in the
hybrid units of cm⁻³ pc. The DM can be determined directly from a measurement of
the pulse arrival time with frequency according to this model. The 1/f² dependence
predicted here is observed for pulsars over a wide frequency range.⁵ This model
therefore serves as the starting point for all dispersion removal devices discussed in
this chapter. Models of the free electron content (see, e.g. Ref. 6) predict a typical
value of n_e = 0.03 cm⁻³.

Two main techniques exist to remove this dispersive effect from a pulsar obser- 
vation. The simplest technique, known as incoherent dedispersion, applies appro- 
priate delays to channelized data after detection. As discussed in this chapter, this 
approach is limited by the finite width of frequency channels in data acquisition 
devices. An optimal approach is to sample the raw voltages from the telescope and 
apply the inverse of the dispersion delay to the raw voltages before forming a time 
series. Both approaches are now in widespread use in pulsar astronomy and the 
purpose of this chapter is to familiarize the reader with the main developments in 
this field. 

The outline for the rest of this chapter is as follows. Section 2 details the tech- 
niques and hardware used to perform incoherent dedispersion in pulsar searches 
and follow-up studies. Section 3 reviews the techniques and implementation for 
coherent dedispersion, primarily used in timing and other follow-up work. Section 4 
reviews cyclic spectroscopy and current approaches to implementation. To conclude, 
in Sec. 5, a look ahead is given to future pulsar backend developments in the era of 
new telescopes. 
2. Incoherent Dedispersion


The simplest approach to pulsar data acquisition, and the one that is used as a 
starting point for discovery and initial follow-up, is to use devices that channelize 
and sample the available bandwidth to provide data in the form of a “filterbank”. 
These data are typically accumulated and written to disk for offline analyses. In the 
subsequent sections, it is assumed that the filterbank data have N_chan frequency
channels, each of width Δf_chan. Assigning the first frequency channel to be centered
at a frequency of f_1, the frequency of the ith channel is therefore f_1 − Δf_chan (i − 1).


2.1. Technique 


Inserting the physical constants into Eq. (3), we see, for a pulsar with dispersion
measure DM, the effect of dispersion is to delay the ith channel relative to the first
one by an amount


$$\Delta t = 4.15 \times 10^{6}\ \mathrm{ms}
  \left(\frac{\mathrm{DM}}{\mathrm{cm^{-3}\,pc}}\right)
  \left[\left(\frac{f_i}{\mathrm{MHz}}\right)^{-2} - \left(\frac{f_1}{\mathrm{MHz}}\right)^{-2}\right]. \qquad (5)$$


With the data in this channelized form, correction for the dispersion delay in a
brute-force fashion is done by computing the number of time samples to delay each
channel by before summing all of the frequency channels. This process is shown in
Fig. 1. The sum over all the channels produces a “dedispersed time series”, which
may then be searched for periodicities or individual pulses or folded modulo a known
pulse period to compute a pulse time-of-arrival for timing purposes.
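
The brute-force procedure just described can be sketched as below, assuming NumPy, the 4.15 × 10⁶ ms constant quoted for Eq. (5), and filterbank data stored as a [channel, time sample] array. The function names are hypothetical, and the use of `np.roll` (which wraps samples around the ends of the array) is a simplification for illustration.

```python
import numpy as np

K_DM_MS = 4.15e6   # constant from Eq. (5): ms, for DM in cm^-3 pc and f in MHz

def channel_delays_ms(dm, freqs_mhz):
    """Dispersive delay of each channel relative to the first channel."""
    freqs_mhz = np.asarray(freqs_mhz, dtype=float)
    return K_DM_MS * dm * (freqs_mhz ** -2 - freqs_mhz[0] ** -2)

def dedisperse(data, dm, freqs_mhz, t_samp_ms):
    """Shift each channel by a whole number of samples, then sum channels."""
    shifts = np.rint(channel_delays_ms(dm, freqs_mhz) / t_samp_ms).astype(int)
    series = np.zeros(data.shape[1])
    for chan, shift in zip(data, shifts):
        series += np.roll(chan, -shift)   # advance the channel by its delay
    return series

# Synthetic example: 8 channels of 10 MHz, first channel centered at 1400 MHz
freqs = 1400.0 - 10.0 * np.arange(8)
dm, t_samp = 50.0, 0.1                    # cm^-3 pc, ms
shifts = np.rint(channel_delays_ms(dm, freqs) / t_samp).astype(int)
data = np.zeros((8, 1024))
data[np.arange(8), 100 + shifts] = 1.0    # a dispersed single-sample pulse
series = dedisperse(data, dm, freqs, t_samp)
```

In a search, the same sum is repeated over a grid of trial DM values, since the DM of an undiscovered pulsar is not known in advance.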

Even though the dispersion delay has been corrected for, it is evident from
Fig. 1 that a small amount of uncorrected pulse broadening remains from the delay
across individual frequency channels. To see this quantitatively, one can rearrange
Eq. (5) for a bandwidth Δf = f_hi − f_lo about a center frequency f = ½(f_lo + f_hi)
to obtain the “dispersion broadening” relation (for f ≫ Δf) as


$$t_{\rm DM} \simeq 8.3 \times 10^{6}\ \mathrm{ms}
  \left(\frac{\mathrm{DM}}{\mathrm{cm^{-3}\,pc}}\right)
  \left(\frac{\Delta f}{\mathrm{MHz}}\right)
  \left(\frac{f}{\mathrm{MHz}}\right)^{-3}. \qquad (6)$$


[Figure 1: two panels, “Raw Data” and “Dedispersed Data”, each showing radio frequency and amplitude against pulse phase.]

Fig. 1. Pulse dispersion and the process of dedispersion. The effect of simply summing the pulse
train over a finite bandwidth is to significantly broaden the observed pulse (left panel). Dividing
the passband into smaller bandwidth channels and applying the appropriate delay to each channel
considerably reduces the broadening and increases the pulse signal-to-noise ratio (right panel).


For example, the first pulsar to be discovered,⁷ PSR B1919+21, has a relatively
low DM of 12.4 cm⁻³ pc. Over the 1 MHz bandwidth of the original Cambridge
equipment observing at 81 MHz, the dispersion broadening is ~190 ms, quite
unacceptable for millisecond pulsar hunters! Since this pulsar has a period of 1337 ms
it was (fortunately) detected.
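
The quoted broadening can be reproduced with a short calculation. The sketch below assumes the broadening relation takes the form 8.3 × 10⁶ ms × DM × Δf / f³ (DM in cm⁻³ pc, frequencies in MHz), which follows from differentiating the 4.15 × 10⁶ ms f⁻² delay of Eq. (5) with respect to frequency; the function name is hypothetical.

```python
def dispersion_broadening_ms(dm, bandwidth_mhz, freq_mhz):
    """Broadening across one channel of width bandwidth_mhz centered at freq_mhz."""
    return 8.3e6 * dm * bandwidth_mhz / freq_mhz ** 3

# PSR B1919+21 with the original Cambridge parameters quoted in the text
t_b1919 = dispersion_broadening_ms(12.4, 1.0, 81.0)   # roughly 194 ms

# The same DM and channel width at 1400 MHz gives a far smaller value
t_lband = dispersion_broadening_ms(12.4, 1.0, 1400.0)
```

The steep f⁻³ scaling is why low-frequency observations need much narrower channels for a given DM.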


2.2. Devices 


There are a number of approaches available to provide the channelized filterbank 
data as successive time dumps containing measurements of power as a function of 
frequency, P(f). The discussion is organized in roughly chronological order of each 
device’s development. 


2.2.1. Autocorrelation Spectrometers 


Shown schematically in Fig. 2, this device provides a flexible means of computing
P(f): it forms the autocorrelation function by multiplying the incoming signal
with a delayed version of itself. For a voltage, v(t), and its complex conjugate, v*(t),
the autocorrelation function is


$$R(\tau) = \lim_{T\to\infty} \frac{1}{T} \int_0^T v(t)\,v^*(t+\tau)\,dt. \qquad (7)$$

From the Wiener–Khinchin theorem (see, e.g. Ref. 8), it follows that the power
spectrum P(f) is the Fourier transform of R(τ). Normally, the lagged products are
stored on disk for subsequent off-line processing to produce P(f).
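
The off-line step from lagged products to P(f) can be sketched as follows, assuming NumPy and a real-valued sampled voltage (so that R is real and symmetric in the lag). The function name and the way the lags are mirrored before the transform are illustrative assumptions, not a description of any particular correlator.

```python
import numpy as np

def acs_power_spectrum(v, n_lags):
    """Estimate P(f) from the averaged lagged products of a real voltage v."""
    n = len(v) - n_lags
    # R(m): average of v(t) v(t + m) for lags m = 0 .. n_lags - 1
    R = np.array([np.mean(v[:n] * v[m : m + n]) for m in range(n_lags)])
    # Mirror to negative lags (R is even for real v), then Fourier transform
    R_sym = np.concatenate((R, R[-2:0:-1]))
    return np.fft.fft(R_sym).real

t = np.arange(4096 + 33)
v = np.cos(2 * np.pi * 0.125 * t)          # tone at 1/8 cycle per sample
spec = acs_power_spectrum(v, n_lags=33)    # 64-point spectrum
peak_bin = int(np.argmax(spec[:33]))       # positive-frequency half
```

With 33 lags the mirrored sequence has 64 points, so the tone at 0.125 cycles per sample falls in bin 8.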


Fig. 2. Block diagram showing a single polarization channel of an autocorrelation spectrometer.
This figure originally appeared in Handbook of Pulsar Astronomy.⁹
As described in detail elsewhere (see, e.g. Ref. 9), obtaining measurements of
P(f) for each of two orthogonal polarization channels that are subsequently cross
correlated provides a means to form all four Stokes parameters, I, Q, U and V.
A necessary consideration in practice is to correct the correlation functions for
the bias induced by digitization. Early correlators were simple two- and three-level
devices, which have a well-known bias in their measured correlation coefficients.¹⁰,¹¹
Further discussion of these corrections can be found in Ref. 12.


2.2.2. Analog Filterbank Devices 


Another simple device that has been widely used until relatively recently is the 
analog filterbank, in which the incoming band is split by a network of narrow- 
band filters. A one-bit sampling scheme is shown schematically in Fig. 3, in which 
a filtered signal passes through an RC circuit with a time constant of ~1 s. The 
running mean of this signal is formed by an integrator and subsequently compared 
to 0 V. The output is sampled as a 1 or 0 depending on whether the running mean 
is positive or negative. For further details on this device, see Ref. 9. 

These two- and three-level sampling schemes provide a crude but effective means 
of providing a compact data set for use in pulsar searches where weak sources 
are expected and the data are dominated by noise. The loss of sensitivity due to 
these coarse quantization schemes is up to 20% (see, e.g. Ref. 9). In spite of this 
limitation, the digitized signal is extremely robust to radio frequency interference. 
For example, a 10-Jy narrow-band pulse still only registers as a “1” in a single-bit 
digitization scheme. For these reasons, filterbanks and analog spectrometers were 
almost exclusively used for all pulsar discoveries carried out prior to 2010! Most 
notably, the Parkes multi-beam pulsar surveys, which used one-bit filterbanks, have
been responsible for the discovery of about half of all currently known pulsars. For
an observational review of this survey, see Ref. 13.


Fig. 3. Schematic showing a one-bit digitization scheme to sample filterbank data. This figure
originally appeared in Handbook of Pulsar Astronomy.⁹


186 D. R. Lorimer and M. Kramer


2.2.3. Digital Filterbanks 


Modern pulsar searches are now carried out routinely using the properties of the
discrete Fourier transform (DFT) to form the necessary spectral information. As
described elsewhere (see, e.g. Ref. 14), given some sampled input signal x(n), where
n is the sample number, which is modulated by a function exp(−2πifn) and low-pass
filtered by a device represented by the function h(n), then a DFT leads to the
representation of the K frequency channels as


$$X_k(m) = \sum_{n} h(mM-n)\,x(n)\,e^{-2\pi i k n/K}. \qquad (8)$$


In this notation, i = √−1, n is the sample number and k = 0, 1, ..., K − 1. Digital
filterbanks have been in existence since the late 1980s (see, e.g. Ref. 15). Currently
used systems include the pdfb3 system at Parkes¹⁶ and GUPPI/PUPPI at Green
Bank and Arecibo.¹⁷ An example block diagram showing the GUPPI architecture
is shown in Fig. 4.
Fig. 4. Block diagram showing the Green Bank Ultimate Pulsar Processor (GUPPI) system
(Figure source: Ref. 17). Incoming signals from each polarization channel are digitized in the
IADC units before being sampled in the internet break-out board (IBOB) units. Fourier transforms
to carry out the filterbanking are done in the FPGA chips on-board the BEE2. For coherent
dedispersion of the data, additional processing is farmed out to the Graphics Processing Unit
(GPU) servers via the fast internet switch.
Fig. 5. The various steps involved in the polyphase filtering process. Here x(i) is a 1024-point
time series and is multiplied with the window function w(i). The resulting product is split into
four equal blocks before being summed and Fourier transformed (Figure source: Dale Gary).


A drawback of the DFT approach is the leakage of the filtered signal into adja- 
cent frequency channels. A much sharper frequency response can be obtained via 
an additional processing step that is implemented by virtually all digital filterbank 
devices, including the ones mentioned above. The polyphase filterbank (PFB) tech- 
nique, shown schematically in Fig. 5, amounts to multiplying the incoming time 
series by a window function w(n) to form a new time series 


$$y(n) = \sum_{j=0}^{P-1} x(n+jN)\,w(n+jN), \qquad (9)$$


where each of the P filter coefficients represents a sub-filter.


3. Coherent Dedispersion 


The technique of coherent dedispersion was originally proposed by Hankins and
Rickett.¹⁸ This approach starts from the measured complex voltages v(t) and corrects
these for dispersion to form the intrinsic voltage from the pulsar, v_int(t). It is
a more data intensive process, which limited its use for most pulsar observations in
favor of incoherent dedispersion discussed above. As already noted, Eq. (6) shows
that the limiting time resolution achievable from incoherent dedispersion is due
to the necessarily finite frequency channel bandwidth, Δf. This limit is exacerbated
at higher DMs and/or lower observing frequencies. While this is often not
too significant for long-period pulsars, with the discovery of millisecond pulsars¹⁹
it was quickly realized that the broadening limits of incoherent dedispersion can
significantly impact any measurements of small-scale features in pulse profiles.
[Figure 6: PSR B1937+21, 1500 MHz, GBT/GUPPI; incoherent and coherent profiles, flux against pulse phase (turns).]

Fig. 6. Example of coherent and incoherent dedispersion showing data from the millisecond pulsar
B1937+21. The coherently dedispersed pulse profile faithfully reproduces the true pulse shape
(which contains the “notch” in the main pulse) compared to the incoherent profile which is limited
by dispersion broadening. Figure taken from Ref. 17. (See electronic edition for a color version of
this figure.)


An example of the improved profile fidelity provided by coherent dedispersion 
is shown in Fig. 6. Such measurements are necessary not only from an emission 
physics perspective, but also to maximize the precision and accuracy possible in 
pulsar timing observations. Even for non-millisecond pulsars, which exhibit very 
small time-scale features in their emission, it is often desirable to be able to observe 
with the highest time resolution possible. An excellent example of this is the Crab 
pulsar, B0531+21, from which nanosecond time-scale pulses have been observed 
(see, e.g. Ref. 20). 


3.1. Technique 


A brief description of the technique is given here.* The dispersion removal process 
is best described in the Fourier domain, where the raw voltages v(t) and v_int(t)
transform to V(f) and V_int(f) for time t and observing frequency f. It is convenient
to represent the action of the ISM in terms of a transfer function H that relates the 
observed and intrinsic quantities as: 


\[ V(f_0 + f) = V_{\rm int}(f_0 + f)\, H(f_0 + f). \tag{10} \]


*Further details can be found in Refs. 9 and 18. 
Treating the dispersion delay in the time domain as a phase rotation in the
frequency domain, the transfer function becomes


\[ H(f_0 + f) = e^{-i k(f_0 + f) D}, \tag{11} \]


where k is the wavenumber and D is the distance. For the cold plasma dispersion 
model introduced in Sec. 1, it can be shown (see, e.g. Ref. 9) that 


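\[ k(f_0 + f) \simeq \frac{2\pi}{c} \left[ (f_0 + f) - \frac{f_p^2}{2} \left( \frac{1}{f_0} - \frac{f}{f_0^2} + \frac{f^2}{(f_0 + f) f_0^2} \right) \right]. \tag{12} \]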


When inserted into Eq. (11), and considering only terms quadratic in f, this leads
to the result


\[ H(f_0 + f) = \exp\left( \frac{i e^2\, {\rm DM}\, f^2}{m_e c\, f_0^2\, (f_0 + f)} \right). \tag{13} \]
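
To illustrate how Eq. (13) is used in practice, the sketch below removes dispersion from complex baseband samples by dividing their Fourier transform by H. It assumes NumPy, writes the phase of Eq. (13) in terms of the commonly used dispersion constant 𝒟 ≈ 4.148808 × 10³ MHz² pc⁻¹ cm³ s, so that the phase is 2π𝒟 DM f² / [(f_0 + f) f_0²], and assumes that the sign convention of the FFT matches that of Eq. (11). The function names are hypothetical. A real pipeline would also process long data streams in overlapping blocks, which is omitted here.

```python
import numpy as np

DISPERSION_CONSTANT = 4.148808e3  # s MHz^2 pc^-1 cm^3 (assumed standard value)

def ism_transfer_function(nsamp, f0_mhz, bw_mhz, dm):
    """H(f0 + f) of Eq. (13) on the FFT frequency grid of complex baseband data."""
    f = np.fft.fftfreq(nsamp, d=1.0 / bw_mhz)  # offsets from f0 in MHz
    # The factor 1e6 converts s*MHz into a dimensionless number of cycles
    phase = (2e6 * np.pi * DISPERSION_CONSTANT * dm * f**2
             / ((f0_mhz + f) * f0_mhz**2))
    return np.exp(1j * phase)

def dedisperse(voltage, f0_mhz, bw_mhz, dm):
    """Recover the intrinsic voltage by applying 1/H in the Fourier domain."""
    H = ism_transfer_function(len(voltage), f0_mhz, bw_mhz, dm)
    return np.fft.ifft(np.fft.fft(voltage) / H)
```

Because |H| = 1, the division is always well defined and only the phases of the Fourier components are changed.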


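The bandwidth-limited complex voltage, v(t), can be written as a combination
of its (real) amplitude, a(t), a time-varying phase term, φ(t), and the carrier wave
centered at f_0,


\[ v(t) = a(t)\, e^{i\phi(t)}\, e^{i 2\pi f_0 t}. \tag{14} \]

The signal is mixed with that of a local oscillator (LO) of frequency f_LO to produce
a signal $\mathcal{I}$. A second signal $\mathcal{Q}$ is produced that uses the same LO but with a phase
shift of 90° (or π/2):


\[ \mathcal{I}(t) = \frac{1}{2} a(t)\, e^{i\phi(t)} \left\{ e^{i 2\pi (f_0 + f_{\rm LO}) t} + e^{i 2\pi (f_0 - f_{\rm LO}) t} \right\}, \tag{15} \]

\[ \mathcal{Q}(t) = \frac{1}{2i} a(t)\, e^{i\phi(t)} \left\{ e^{i 2\pi (f_0 + f_{\rm LO}) t} - e^{i 2\pi (f_0 - f_{\rm LO}) t} \right\}. \tag{16} \]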


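Applying a low-pass filter so that only frequencies with |f| < Δf/2 may propagate
removes the frequency parts corresponding to f_0 + f_LO. As a result of this process,
one finds that


\[ \mathcal{I}(t) = \frac{1}{2} a(t)\, e^{i\phi(t)}\, e^{i 2\pi (f_0 - f_{\rm LO}) t} = \frac{1}{2} a(t)\, e^{i [2\pi (f_0 - f_{\rm LO}) t + \phi(t)]}, \tag{17} \]

\[ \mathcal{Q}(t) = -\frac{1}{2i} a(t)\, e^{i\phi(t)}\, e^{i 2\pi (f_0 - f_{\rm LO}) t} = \frac{1}{2} a(t)\, e^{i [2\pi (f_0 - f_{\rm LO}) t + \phi(t) + \pi/2]}. \tag{18} \]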


Choosing the LO frequency such that it is in the center of our band (i.e. f_LO = f_0),
the bandpass [f_0 − Δf/2; f_0 + Δf/2] is shifted to the baseband frequency range
[−Δf/2; +Δf/2]. For the digitized real parts of the signals,


\[ I(t) = {\rm Re}(\mathcal{I}(t)) = \frac{1}{2} a(t) \cos(\phi(t)), \tag{19} \]

\[ Q(t) = {\rm Re}(\mathcal{Q}(t)) = -\frac{1}{2} a(t) \sin(\phi(t)), \tag{20} \]


giving access to both the amplitude and phase of the complex voltage v(t). Up to a
constant factor and sign, the signals I(t) and Q(t) are the real and imaginary parts
of the baseband voltage a(t) exp[iφ(t)]. This form of baseband
mixing with f_LO = f_0 provides complex sampled data. Another way of viewing it
is to consider I(f) and Q(f) as providing both positive and negative frequencies of
the bandpass relative to f_0 in the Fourier domain, [−Δf/2; +Δf/2].
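
The mixing and filtering steps leading to Eqs. (19) and (20) can be sketched numerically. The example below is a minimal illustration, assuming NumPy, a real-valued input a(t) cos[2π f_0 t + φ(t)], and an idealized brick-wall low-pass filter; the function names are hypothetical, and real hardware uses analog or FIR filters instead.

```python
import numpy as np

def quadrature_mix(signal, f_lo, fs, cutoff):
    """Mix a real signal with an LO and its 90-degree-shifted copy, then low-pass."""
    t = np.arange(len(signal)) / fs
    mixed_i = signal * np.cos(2 * np.pi * f_lo * t)
    mixed_q = signal * np.sin(2 * np.pi * f_lo * t)  # LO shifted by 90 degrees
    # Idealized brick-wall low-pass filter applied in the Fourier domain
    keep = np.abs(np.fft.fftfreq(len(signal), d=1.0 / fs)) < cutoff

    def lowpass(x):
        return np.fft.ifft(np.fft.fft(x) * keep).real

    return lowpass(mixed_i), lowpass(mixed_q)

def amplitude_and_phase(i_samples, q_samples):
    """Invert Eqs. (19) and (20): a = 2 sqrt(I^2 + Q^2), phi = atan2(-Q, I)."""
    return 2 * np.hypot(i_samples, q_samples), np.arctan2(-q_samples, i_samples)
```

With f_LO = f_0, the low-pass filter removes the component at 2f_0 and the two outputs together determine both a(t) and φ(t).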


3.2. Devices 


Devices that implement the above process have evolved significantly over the years. 
Early devices recorded the baseband signals and carried out the processing offline. 
However, as data processing speeds have increased, it has become possible to do 
much of the processing in real time. 


3.2.1. Swept Local Oscillator Systems 


The difficulties in coherently dedispersing large bandwidths in the 1970s and 1980s
using the above approach inspired a hybrid technique using a sweeping LO.²¹⁻²³ In
this scheme, shown in Fig. 7, the LO is driven at a rate $\dot{\nu}$ that follows the dispersion
relationship. The action of this, for a pulse of width $\delta t$, is to convert the time
structure contained in the pulse into a spectral line of width $\delta\nu = \dot{\nu}\,\delta t$.


[Figure 7 image: frequency (MHz) versus time (periods).]


Fig. 7. Schematic showing dedispersion using a swept LO. The dispersed pulses are shown as
shaded bands, while the LO frequency is shown by the dashed line. The $\delta t$ and $\delta\nu$ markings show
the pulse in the time and frequency axes. This figure originally appeared in Ref. 22.
The resulting narrow-band line is then sampled by some form of incoherent
filterbank (e.g. an autocorrelation spectrometer). The resulting sampled pulse has 
a limiting resolution of the incoherent dispersion smearing limit divided by the 
number of channels in the filterbank. 
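
For orientation, the sweep rate that the LO must follow can be estimated from the cold plasma delay t = 𝒟 DM / ν², which gives dν/dt = −ν³ / (2𝒟 DM). The short Python sketch below evaluates this and the resulting line width; the value of the dispersion constant 𝒟 and the function names are assumptions made for this example, and the numbers passed in are purely illustrative.

```python
DISPERSION_CONSTANT = 4.148808e3  # s MHz^2 pc^-1 cm^3 (assumed standard value)

def sweep_rate(freq_mhz, dm):
    """Dispersion sweep rate d(nu)/dt in MHz/s (negative: pulses drift downward)."""
    return -freq_mhz**3 / (2.0 * DISPERSION_CONSTANT * dm)

def line_width(pulse_width_s, freq_mhz, dm):
    """Width in MHz of the spectral line produced by a pulse of the given width."""
    return abs(sweep_rate(freq_mhz, dm)) * pulse_width_s

# Illustrative values only: a 1 ms pulse at 430 MHz with DM = 10 pc cm^-3
print(sweep_rate(430.0, 10.0), line_width(1e-3, 430.0, 10.0))
```

The steep dependence on ν³ and the inverse dependence on DM show why the LO drive rate has to be set individually for each pulsar and observing frequency.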


3.2.2. Baseband Recorders 


Although the above swept local oscillator approach worked well (see, e.g. Ref. 24), 
practical difficulties in driving the local oscillator at the correct rate meant that 
the technique was not widely used. During the 1990s, with the advent of high- 
speed computing power, it became possible to sample successively larger receiver 
bands using baseband recorders in which the raw voltages were written to tape or 
disk for offline processing. Significant efforts were made in developing such systems 
by groups at Swinburne/Caltech,²⁵ Berkeley²⁶ and Princeton.²⁷ These recorders
essentially amounted to a down-converter, a low-pass filter and a fast digitizer. An 
example block diagram from the Princeton system is shown in Fig. 8. 


3.2.3. CPU/GPU Cluster-Based Systems 


The baseband recorders were very successful, albeit at the cost of being somewhat 
data intensive with little real-time feedback. They were used extensively for about 


[Figure 8 image: block diagram. The LCP and RCP 30 MHz IF signals pass through baseband mixers and attenuators (30 MHz LO; L cos, L sin, R cos, R sin), then filters, amplifiers, and A/D converters and packing (20 MHz clock; 10 MHz BW at 2 bits or 5 MHz BW at 4 bits), and on to a SPARC-20 workstation at 10 MB/s, with 10 MB/s throughput to a 100 GB disk array and/or two 35 GB DLT 7000 tape drives.]


Fig. 8. Schematic diagram showing the Princeton “MkIV” baseband recording system. The MkIV 
sampled a 10-MHz receiver band and was used extensively at the Arecibo observatory in the 1990s 
and early 2000s. This figure originally appeared in Ref. 27. 
a decade and resulted in a wealth of important results, particularly in the area of
high-precision timing.²⁸⁻³⁰ In the early 2000s, the demand from users to observe with
ever-increasing bandwidths and to get rapid turnaround of processed data saw
a move towards CPU-based Beowulf clusters connected via fast Ethernet switches to
disk arrays. The ASP, GASP and BON systems at Arecibo, Green Bank and Nançay,
respectively,³¹ provided routine operations with bandwidths up to 128 MHz. At
Parkes, a similar revolution was enabled using the CPSR2 system.³² These systems,
and similar ones elsewhere, formed the backbone of early attempts to carry out
routine observations of millisecond pulsars as part of a timing array.

With the advent of GPU-based machines, as well as the availability of Field 
Programmable Gate Arrays (FPGAs), a new coherent dedispersion paradigm has 
been established in the past decade. Modern devices such as the Green Bank Ulti- 
mate Pulsar Processor (GUPPI, and its counterpart in Arecibo — PUPPI) can now 
handle up to 1 GHz bandwidths by using FPGAs to form coherently dedispersed 
sub-bands, which are then combined by a cluster of GPUs. An example of the 
benefits of wide-band observing is shown in Fig. 9. Not only has the bandwidth
grown exponentially in the past 25 years, but the number of bits per sample has increased as well. GUPPI
and PUPPI, and their counterparts at Parkes (CASPSR), Westerbork (PUMA2), 
Jodrell Bank and Effelsberg (ROACH) now routinely digitize the available band 
with 8-bit precision. This high level of fidelity provides improved measurements of 
the pulse profiles (which are not limited by quantization effects; see, e.g. Ref. 33), 
as well as robustness to radio frequency interference. 
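
The growth in data volume implied by wider bandwidths and more bits per sample can be made concrete with a small calculation. The following Python sketch assumes Nyquist sampling of two polarizations; the function name is hypothetical. The 10-MHz, 2-bit and 5-MHz, 4-bit modes shown in Fig. 8 both reproduce the 10 MB/s quoted there, while a 1-GHz band at 8 bits needs about 4 GB/s.

```python
def baseband_rate(bandwidth_hz, bits, npol=2):
    """Bytes per second needed to record Nyquist-sampled voltages."""
    # Either 2B real samples per second, or I and Q each sampled at B per second
    samples_per_second = 2 * bandwidth_hz
    return samples_per_second * bits * npol / 8

print(baseband_rate(10e6, 2))  # 10-MHz band, 2 bits (a mode shown in Fig. 8)
print(baseband_rate(5e6, 4))   # 5-MHz band, 4 bits (the other mode in Fig. 8)
print(baseband_rate(1e9, 8))   # 1-GHz band at 8 bits
```

Rates of this size are the reason that wide-band systems dedisperse and fold in real time instead of writing raw voltages to disk.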


4. Cyclic Spectroscopy 


In the past five years, a number of authors have been investigating the use of a 
different approach to pulsar data acquisition. The technique of cyclic spectroscopy 
(see, e.g. Ref. 34) has been developed to study the stochastic properties of periodic 
signals. One of a number of potential advantages this technique has over the inco- 
herent and coherent dispersion removal approaches discussed so far is that it offers 
a way of removing the effects of interstellar scattering from a pulsar signal. 


4.1. Technique 


A complete description of cyclic spectroscopy for pulsar data applications can be
found in Ref. 35. The essential ideas are summarized below. Starting from the product
of the incoming voltage, v(t), and its complex conjugate, v*(t), an average over some
time interval T leads to the correlation function


\[ C(t, \tau) = \frac{1}{T} \int_0^T v(t + \tau/2)\, v^*(t - \tau/2)\, dt, \tag{21} \]
Fig. 9. Integrated pulse profile of the millisecond pulsar J1713+0747 taken using the Arecibo
telescope and the PUPPI backend. This observation highlights the vast superiority of the 1 GHz 
bandwidth available to PUPPI, which includes a bright “scintle” that would not have been captured 
by its narrow-band predecessor, ASP. This figure originally appeared in Ref. 17. 


where τ represents a time lag. The signal is said to be “cyclostationary” if C(t, τ) =
C(t + P, τ), where P is the pulse period. The cyclic spectrum is


\[ S(f, \alpha) = \int_{-\infty}^{\infty} \int_0^P C(t, \tau) \exp[-2\pi i (f\tau + \alpha t)]\, dt\, d\tau, \tag{22} \]

where the signal frequency, f, is the Fourier pair of τ, and the cycle frequency,
α = n/P, is the Fourier pair of t, where the harmonic number n = 1, 2, 3, ....
As shown in Ref. 35, when applied to the case of the pulsar signal, which can be
represented in the time domain as the convolution of amplitude-modulated noise
with an impulse response function that represents the effects of both scintillation
and scattering in the ISM, the cyclic spectrum takes the form


\[ S(f, \alpha) \propto H_{\rm ISM}\left(f + \frac{\alpha}{2}\right) H^*_{\rm ISM}\left(f - \frac{\alpha}{2}\right) I(n). \tag{23} \]


Here H_ISM and I(n) are, respectively, the Fourier transforms of the ISM impulse
response function and the intrinsic pulse profile. Note that the H_ISM function here
includes the effects of both dispersion and scattering, as opposed to the dispersion-only
function in Eq. (13). Provided that the cyclic spectra are computed on a
timescale shorter than that of diffractive scintillation, the above approach is a valid
approximation of the signal. As described in Ref. 36, a measurement of S(f, α)
contains sufficient information to independently determine both H_ISM and I(n),
allowing the recovery of the intrinsic pulse profile. An example of this process is
shown in Fig. 10.
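
A minimal numerical sketch of a cyclic spectrum estimate is given below. It assumes NumPy and uses the equivalent frequency-domain form, in which S(f, α) is the average of X(f + α/2) X*(f − α/2) over data segments, with the two frequency shifts applied as complex tones in the time domain. The function name, the unit sampling interval, and the test signal (periodically modulated complex white noise) are assumptions made for this example; it is not the estimator used by any particular backend.

```python
import numpy as np

def cyclic_spectrum(x, alpha, nchan):
    """Estimate S(f, alpha) by averaging X(f + alpha/2) X*(f - alpha/2).

    x is a complex voltage series with unit sampling interval, alpha is the
    cycle frequency in cycles per sample, and nchan is the segment length.
    """
    nseg = len(x) // nchan
    x = x[: nseg * nchan]
    t = np.arange(nseg * nchan)
    # Multiplying by a complex tone shifts the spectrum by -/+ alpha/2
    upper = (x * np.exp(-1j * np.pi * alpha * t)).reshape(nseg, nchan)
    lower = (x * np.exp(+1j * np.pi * alpha * t)).reshape(nseg, nchan)
    U = np.fft.fft(upper, axis=1)
    W = np.fft.fft(lower, axis=1)
    return (U * np.conj(W)).mean(axis=0) / nchan

# Amplitude-modulated noise with a period of 32 samples (illustrative only)
rng = np.random.default_rng(0)
period, nchan, nseg = 32, 256, 64
t = np.arange(nchan * nseg)
noise = (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size)) / np.sqrt(2)
x = (1 + 0.8 * np.cos(2 * np.pi * t / period)) * noise
on_harmonic = abs(cyclic_spectrum(x, 1 / period, nchan).mean())
off_harmonic = abs(cyclic_spectrum(x, 1.5 / period, nchan).mean())
```

The estimate is large only when α is a harmonic of 1/P, which is the cyclostationary property described above. In this toy signal there is no ISM filter, so H_ISM = 1 and the result depends only on the modulation.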


4.2. Devices 


Implementations of cyclic spectroscopy have been made only relatively recently. 
As a result, they have taken advantage of the significant processing power available 
from FPGA/GPU architecture currently in use for coherent dedispersion. Real-time 
cyclic spectroscopy processing has recently been implemented into the existing hard- 
ware architecture at Arecibo and Green Bank (G. Jones, private communication). 


5. Current and Future Prospects 


A significant amount of well-tested code now exists to carry out pulsar signal processing.
In particular, the dspsr package³⁷ provides a modular approach to the coherent
dedispersion, digital filterbanking and cyclic spectroscopy techniques discussed
here. A key development in this field in the past decade has been the Collaboration
for Astronomy Signal Processing and Electronics Research (CASPER) group
established by Don Backer and Dan Werthimer at Berkeley. Almost all of the
current FPGA/GPU-based devices in use rely on the open-source tools provided
by CASPER, which also runs regular training workshops. Currently active
projects involving CASPER include a wide-band recording system on the Very
Large Array,³⁸ the Large European Array for Pulsars³⁹ and the Versatile GBT
Astronomical Spectrometer.⁴⁰

While it is fair to say that current pulsar data acquisition systems are close to 
optimal for existing telescopes, radio astronomy is currently undergoing a transfor- 
mation that presents a new challenge to pulsar observations and data acquisition 
Fig. 10. (a) Impulse response function for B1937+21 (top) and pulse profiles at 430 MHz obtained
from a cyclic spectroscopy analysis. The dashed curve is the observed profile and the solid curve
shows the scattering-corrected profile. (b) Spectrum of B1937+21 obtained via cyclic spectroscopy.
(c) The same observation processed from a coherent filterbank. The black rectangles show unit
time-frequency area. This figure originally appeared in Ref. 35.


systems. The advent of large-aperture arrays in the form of LOFAR,⁴¹ ASKAP⁴²
and MeerKAT,⁴³ as well as the Five Hundred Metre Aperture Spherical Telescope,⁴⁴
which in some form can be considered as pathfinder instruments to the Square Kilometre
Array,⁴⁵ will result in rapid increases in the number of independent beams
on the sky. As shown in Fig. 11, when compared to predictions of data writing
capabilities, the output from the telescope is expected to ultimately prohibit offline
data analysis. Pulsar surveys of the future must be developed so that they can
be carried out in real time. 
Fig. 11. Scaling of pulsar data acquisition requirements into the future. The approximate data
writing limit is arrived at by extrapolating current trends. Figure provided by Scott Ransom. 
Chapter 1


High Performance Silicon and 
III-Nitride-Based UV and UV/Optical 
Imaging Detectors 


Shouleh Nikzad, Michael E. Hoenk, J. J. Hennessy and L. D. Bell 


Jet Propulsion Laboratory, California Institute of Technology, 
Pasadena, CA 91109, USA 


The motivation for astronomical observation in ultraviolet (UV), background in 
UV and UV/optical detectors, and the need for technology advancement beyond 
Hubble Space Telescope and Galaxy Evolution Explorer (GALEX) are discussed. 
A detailed description of band-structure engineering, atomic-precision modifica- 
tion of surface and interfaces using epitaxial techniques and atomic layer deposi- 
tion, and their effect on device performance in the UV/optical spectral range, are 
presented. Specific results of application of 2D-doping in silicon devices, surface 
engineering of III-nitride photocathodes (PCs) and detectors, as well as the formation
of out-of-band blocking filters integrated into detectors, are also discussed.


1. Introduction 


Technology innovation and discoveries in astrophysics and cosmology have pro- 
gressed hand in hand since the time of Galileo. Galileo’s use of a new telescope led to 
profound scientific and philosophical conclusions, motivating new telescope designs 
which in turn led to more observations and discoveries. This cycle of technology 
and science discovery is continuing to this day, and as the astronomy community 
prepares and plans for the next decade’s ground and space observatories, critical 
detector technologies are evaluated and development roadmaps are created. Science
and technology definition teams (STDTs), along with their corresponding Design
Teams for four mission concepts selected and funded by NASA, prepared and
submitted their final reports in 2019 as inputs to the Decadal Survey of the National
Academies’ National Research Council (NRC), which sets the priorities for science and
observatories and specifies the required technologies for the next decade. Of the
four NASA-funded flagship mission concept studies, two are focused on the spectral
range from near infrared (NIR) through ultraviolet (UV). These are the Large
Ultraviolet Optical Infrared survey mission (LUVOIR) and the Habitable Exoplanet
(HabEx) characterization mission. Superb measurements and observational capabilities
will be possible with these large-aperture (4–15 m) space telescopes. Because of
the objectives of discovering and characterizing exo-solar planets in these mission
concepts, as well as in their earlier counterpart, the High Definition Space Telescope
(HDST) concept study,¹ occultation strategies either internal (coronagraph)² or
external (star-shade)³ would be a necessary part of the mission architecture. The
occultation strategies also rely on imaging or spectroscopy instruments, which further
place challenging requirements on detectors.

LUVOIR and HabEx require large format, small pixels, linearity, and (in the 
case of some instruments) photon counting capabilities in silicon detectors (for 
UV-NIR up to 1 μm) and HgCdTe detectors (for >1 μm). This chapter focuses
on the short wavelength range and on silicon detectors that have been optimized 
for UV or UV/optical detection. In addition to silicon detectors, we will also discuss 
gallium nitride (GaN) photocathodes as an enabling component of high efficiency 
and stable image-tube UV detectors such as microchannel plates (MCPs) and elec- 
tron bombarded arrays. We will also briefly discuss GaN-based solid-state detector 
arrays. 

In addition to the flagship mission concepts mentioned above, it is generally 
expected that in order to produce major new scientific results, UV astrophysics 
missions — of all classes during the next two decades — will require significant 
detector advances, particularly in quantum efficiency (QE), photon counting capa- 
bility, dynamic range, resolution, and pixel count. 

For many applications, a faint UV signal must be detected in the presence of 
large unwanted visible light background. The ratio of in-band UV to out-of-band 
longer wavelengths is often small, and even a small fraction of long-wavelength 
scatter (due to non-ideal filters or grating imperfections, for instance) can be prob- 
lematic. Wide-bandgap photocathodes coupled with MCPs, electron-bombarded 
charge-coupled devices (EBCCDs), or electron-bombarded complementary metal 
oxide semiconductor (EBCMOS) detectors are used in many UV instruments. The 
wide bandgaps of these photocathodes, together with electron extraction across a 
vacuum gap, provide intrinsic solar blindness, low noise, and UV sensitivity. 

More recently, as will be discussed in this chapter, high efficiency UV silicon 
detectors have been developed with tailorable response and out-of-band (visible) 
rejection capability by incorporating metal-dielectric filters directly deposited on 
the back surface of the silicon detector. 

UV photons interact with only the first few nanometers of the detector material, 
directly below the optically active surface. This is true for both solid-state detectors 
and image-tube detectors that rely on a photocathode. It is therefore critical to 
precisely control the surfaces and interfaces of devices. This chapter
focuses on detector technologies that achieve these performance goals by exquisite
control of surfaces and interfaces, provided by epitaxial techniques for passivation
and band structure engineering.


2. Background on UV Detectors 


Current UV detection technology can be classified into two major categories: 


(1) Image tube technologies that combine a photoemissive device (e.g., photocath- 
ode), a gain component (e.g., an MCP), and an electron detector. 

(2) Solid-state devices based on silicon or other (particularly wide bandgap) 
semiconductors. These require analogous components to those of tube-based 
technologies: an absorbing region, a gain component, and electron readout 
electronics. 


2.1. Image tube UV detectors 


Before the invention of solid-state imaging detectors, NASA launched the Ranger series of
unmanned spacecraft, designed to take close-up pictures of the moon’s surface in
preparation for the moon landing. Film was not an option, because the film could
not be returned to earth. The cameras onboard Ranger used vidicons, the only
electronic imaging technology then available. Originally developed for television,
vidicons convert light first into a beam of electrons and then into an electronic signal
suitable for radio transmission. Vidicons consist of an evacuated tube with
a photoconductor at one end for conversion of incident light into an image stored
as a pattern of electrons in the photoconductor, and electron optics for periodically
scanning and erasing the stored image with an electron beam. In the 1960s, vidicons
were significantly improved by using a back-illuminated silicon photodiode array
as the photoconductor.⁴ Massive and fragile, with poor light sensitivity, nonlinear
response, and poor image quality, vidicons were hardly ideal imaging detectors for
spaceflight, let alone for scientific imaging. Nevertheless, they were routinely used
for imaging cameras on NASA missions throughout the 1960s and 1970s, such as
Mariner and Voyager.

Over the next 50+ years, vidicons evolved into modern-day MCP detectors, 
with greatly improved resolution, sensitivity and noise performance. Currently, 
photocathodes coupled with MCPs and silicon detectors are used extensively in 
UV instruments in space. MCP detectors utilize a photocathode (PC) to convert 
photons into electrons that are ejected into vacuum, and the electron signals are 
collected and amplified by the MCP. The MCP is made from an array of tubes that 
are on the order of ten micrometers in diameter. Electrons entering these tubes 
are accelerated by high electric fields and they free electrons from the tube walls 
in each collision, creating an electron multiplier effect. The MCP provides electron 
signal gain while the tube structure confines the electrons and preserves the spatial
information. The electron bursts generated by the MCP are read out by various
schemes such as multi-anode arrays.ᵃ

Another scheme for readout consists of a phosphor for conversion of electrons 
to photons and a silicon imager (charge-coupled device (CCD) or complemen- 
tary metal-oxide semiconductor (CMOS) detector), sometimes coupled through 
fiber optic faceplates. The latter readout scheme is often referred to as an image 
intensifier. 

Silicon imaging arrays are also often used in this type of photoemissive detec- 
tor, including EBCCDs and, more recently, EBCMOS detectors. By lowering the 
threshold for electron emission, cesiation (adsorption of Cs) greatly improves the 
sensitivity of photocathodes, enabling the detection of faint ultraviolet signals in 
the presence of visible and infrared background light. Because of their low dark 
current and high gain, photocathode-based detectors can be used for photon- 
counting applications without requiring detectors to be cooled. This combination 
of attributes has significant advantages for UV detection in space, and MCP-based 
detectors have flown on multiple instruments and enabled a wide range of scientific 
discoveries.⁵⁻⁹

While MCP-based detectors have been used successfully in many NASA missions,
they have significant room for improvement because of their relatively low QE, limited
resolution, limited number of pixels, high-voltage requirements, and difficulty of
fabrication in sealed-tube configuration. Resolution appears to be limited to 25–40 microns
by fundamental charge cloud variance in MCPs, which cannot be predicted a priori.
Cesiated photocathodes are difficult to manufacture and prone to degradation,
leading to practical limitations on the size, cost and durability of large-scale focal
planes. The dual requirements for high voltage and sealed-tube fabrication limit the
mass, power and robustness of MCP-based detectors, and contribute to problems
with long-term reliability in a space environment. High voltage stability and arcing
are concerns, especially with proximity-focused PCs. These problems get worse as
MCP-based focal planes are scaled in size. We will discuss later the promise of GaN 
PCs to overcome some of these problems, but demands for higher spatial resolution 
and sensitivity have contributed to a growing need for high performance, solid-state 
imaging detectors. Replacing bulky and fragile vacuum tubes with wafer-thin silicon 
dies promises to be a giant leap for scientific imaging from space. 


2.2. Solid-state UV detectors 


This category encompasses specially processed silicon detectors as well as detectors 
in wide bandgap materials such as silicon carbide, the GaN family, diamond, and 
some newer materials such as zinc oxide. For the wide-bandgap materials, detec- 
tors are typically made in a hybrid structure, in which an array of photodiodes is 


ᵃSee also Chapter 21 of Volume 3 of this Handbook for further discussion of MCPs.
mechanically and electrically connected to a silicon CMOS readout.¹⁰ Monolithic
imagers with transistors fabricated in the host or substrate wide bandgap material
exist in concept or in early stages of development.¹¹ These materials offer intrinsic
solar-blind UV imagers. In this chapter, we focus on GaN and its alloys as PCs
because of their more near-term applicability. We specifically focus on PC activation
without cesiation, using near-surface band-structure engineering. We will later
briefly discuss all-solid-state detectors made in the III-N material family.

The solid-state UV detector category also includes back-illuminated and UV- 
optimized silicon detectors. With proper passivation such as 2D doping (delta- 
doping and superlattice doping) that will be discussed in Sections 3 and 4, very high 
UV efficiency and stable response can be achieved. With the advent of CCDs and, 
more recently, CMOS imaging sensor technologies and their ubiquitous presence 
in the consumer market, the semiconductor industry has made enormous investments
in silicon visible imagers. Scientific visible detectors have been developed with
very low noise, low dark current, and very large formats (e.g., 10k × 10k pixels).
Silicon sensors continue to improve, with electron-multiplying CCDs (EMCCDs),
single-photon avalanche diode (SPAD) arrays, and quanta image sensors
(QIS) enabling the use of silicon detectors for photon counting applications. More
recently, incorporation of solar blind filters directly on back-illuminated 2D-doped 
silicon arrays has achieved high in-band QE and high rejection ratio (3-4 orders of 
magnitude). 


3. Silicon UV Detectors 


3.1. A revolution in imaging 


George Smith, co-inventor of the charge-coupled device, wrote in 2009 in his acceptance
of the Nobel Prize for Physics, “...CCDs were born in the Si-SiO₂ revolution
and created their own revolution in widespread imaging device applications”.¹²
Smith’s coupling of revolutions in technology and science is particularly appropriate 
to astronomy. Around 400 years after Galileo changed the world by developing the 
world’s first astronomical telescope and using it to discover moons orbiting Jupiter 
and craters on the lunar surface, astronomers are building a new generation of space 
telescopes to explore the universe with greater depth, precision, and coverage than 
ever before possible. Astronomical focal planes populated with solid-state detectors 
have thus greatly expanded our horizons, from the discovery of distant galaxies using 
four CCDs in Hubble Space Telescope’s Wide Field/Planetary Camera 2 (Fig. 1), to 
exoplanets emerging from Kepler’s 42 CCD focal plane, and the all-sky astronomical 
survey currently being undertaken by Gaia’s 106 CCD focal plane comprising nearly 
a billion pixels and 0.4 m² of silicon.

This evolution of astronomical instruments followed innovations in detector 
materials, fabrication, design and architecture that increased spatial resolution, 
improved photometric sensitivity and stability, and expanded the spectral range, 


Fig. 1. Two iconic images acquired by the Wide Field Planetary Camera 2 on the Hubble
Space Telescope. (a) The Pillars of Creation, which are star-forming regions in the Eagle
Nebula (photo credit: Jeff Hester/Paul Scowen/NASA/ESA;
https://science.nasa.gov/science-news/science-at-nasa/2015/07jan_pillarsofcreation).
(b) The Hubble Deep Field, in which are visible hundreds of never before seen galaxies
(photo credit: Robert Williams and the Hubble Deep Field Team [STScI]/NASA/ESA;
https://www.spacetelescope.org/images/opo9601c1/).


enabling their use in X-ray,ᵇ UV, optical, and infrared imaging and spectroscopy. At
the heart of these telescopes are solid-state imaging detectors, which transform light 
into digital images with exceptional sensitivity, resolution, and dynamic range. The 
transformation of astronomy by detector technologies was recognized with the 2009 
Nobel prize in physics. The broader transformation of society by digital imaging 
detectors was recognized by the 2017 Queen Elizabeth Prize for Engineering. 


3.2. CCD, CMOS, APD, SPAD, QIS 


In a presentation to the American Rocket Society in 1961, Eugene F. Lally of the
Jet Propulsion Laboratory first proposed solid-state imaging arrays.¹³ The idea
was to use arrays of photodetectors in a guidance system for interplanetary travel.
The technology required for this system did not yet exist, but the semiconductor
revolution of the 1950s and 1960s was laying the groundwork for silicon imaging
detectors. AT&T Bell Labs was developing a Picturephone that would transmit
video images over the phone line; their vidicons used silicon photodiode arrays,⁴
but were still limited by the requirement for an electron beam in vidicon readout
electronics. Peter Noble’s invention of “self-scanned” silicon diode arrays¹⁴ took
Lally’s vision a step closer to reality. The key development came the following year,


ᵇSee Chapters 8–10 of Volume 4 of this Handbook.

when Smith, Boyle and Tompsett invented the CCD as an electronic alternative to
magnetic bubble memory.¹⁵,¹⁶

Aware of AT&T’s work on electronic imaging, Tompsett, Boyle and Smith 
quickly realized that CCDs could be adapted for this application, using the charge 
storage and transfer capabilities of their charge bubble devices to achieve the collection
and measurement of photogenerated charge to form an image.¹⁷ CCDs initially
suffered from very poor image quality, which was traced to the capture and re- 
release of stored charge by surface states located at the Si-SiO2 interface of the 
MOS capacitors. The problem was addressed in 1970 when AT&T demonstrated 
the buried-channel CCD, which modified the original CCD design by incorporating 
a shallow ion implant at the front surface.¹⁷ Doping the surface modifies the near-surface
band-bending to create a buried channel that is spatially isolated from the
surface traps. This modification improved the charge transfer efficiency by several 
orders of magnitude, but did not eliminate all problems caused by surface states in 
CCDs. An essential technology in all of these innovations is back-illumination, which 
entails advantages and challenges that are explored in Section 3.3. Surface states 
also affect the performance of back-illuminated devices, and attempts to mitigate 
their effects have played a key role in the history of scientific CCDs. 

Fig. 2. A thinned CMOS imaging array, with the front surface facing up, and the back surface
shown in reflection. Only the central portion of the array is thinned, leaving a thick silicon frame
for mechanical support.

The two major types of detectors on which the modern digital imaging industry
is based are CCDs and CMOS imaging detectors (Fig. 2). While CMOS imaging
detectors have lagged behind CCDs in astronomy due to the greater challenges in
fabrication and performance, technologies are advancing rapidly and CMOS imaging
detectors are currently flying in space, including a recent deployment on NASA’s
Orbiting Carbon Observatory-3. Silicon will continue to play a central role in
solid-state detector technologies, both as monolithic integrated detectors such as
CCDs and CMOS imaging detectors, and as hybrid detectors comprising a variety
of materials for UV and infrared detectors. Vertical integration of detector and 
readout electronics has greatly expanded the range and power of solid-state detec- 
tors. New silicon detector technologies are being developed and deployed to enable 
astronomical imaging and spectroscopy with single photon sensitivity, including 
EMCCDs and single-photon avalanche diode arrays (SPADs).'® The recently developed
Quanta Image Sensor (QIS)!® foretells a new paradigm in digital imaging, with
unprecedented capabilities for gigapixel resolution, single photon counting sensitiv- 
ity and time-resolved detection at room temperature and without gain. To achieve 
such precision is no small task. Light incident on the solid-state imaging array 
is converted into charge, which is collected over the entire array and measured 
with sensitivity that can approach the level of single electrons, while maintaining
exceptional photometric precision over a dynamic range covering four to five orders
of magnitude. At this level of precision, defects in the detector material become 
extremely important. From a materials science perspective, many of the challenges 
currently faced by astronomers come down to controlling the surfaces and inter- 
faces comprising the solid-state imager, and many of these were first encountered 
with CCDs. 


3.3. Back-illumination and passivation 


In 1973, only 3 years after invention of the CCD was first announced, Texas Instru- 
ments developed the first back-illuminated CCD imager. Solid-state imaging devices 
generally consist of a semiconductor material layered with insulators and metals. 
The semiconductor serves as an absorber of photons, transforming light into pat- 
terns of electrons and holes corresponding to the image projected onto the detector 
surface. In order to read the image, circuits are formed on one side of the detector, 
dividing the surface into pixels and providing the means to measure the electrons 
or holes captured in each. Whereas these circuits are essential for reading images, 
they create wavelength-dependent fluctuations in images associated with scatter- 
ing, absorption and interference, degrading image quality and imposing limits on 
resolution, sensitivity, and spectral range. In particular, front-illuminated detectors 
are blind in the UV, as high-energy photons interact so strongly with the circuits 
that no light can reach the detector. 

Adapting methods previously developed for silicon photodiode vidicons, 
researchers at Texas Instruments turned CCDs upside down in order to illuminate 
the device through the back surface. Back-illuminated CCDs achieved quantum 
efficiencies as high as 95% at visible wavelengths, a factor of three higher than 
comparable front-illuminated imagers.?? 

The motivation for developing a back-illumination process is apparent in Fig. 3.
In a front-illuminated configuration, incident light is subjected to absorption and
scattering by the oxide, polysilicon, and aluminum layers that make up the electronics
used for charge storage and manipulation. In contrast, the back-illuminated
configuration inverts the imaging detector so that the illuminated surface is the
backside silicon surface, with no intervening layers.

Fig. 3. Schematic drawing of the electronic band structure in a back-illuminated silicon imaging
detector. The detector is depicted in cross-section, with the back surface on the left and the front
surface on the right. The electric potential seen by an electron is indicated by the thin black curve.
The front surface electrodes, consisting of multiple layers of oxide, polysilicon, and metal, are shown
schematically on the right, and an oxide formed on the back surface is shown schematically on the
left. Light enters the detector from the back surface, generating free electrons in the conduction
band. In order to be detected, the electrons must move by drift and diffusion into the buried
channel near the front surface. However, positive charge at the back surface creates a backside
potential well, which can trap some of the photogenerated electrons.

Typical CCD and CMOS imaging arrays have an active volume that is only 
5-15 microns thick; the remaining several hundred microns of highly-doped silicon 
does not interact with incoming light. For back-illumination, thinning can be used 
to remove this substrate material, but the thinning process creates an active surface 
where none existed before (Figs. 2 and 3). This new surface impinges directly on the 
detector volume, creating a variety of new problems that must be solved in order 
to meet the requirements for scientific imaging. Notable among these problems is 
quantum efficiency hysteresis, described in the next section. 

Surface states are electronic states that are confined to surfaces (and by exten- 
sion, interfaces). Dangling bonds and surface defects introduce electronic states 
within the semiconductor bandgap, promoting thermal dark current or acting as 
traps capable of the capture and delayed release of photogenerated charge. Charge 
trapped at the silicon/oxide interface and in the overlying oxide creates an electric 
field that extends into the silicon detector. In p-type silicon, a positively charged 
surface creates downward band-bending that biases the near-surface region into 
depletion, creating a backside potential well that traps photogenerated charge at 
the surface (Fig. 3). 


3.4. Astronomical CCDs and the Hubble Space Telescope 


Quantum efficiency hysteresis, or QEH, is the bane of scientific imaging. Referring 
back to Fig. 3, the trapping of photogenerated charge at the surface of a back-illuminated
CCD causes dynamic changes in the surface charge that depend on the
illumination history of the surface. Changes in the surface charge cause the depletion
depth to change as well, and as a result the effective quantum efficiency of the
CCD becomes unstable and hysteretic. In the decades following Texas Instruments’ 
first demonstration of a back-illuminated CCD, a variety of processes have been 
developed to overcome the problems created by surface and interface states at the 
back surface. These processes share a common goal — bias the back surface into 
accumulation to prevent trapping of photogenerated charge and stabilize the quan- 
tum efficiency. The history of CCD development for the Hubble Space Telescope 
(HST) illustrates the difficulty of this undertaking. 

The HST was conceived at NASA before there was a detector that could meet 
its goals. Technology selection and detector development for the mission took more 
than a decade, requiring considerable effort. The spectacular successes of HST, still 
ongoing, are due to and validate the efforts of the early creators and developers of 
scientific CCD technology. An intensive period of development at JPL and elsewhere 
led to NASA’s selection of JPL to develop CCDs for Hubble.?! In 1972, JPL initiated 
an advanced technology program to develop CCDs for planetary missions. In 1975, 
JPL successfully demonstrated a 100 x 160 back-illuminated Texas Instruments 
(TI) CCD, and announced plans to develop a 400 x 400 CCD for planetary missions. 
The next year they announced the development of an 800 x 800 TI CCD for a Jupiter 
Polar Orbiter mission, which ultimately evolved into Galileo. In 1977, JPL was 
selected by NASA to develop the Wide Field and Planetary Camera (WF/PC), 
a key instrument for the HST. The WF/PC focal plane consisted of eight back- 
illuminated, 800 x 800 Texas Instruments CCDs. 

Three generations of WF/PC instruments have flown on Hubble. HST was 
launched on the space shuttle in 1990 with WF/PC 1 on board. During a servicing 
mission in 1993, WF/PC 1 was replaced by WF/PC 2. Many of Hubble’s most 
spectacular discoveries came from images collected by WF/PC 2. WF/PC 2 flew 
on HST until 2009, when it was replaced by the Wide Field Camera 3 (WFC 3). 
Throughout this period, CCDs destined for HST evolved considerably, driven in 
large part by efforts to eliminate QEH. 


3.5. Quantum efficiency hysteresis on the Hubble Space Telescope 


HST’s science goals required the WF/PC CCDs to be sensitive throughout the 
visible and ultraviolet spectrum, down to the extreme UV and including the scien- 
tifically important Lyman-alpha line of hydrogen at 121.6nm. Thinning and back- 
illumination alone were insufficient to meet the requirements in the UV, because the 
backside potential well prevented sensitivity in the deep UV. This is a more severe 
issue at these shorter wavelengths because of the smaller penetration depth of UV 
light; photoelectrons created in the potential well at the surface will be trapped 
there. In order to meet the science goals, the back surfaces of thinned WF/PC 
1 CCDs were coated with a 120-nm coronene film to enhance the UV sensitivity. 
Coronene absorbs UV light at wavelengths less than 380 nm and re-emits green
light with a luminescence peak near 500 nm, resulting in a quantum efficiency for
WF/PC 1 CCDs that varied from 5% to 15% across the UV spectrum.?* 73 

Although delayed until 1990, HST was initially scheduled to launch in 1984. 
While the WF/PC 1 instrument was undergoing final testing in preparation for its 
originally scheduled launch, system-level thermal vacuum tests revealed quantum 
efficiency hysteresis as high as 200% in the CCDs. Mission requirements dictated 
that quantum efficiency must be stable to better than 1%. Further testing revealed 
a previously unknown phenomenon that had unfortunately prevented the discovery 
of QEH earlier in the development. During laboratory testing, the WF/PC 1 CCDs 
had been routinely illuminated with intense UV in order to inspect the coronene 
layer. Only after QEH had manifested in the flight instrument was it discovered 
that exposing the WF/PC 1 CCDs to intense deep UV light (~250nm) stabilized 
the quantum efficiency enough to meet HST specifications. This exposure stabi- 
lized the CCDs for several weeks at room temperature, and for much longer at 
cryogenic temperatures. Recharging with another UV flood would restabilize the 
CCDs; unfortunately, there was no way to access the WF/PC 1 CCDs once they 
had been integrated into the instrument. The solution for WF/PC 1 was to retrofit 
the instrument with an external light pipe that could collect solar UV light entering 
through a hole in the heat shield and direct it to the WF/PC 1 focal plane. With 
this light pipe in place, HST could implement a UV flood on orbit. 

The thinning, oxide formation, and UV flood process development undertaken 
to solve the QEH problem has been described in some detail.?! The dopant profile 
at the substrate/epilayer interface in the TI CCDs is broadened by diffusion during 
CCD fabrication, as dopant atoms diffuse from the substrate into the epilayer. This 
doping gradient results in a sensitive dependence of the surface doping concentration 
after etching, and the quantum efficiency after the UV flood, on the final thickness 
of the CCD. Non-uniform chemical thinning could produce lateral variations in 
the device thickness, resulting in variations in the quantum efficiency and stability. 
Moreover, the effectiveness of the UV flood depended on the process by which the 
native oxide formed, the temperature, and the environmental conditions. 

Clearly a more permanent solution to the QEH problem was needed. As JPL 
began developing WF/PC 2 as a backup instrument to replace WF/PC 1, the 
WF/PC 2 CCDs were redesigned to achieve long-term QE stability without requiring
a UV flood. Four modifications to the WF/PC 1 process were originally proposed:
(1) select sensors with thick and uniform p+ layers after thinning; (2) use
lumogen instead of coronene as the UV phosphor; (3) deposit a thin layer of platinum
on the surface; (4) bias the Pt layer “flash gate”?! to a fixed potential. Too few
flight-quality CCDs remained to accomplish and validate the redesign, so new CCDs 
were manufactured by Loral based on the TI 800 x 800 format.?4 In 1993, WF/PC 2 
was installed on HST with front-illuminated, lumogen-coated Loral CCDs. 

In 2009, NASA replaced the spectacularly successful WF/PC 2 with its suc- 
cessor, the Wide Field Camera 3. WFC3 includes two back-illuminated 4096 x 
2051 CCDs manufactured by e2v. The e2v back-illumination process uses ion
implantation with boron followed by a laser anneal process in order to create a
thin, highly doped layer of silicon at the back surface. Ion implantation with boron 
introduces fixed, negative charge into the silicon lattice, which can narrow the back 
surface potential well to ~10nm or less, depending on the process parameters. 
Ground tests of the WFC3 CCDs prior to launch revealed quantum efficiency hys- 
teresis; this was mitigated using an illumination process not unlike the UV flood, 
albeit at visible wavelengths.?°: 7° 


3.6. Solving quantum efficiency hysteresis using molecular beam 
epitaxy: 2D-doping for achieving highly stable, high efficiency 
UV detectors 


In the late 1980s, while JPL struggled with WF/PC CCDs, Michael Hecht, Frank 
Grunthaner, and Paula Grunthaner of JPL began studying the surface/interface 
problem in back-illuminated CCDs from the standpoint of materials science. Hecht 
was studying the electronic band structure of semiconductor surfaces using ballis- 
tic electron emission spectroscopy. Hecht’s experiments led to the realization that 
the UV flood process charged the surface by UV-induced adsorption of negatively 
charged oxygen ions on the CCD surface.?’ With this realization, they proposed 
exposing the surface to more highly oxidizing molecules, such as NO and N2O, that 
could charge the CCD surface without requiring a UV flood. This solution proved 
effective but impractical, as these molecules are also corrosive. Paula and Frank 
Grunthaner had recently developed processes enabling molecular beam epitaxial 
growth of silicon at low temperatures.?° Grunthaner, Grunthaner, and Hecht used 
their combined expertise to propose a radically different, permanent solution to 
the QEH problem — the growth by low-temperature molecular beam epitaxy of a 
highly doped, extremely thin silicon layer on the back surface of the CCD. 


3.7. Ion implantation and quantum efficiency hysteresis 


Doping the CCD back surface offers a permanent solution to the QEH problem. 
The ionized dopant atoms are covalently bound in the silicon lattice and are there- 
fore very stable, unlike the UV flood and other methods of charging the surface. 
Although the process is straightforward in principle, eliminating QEH by doping 
the surface is difficult. High and stable quantum efficiency is particularly difficult 
to achieve in the UV, as most of the light is absorbed very close to the surface 
(Fig. 4). Surface doping techniques in CCDs had already been explored; as early
as 1973, Texas Instruments doped the back surface of their first back-illuminated
CCDs to 10¹⁸ cm⁻³. However, at this low dopant concentration, the back-surface
potential well still extends several tens of nanometers into the surface, and most of
the electrons generated by UV light end up trapped at the back surface. This is why
the quantum efficiency of Texas Instruments’ first back-illuminated CCD dropped
from 95% at 500 nm to barely more than 10% at 400 nm, and it is also the reason
that the first generation of back-illuminated CCDs designed for HST were coated
with a UV phosphor (see Section 3.5).

Fig. 4. Logarithmic plot of the absorption length of light in silicon covering the spectral range
from soft X-rays through the near infrared, illustrating the challenge of UV detection with silicon
detectors. Because silicon is an indirect gap semiconductor, the near-bandgap absorption length
is very long. Absorption in the UV is very strong, because at UV energies higher-energy
bands come into play to allow direct-gap transitions. UV absorption in silicon is dominated by two
strong absorption peaks at ~360 nm and ~270 nm. At 270 nm, the 1/e absorption depth in silicon
is only 4 nm.
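
To see why a backside potential well extending several tens of nanometers is so damaging in the UV, it helps to estimate how much light is absorbed within a given depth. The sketch below assumes simple exponential (Beer-Lambert) attenuation and ignores reflection at the surface. The 4 nm absorption length is the 270 nm value quoted in the Fig. 4 caption; the depths are arbitrary illustrative choices.

```python
import math


def fraction_absorbed(depth_nm: float, absorption_length_nm: float) -> float:
    """Fraction of the light entering the silicon that is absorbed within
    depth_nm of the surface, assuming exponential attenuation with a 1/e
    absorption length of absorption_length_nm."""
    if depth_nm < 0 or absorption_length_nm <= 0:
        raise ValueError("depth must be >= 0 and absorption length must be > 0")
    return 1.0 - math.exp(-depth_nm / absorption_length_nm)


# 1/e absorption depth in silicon at 270 nm, as quoted in the Fig. 4 caption.
ABSORPTION_LENGTH_270NM = 4.0

if __name__ == "__main__":
    # Illustrative depths, from a single nanometer to "several tens".
    for depth in (1.0, 4.0, 10.0, 30.0):
        frac = fraction_absorbed(depth, ABSORPTION_LENGTH_270NM)
        print(f"within {depth:5.1f} nm of the surface: {frac:.1%} absorbed")
```

Under these assumptions, nearly all 270 nm photons are absorbed within a well a few tens of nanometers deep, which is consistent with the collapse of UV quantum efficiency described above.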

Moreover, solving the problem of QEH by surface doping is more complicated 
than simply achieving a high surface dopant concentration; other factors must 
be considered. First, the minority carrier lifetime in degenerately doped silicon is 
much shorter than in the high-quality silicon comprising CCDs and CMOS detec- 
tor arrays. Second, lattice defects and interstitial dopant atoms introduce electron 
traps into the highly doped region, further shortening the minority carrier lifetime 
(as well as potentially exacerbating the QEH problem). Third, it is not enough 
to get the dopant atoms into the silicon — in order to be electrically active they 
must be incorporated into silicon lattice sites. Finally, a dopant profile must be 
achieved that is sufficient to create a near-surface electric field strong enough to 
drive photogenerated charge away from the surface before it is lost to recombination 
or trapping. A considerable amount of effort has gone into developing low-energy 
ion implantation techniques that can simultaneously achieve a high surface dopant 
concentration, a very narrow dopant distribution, and a low density of defects. This 
progress has enabled a substantial reduction of QEH, but, as we saw with WFC3, 
even these optimized ion implantation techniques do not completely eliminate QEH. 

It is possible to quantify the relationship between the dopant profile and the near-surface
electric field through numerical modeling of band-bending in doped silicon.
Figure 5 shows the electric field in units of V/nm for four different simulated dopant
profiles. Only delta-doping achieves a near-surface electric field that is large enough
to prevent most UV-generated photoelectrons from being trapped in the backside
potential well.

Fig. 5. Near-surface electric field calculated for four different dopant distributions. The electric
field is given in units of V/nm, and depth from the surface is given in nanometers; for comparison,
the minimum 1/e absorption depth in silicon is ~4 nm, and thermal energy is ~0.015 eV
at −100°C. In order to prevent trapping of UV-generated photoelectrons in the backside potential
well and eliminate quantum efficiency hysteresis, a sharply peaked dopant profile is essential
to creating a strong electric field within a nm of the surface. Only delta-doping can achieve
this goal.
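
The comparison between field strength and thermal energy can be made concrete with a short calculation. The sketch below computes the thermal energy kT at the −100°C operating temperature quoted for Fig. 5 and compares it with the potential energy an electron gains when moving a given distance in a uniform field. The field strength and distance used in the example are hypothetical placeholders, not values read from Fig. 5.

```python
# Boltzmann constant in eV per kelvin (exact SI value, rounded).
BOLTZMANN_EV_PER_K = 8.617333262e-5


def thermal_energy_ev(temp_celsius: float) -> float:
    """Thermal energy kT in eV at the given temperature in degrees Celsius."""
    return BOLTZMANN_EV_PER_K * (temp_celsius + 273.15)


def potential_drop_ev(field_v_per_nm: float, distance_nm: float) -> float:
    """Energy in eV gained by one electron crossing distance_nm in a
    uniform field of field_v_per_nm (idealized: the field is constant)."""
    return field_v_per_nm * distance_nm


if __name__ == "__main__":
    kt = thermal_energy_ev(-100.0)
    print(f"kT at -100 C = {kt:.4f} eV")

    # Hypothetical example: a 0.1 V/nm field acting over 1 nm.
    drop = potential_drop_ev(0.1, 1.0)
    print(f"potential drop = {drop:.3f} eV = {drop / kt:.1f} kT")
```

When the potential drop over the first nanometer amounts to many kT, thermal diffusion back toward the surface becomes unlikely and photoelectrons are swept into the bulk.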


3.8. Delta-doping for high efficiency, extended spectral range, 
and stable response 


Low-temperature molecular beam epitaxy (MBE) finally provided the solution 
to back surface passivation and the elimination of quantum efficiency hysteresis. 
Whereas conventional MBE growth of silicon requires heating the silicon substrate 
to temperatures greater than 800°C in order to remove the native oxide, JPL’s 
low-temperature MBE growth process can be accomplished at temperatures below 
450°C. The difference matters because the back-surface modification
is performed after the CCD is fabricated. A fully-processed CCD cannot
survive temperatures above 500°C, due to inter-diffusion of silicon and aluminum 
on the front surface. CCDs do survive JPL’s low-temperature MBE process. 
Because MBE is a crystal growth technique that enables the epitaxial growth 
of silicon one atomic layer at a time, it allows high precision relative to diffusion or
implantation. Additionally, MBE is a non-equilibrium crystal growth process which
allows incorporation of dopant concentrations far higher than can be achieved by 
conventional doping processes. 

Using the layer-by-layer control afforded by the MBE growth process, dopant 
atoms can be incorporated into the crystal in a highly localized layer that lies ~1 nm 
beneath the silicon surface (see Fig. 6(a) and Ref. 29). The resulting dopant distri- 
bution is sharply peaked (nearly as sharply as possible, in fact) and thus resembles 
the mathematical delta function. With a surface charge density of ~2 × 10¹⁴ cm⁻²,
delta-doping creates a very high electric field near the surface that drives photogen- 
erated charge away from the back surface and suppresses the generation of excess 
dark current from the exposed silicon surface. Because this layer is extremely thin, 
essentially all of the photogenerated charge can be detected, even when the incident
light is absorbed very near the surface. Figure 6(a) shows the delta-doped
CCD structure, and Fig. 6(b) shows the measured QE response of thinned, delta-doped
CCDs, compared with a lumogen-coated WF/PC 2 CCD used in HST. Delta-doped
CCDs exhibit measured quantum efficiency at the theoretical limit of silicon
transmittance in the EUV, FUV, UV, and the visible. The exceptional stability of 
delta-doped detector arrays has been demonstrated by various groups as a func- 
tion of time,°° temperature,*! illumination history,°? and ambient gases,°° with no 
observed changes in the quantum efficiency under any environmental conditions and 
no evidence of QEH. 
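
An order-of-magnitude feel for the field produced by a delta layer follows from Gauss's law. The sketch below is an idealized upper bound, not the numerical band-bending model behind Fig. 5: it treats the dopants as an infinite, fully ionized sheet of charge, assumes all field lines terminate on one side of the sheet, and ignores screening by free carriers. The relative permittivity of silicon (11.7) is an assumed textbook value; the sheet density is the ~2 × 10¹⁴ cm⁻² figure quoted above.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19
VACUUM_PERMITTIVITY_F_PER_M = 8.8541878128e-12
SILICON_RELATIVE_PERMITTIVITY = 11.7  # assumed textbook value


def sheet_charge_field_v_per_nm(sheet_density_per_cm2: float) -> float:
    """Upper-bound field magnitude (V/nm) next to a charged sheet in silicon.

    Idealization: E = q * N_s / (eps0 * eps_r), i.e. every dopant is ionized
    and all field lines leave the sheet on one side, with no screening.
    """
    sigma_c_per_m2 = ELEMENTARY_CHARGE_C * sheet_density_per_cm2 * 1e4
    permittivity = VACUUM_PERMITTIVITY_F_PER_M * SILICON_RELATIVE_PERMITTIVITY
    field_v_per_m = sigma_c_per_m2 / permittivity
    return field_v_per_m * 1e-9  # convert V/m to V/nm


if __name__ == "__main__":
    field = sheet_charge_field_v_per_nm(2e14)
    print(f"idealized field near the delta layer: {field:.2f} V/nm")
```

Even allowing for the crudeness of the estimate, the result is on the order of a volt per nanometer, far larger than the thermal energy scale of ~0.015 eV over the same distance.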


3.9. 2D-doped CCDs, CMOS, diode arrays: High efficiency 
and high stability UV and UV/optical silicon detectors 


Delta-doping and superlattice doping are 2D-doping techniques that provide com- 
plete passivation of back-illuminated devices. These techniques use MBE to grow 
stacks of single-crystal silicon at temperatures low enough for fully fabricated (com- 
plete with metallization) silicon devices. Embedded in the thin silicon layer during 
growth is a delta layer of dopant atoms that creates a high density of sheet charge,
producing a field to oppose the naturally occurring deleterious fields in the back- 
illuminated silicon detectors. This single delta layer or multiple delta layers (super- 
lattice doping or SL doping) are deposited on the back surface of the device to 
extend the spectral range to short wavelengths, provide nearly reflection-limited 
response (close to 100% internal QE), and ultrastable response.** In superlattice 
doping, multiple delta layers are grown in succession separated by ultrathin silicon 
interlayers. It has been demonstrated that superlattice doped devices show extraor- 
dinarily high stability and durability in deep and far ultraviolet.?° Because these 
methods involve only back surface modifications, the technique is agnostic to the 
frontside architecture of the silicon detector and has been successfully demonstrated 
and deployed in a variety of architectures and platforms including CCDs, EMCCDs, 
CMOS and avalanche photodiode (APD) designs.** °° Conceptually simple and relatively
easy to implement, superlattice doping represents a fundamentally different
approach to surface doping and passivation than the conventional methods of ion
implantation and diffusion (detailed in sections above). The structure of a 2D-doping
superlattice comprises multiple layers of delta-doped silicon grown epitaxially on a
silicon surface. We have grown 2D-doping superlattices with periods of up to five
layers on back-illuminated detectors, using an interlayer separation of 1 nm. Crystal
quality has been verified using reflection high-energy electron diffraction during and
after growth, and dopant activation has been verified using Hall measurements on
MBE-grown wafers. Applied to back-illuminated detectors, the superlattice forms
an electrically conductive contact layer and a potential barrier for electrons within
the first few nanometers of the surface. Most importantly, 2D doping creates high
efficiency, tailorable response, and ultrahigh stability and durability, as demonstrated
on CMOS and CCD arrays. Figure 7 summarizes some of the highlights of
development and deployment of 2D-doped silicon arrays.

Fig. 6. (a) Schematic diagram of the cross-section of a delta-doped CCD (top). The epitaxially
grown delta-doped layer on the back surface of a thinned CCD places a high density of boron
atoms ~0.5 nm below the silicon surface and the native oxide. (b) Delta-doped CCDs exhibit
100% internal QE from extreme UV to the visible (i.e., measured external QE is equal to the
silicon transmittance shown as a solid black line, which is the detection limit for uncoated
back-illuminated silicon devices). In order to extend this comparison between reflection and quantum
efficiency into the deep UV and extreme UV, the quantum efficiency data have been adjusted
downward to remove the effects of multiple electron-hole pair production at high photon energies.
Data from an antireflection-coated, delta-doped CCD demonstrate QE optimization for a particular
spectral range (in this case, the AR coating was optimized for high QE from 300 nm to 400 nm,
which is a challenging range for silicon due to the two silicon absorption peaks at ~360 nm and
270 nm). See examples of other AR coatings and devices deeper in the UV in later sections.

Fig. 7. Development and deployment of 2D-doped arrays: history and trajectory (updated from
Ref. 34).


3.10. Achieving high QE and high stability using 2D doping 


As discussed above, the 2D-doping process is agnostic to the pixel design and to the
architecture of the silicon detector's readout structure and frontside circuitry. JPL has developed
end-to-end post-fabrication processing that starts with fully fabricated wafers from 
virtually any foundry and produces back-illuminated detectors. These processes 
have been developed and applied to a variety of CMOS, CCD, EMCCDs, APDs, 
and PIN arrays.** Figure 8 shows the general process flow, while Fig. 9 shows the 
QE achieved in a variety of devices, tailored for various applications.

Fig. 8. JPL’s end-to-end post-fabrication process flow (including 2D doping and ALD custom
coatings) to produce high-performance UV/optical/NIR sensitized devices with 100% internal QE 
and tailorable external QE (after Ref. 34). 


4. Out-of-band Rejection and Reduced Surface Reflection 
in Ultraviolet Detectors 


Achieving 100% internal QE through the use of 2D-doping techniques solves the 
issue of conserving photons that are absorbed into the silicon detector. This still 
leaves the problem of minimizing photons lost due to the natural reflectance of the 
silicon surface. With a back-illuminated sensor format, it is possible to implement 
a variety of optical thin film structures directly onto the detector to accomplish 
broadband anti-reflectivity (AR) or narrowband filtering. These structures can be 
deposited with standard thin-film vacuum coating methods, although compatibility 
with the 2D-doped surface must be considered. For example, energetic deposition 
methods like reactive sputtering or plasma-enhanced chemical vapor deposition can 
potentially de-passivate the Si surface and degrade QE.** Other techniques like 
evaporation and atomic layer deposition (ALD) have proven compatibility with the 
2D-doped surface. 

In the visible and NIR portion of the spectrum, Si possesses a large real part
of the refractive index (n_Si ~ 3.5-5). Because the imaginary part of the refractive
index of Si is small at visible wavelengths, optimal single layer AR coatings will
have a refractive index approximately equal to the square root of n_Si. There are
several suitable materials with a refractive index in this range, allowing for simple
AR coating designs that can achieve very high external QE. Materials like Al2O3
(n ~ 1.6) and HfO2 (n ~ 2.2) can yield QE of 80-100% at any given design wavelength
in the visible. If a wider spectral range of improved QE is desired, two-layer
AR coatings can be employed.*? For spectroscopy applications, such coatings can
be implemented with laterally graded thickness in order to tune the spatial response
of the detector to the spectral dispersion of the incoming light.

Fig. 9. Several QE curves for various applications of delta-doped arrays. (a) Measured QE in
bands designed for a space spectroscopic mission (after Ref. 40). (b) Measured QE of a delta-doped
EMCCD flown on the FIREBall-2 balloon experiment (Ref. 37). (c) Measured QE of delta-doped
and broadband 4k x 2k, 10 µm pixel SRI CMOS array (collaborative work with SRI; SRI packaged
and tested the arrays after delta doping and coatings). (d) Measured QE of a delta-doped and
AR-coated (broadband optimization), fully depleted STA CCD at Palomar’s Wafer
Scale Prototype (WasP).
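
The square-root rule for single-layer AR coatings can be checked with the standard normal-incidence formulas for a lossless quarter-wave film. The sketch below is an idealization: it assumes real (non-absorbing) refractive indices, normal incidence from vacuum, and evaluation exactly at the design wavelength. The silicon index of 4.0 is an illustrative value chosen from within the 3.5-5 range quoted above, and the film indices are the approximate Al2O3 and HfO2 values given in the text.

```python
import math


def bare_reflectance(n_substrate: float, n_incident: float = 1.0) -> float:
    """Normal-incidence reflectance of an uncoated, lossless substrate."""
    return ((n_substrate - n_incident) / (n_substrate + n_incident)) ** 2


def quarter_wave_reflectance(n_film: float, n_substrate: float,
                             n_incident: float = 1.0) -> float:
    """Normal-incidence reflectance at the design wavelength for a single
    lossless film of quarter-wave optical thickness."""
    numerator = n_incident * n_substrate - n_film ** 2
    denominator = n_incident * n_substrate + n_film ** 2
    return (numerator / denominator) ** 2


def ideal_film_index(n_substrate: float, n_incident: float = 1.0) -> float:
    """Film index giving zero reflectance for a quarter-wave layer."""
    return math.sqrt(n_incident * n_substrate)


def quarter_wave_thickness_nm(wavelength_nm: float, n_film: float) -> float:
    """Physical thickness of a quarter-wave film at the design wavelength."""
    return wavelength_nm / (4.0 * n_film)


if __name__ == "__main__":
    n_si = 4.0  # illustrative value within the 3.5-5 range quoted in the text
    print(f"bare Si reflectance: {bare_reflectance(n_si):.1%}")
    print(f"ideal film index:    {ideal_film_index(n_si):.2f}")
    for name, n_film in (("Al2O3", 1.6), ("HfO2", 2.2)):
        refl = quarter_wave_reflectance(n_film, n_si)
        thick = quarter_wave_thickness_nm(500.0, n_film)
        print(f"{name}: R = {refl:.1%}, quarter-wave thickness at 500 nm = {thick:.1f} nm")
```

With 100% internal QE, the external QE at the design wavelength is approximately one minus the reflectance, so residual reflectances of a few percent are consistent with the 80-100% range quoted above.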

In the UV, obtaining maximum throughput becomes more challenging, particularly
at wavelengths between 120 nm and 300 nm where the natural reflectance
of silicon and the absorption of the native oxide can yield external losses totaling
more than 70%. Coating designs that can achieve high performance throughout
this entire range are also challenging because the refractive index of Si changes
significantly throughout the UV portion of the spectrum, as shown in Fig. 10. This
means that a single coating design is less likely to yield high throughput in spectral
regions away from the design wavelength.

Fig. 10. The modeled FUV/NUV reflectance of silicon with and without a native oxide layer that
produces some additional absorption loss in the FUV. The predicted absorption depth into bare
silicon reaches a minimum in the UV (after Ref. 45).

Nevertheless, simple single-layer AR coatings can achieve >50% external QE 
across the FUV and NUV.* Stable oxide materials like Al2O3 have sufficient transparency
to operate at wavelengths as short as 180-190 nm with improved transmittance
over bare silicon; SiO2 can provide modest improvement in transmission
down to 150-160 nm. For applications at shorter wavelengths the only optical coating
materials with sufficient transparency are wider-bandgap materials such as the
metal fluorides MgF2, AlF3, and LaF3. Evaporation is a common approach for such
materials and recent work has also demonstrated that ALD is a viable process
option. =

In addition to minimizing reflection losses, it is critical for many applications to 
minimize detector response to out-of-band wavelengths. Detectors for space-based 
FUV astronomical applications often operate in a strong background of visible and 
near-UV radiation that can reduce overall SNR for observations of interest. This 
presents a challenge for broadband Si detectors, which maintain responsivity out to 
wavelengths equivalent to the Si bandgap near 1100 nm. In low-Earth orbit, background
zodiacal light, earthshine, and geocoronal emission can contaminate the signal
and even dominate over UV sources, which are often much fainter. Zodiacal light
is present as a diffuse background of solar radiation reflected by residual atmosphere
and dust particles, therefore mirroring the solar spectrum with a broad spectral
distribution peaking near 500 nm. In the FUV, this component is diminishing; however,
strong emission lines from hydrogen and oxygen can become important,
particularly the H Lyman-alpha line near 120 nm, neutral O lines near 130 nm, and
an ionized O line at 247 nm.** The strength and relative contribution of all of these
background phenomena are dependent on orbital altitude, day/night cycles, and 
pointing direction, and in particular these implementation methods can be used to 
reduce UV contamination from unwanted in-band UV sources. 

Broad long-wavelength rejection, or red rejection, can be achieved convention- 
ally through the use of UV bandpass filters. Due to the materials challenges noted 
above, these filters in the FUV are generally restricted to aluminum-based metal- 
dielectric filters, as opposed to multilayer all-dielectric filters commonly used at 
visible wavelengths. These metal-dielectric filters provide a peak transmission of 
~20-50% in order to yield out-of-band rejection with an optical density of OD3-6. 
The possibility of integrating this type of filter directly onto a Si detector is a path 
toward the demonstration of a solid-state sensor that can operate with visible or 
solar blindness. Fortuitously there is also a significant optical throughput advantage 
for the direct integration of such structures. The high UV reflectance of Si becomes 
beneficial to the optical admittance of the metal-dielectric filter structure, acting 
essentially as an additional reflector in the stack. 
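The rejection figures quoted above can be put in linear terms using the usual definition of optical density, OD = -log10(T). The short Python sketch below converts OD to fractional transmission and forms the in-band to out-of-band throughput ratio; the function names are hypothetical and the example values are simply taken from the ranges quoted in the text.

```python
def od_to_transmission(od):
    """Fractional transmission for a given optical density: T = 10**(-OD)."""
    return 10.0 ** (-od)


def in_band_to_out_of_band_ratio(peak_transmission, od_out_of_band):
    """Ratio of in-band peak transmission to out-of-band leakage."""
    return peak_transmission / od_to_transmission(od_out_of_band)


# A filter with 20% peak transmission and OD3 out-of-band rejection
# passes in-band light about 200 times more efficiently than out-of-band light.
for peak, od in [(0.2, 3), (0.5, 6)]:
    ratio = in_band_to_out_of_band_ratio(peak, od)
    print(f"peak T = {peak:.0%}, OD{od}: contrast ratio ~ {ratio:.3g}")
```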

For example, a single aluminum layer does not make an effective optical band- 
pass filter when implemented on a transparent substrate; the transmission is simply 
approximated by the penetration depth at a given photon energy. On Si, a single 
aluminum layer spaced by an appropriate transparent dielectric can yield strong 
bandpass behavior and even exceed the maximum transmission obtained with single 
layer dielectric AR coatings, as shown in Fig. 11. Broad long-wavelength rejection 


[Fig. 11 plot: transmission (%) versus wavelength (nm), 100-500 nm, for Si/AlF3/Al/AlF3, Si/Al2O3/Al/Al2O3, and Si/HfO2/Al/HfO2 stacks compared with bare Si.]


Fig. 11. The modeled transmission (or external QE) of metal-dielectric filter structure integrated 
directly onto silicon can exceed the silicon reflection limit while providing a narrowband UV 
response with broad, long-wavelength rejection (after Ref. 46). 
Fig. 12. (a) The measured effective QE of a 2D-doped Si APD with a five-layer Al/Al2O3
integrated bandpass filter. The APD QE was measured under unity gain conditions, and the 
plotted values approximate the effective QE of the device operating under full bias conditions. 
(b) Photographs of Si CCDs with integrated long wavelength bandpass filters developed for space 
astronomy applications including SPARCS (c). 


is achieved outside the resonant wavelength, and higher-order effects at shorter 
wavelengths can be suppressed with appropriate choice of spacer material. 

These integrated rejection coatings have been combined with 2D-doped sensors 
for particle physics applications at a target wavelength of 200nm, utilizing Al2O3 
spacer layers. These photodiode devices achieved an effective QE of greater than 
50% with OD3-4 long wavelength rejection (see Fig. 12).9° At shorter wavelengths, 
fluoride spacers are required. Such a coating is being implemented on the FUV
camera of the astrophysics CubeSat SPARCS, targeting >40% peak QE with OD5-6
levels of rejection.4” Generally, peak transmission can be traded for increased
out-of-band rejection by increasing metal layer thickness or increasing the number
of metal layers (cavities) in the stack. Another benefit of direct integration of
these solar-blind filters is that in-band reflectance is greatly reduced relative to all- 
dielectric AR coatings. This can have a beneficial impact on the overall performance 
of an optical system with respect to ghosting and stray light. 

It is also possible to incorporate additional functionality directly onto a sensor 
system to enhance performance in astronomy applications by performing functions 
like polarization selectivity and low-cross-talk color filtering, among other examples. 
At visible wavelengths such approaches have been demonstrated with pixel-scale 
wire-grid polarizers implemented on a CMOS imager.*® 

Concepts in plasmonics and metamaterials have also extended to filters 
directly integrated onto sensor systems for similar applications.4? Extending these 
approaches to UV wavelengths again presents materials and processing challenges, 
but may offer some performance advantages over conventional coatings when options 
may already be limited.°° For space-based instrumentation there can be benefits in
reduced system complexity, mass, and volume from such an approach.


5. III-Nitrides: Photocathodes and Solid-State Detectors 


5.1. Photocathodes 


Detectors based on image tubes, such as MCPs, utilize a semiconductor PC to 
convert photons into electrons that are ejected into vacuum (Fig. 13) and are 
subsequently detected and amplified by the MCP (Fig. 14). Because of their low 
dark current and high gain, PC-based detectors can be used for photon-counting 
applications. 

As shown in Fig. 13, a PC consists of a p-type semiconductor in which the bands 
bend downward at the surface due to the presence of positive charge resulting from 


[Fig. 13 diagram labels: surface; vacuum level.]

Fig. 13. Band diagram for a semiconductor photocathode (opaque mode). The horizontal axis 
represents depth. Incoming photons create electron-hole pairs if photon energies are larger than 
the semiconductor bandgap. Large positive charge near or at the surface causes the bands to bend 
strongly downward, moving the vacuum level lower and allowing electrons to escape into vacuum. 
Photon energy is represented by hν, and χ is electron affinity.

Fig. 14. Simplified MCP detector system. Light excites photoelectrons from a PC across vacuum
to a MCP, which multiplies the electrons. Readout may be from one of several types of
position-sensitive detectors.


surface states. Absorption of photons with energies greater than the bandgap creates 
electron-hole pairs. Electrons thus excited can propagate to the PC surface and exit 
into vacuum if the electron energy is greater than the vacuum level. The position of 
this vacuum level depends both on the semiconductor electron affinity and on the 
amount of near-surface band-bending. 

The ideal PC material should produce a high yield of photoelectrons when under 
illumination (i.e., short photon absorption length, long electron diffusion length, and 
small or negative electron affinity) and be both chemically and physically robust. 
PCs may use either opaque geometry, in which light absorption and electron emis- 
sion occur at the same surface, or semitransparent geometry, in which light enters 
one surface and electrons exit the opposite surface. In the latter case the PC must be 
thin enough to avoid large scattering and recombination losses during the electron 
transport process. 

PC electron affinity (EA) plays a significant role in producing high QE, and for 
a semiconductor is the difference in energy between the conduction-band minimum 
and that of an electron at rest in the vacuum; i.e., it is the energy necessary for an 
electron at the semiconductor conduction-band minimum to escape into free space. 
In order to make the PC QE as large as possible, three properties are desirable: 
(1) PCs should have low or negative electron affinity (NEA), (2) downward band 
bending at the surface should be as large as possible, and (3) the near-surface 
conduction-band potential well (Fig. 13), created by the downward band bending, 
should be as narrow as possible. “Effective NEA” occurs when the vacuum level 
at the surface is at lower energy than the bulk conduction band edge; it is thus a 
combined effect of actual EA and band bending. If low effective EA is not achieved, 
many photoexcited electrons can fail to escape the surface if they lose energy to 
inelastic scattering. Scattering losses are also greater if the surface well is wider, 
which allows more electrons to lose energy by scattering in the well. 
Fig. 15. Cesiated PC surface. The double layer of Cs and CsO2 creates a strong dipole that
produces effective negative electron affinity, with the vacuum level at lower energy than the bulk 
conduction band level in the semiconductor. 


What physical properties will optimize these three parameters to produce high 
QE? First, the native EA of a material depends on the microscopic charge con- 
figuration at the surface. Some materials naturally have lower EA values, among 
them the materials normally used for photocathodes. Deposition of extrinsic surface
layers, such as cesium, produces a large dipole at the surface and can lower the EA
substantially (Fig. 15). However, these layers are highly reactive in air, as discussed 
below, and thus require vacuum fabrication and assembly methods. 

The second parameter, large downward band bending, depends sensitively on 
near-surface charge. This charge can arise naturally from dangling bonds, surface 
oxide, or contamination. Intentional downward band bending can be introduced by 
techniques such as high n-type doping near the surface. For polar semiconductors 
such as GaN, surface charge can also result from polarization effects. In particular, 
the N-polar GaN(000-1) surface (the polarity in which nitrogen atoms are nearest
the surface) has positive polarization charge on its surface, which enhances down- 
ward band bending. Thus, using surface n-doping and polarization engineering, the 
conditions for high QE can be encouraged. 

The final parameter, width of the surface potential well, can be affected by 
doping; higher p-doping in the photocathode material will produce a narrower well. 
For the case of polar semiconductors, heterostructure designs that take advantage 
of polarization charge can also reduce the potential well width. 

Photocathode materials such as Cs2Te, CsI, and KBr have been used in instruments,
depending on the cutoff wavelength desired. These materials are all reactive
to varying degrees in air. Cs2Te has generally low QE, with a peak at about 50%
at wavelengths between 165nm and 190nm in the opaque configuration (but less 
at shorter wavelengths).°! It is solar blind with a cutoff wavelength around 300nm 
depending on particular fabrication parameters. In addition to its generally low 
QE, the major drawback of this material for use as a mid-UV photocathode is 
its reactivity. Even short-term exposure to atmosphere, <5s, results in complete 
degradation of the performance. CsI and KBr are hygroscopic and degrade under
exposure to water vapor in air. In the far UV, CsI PCs have been used because they 
have a shorter-wavelength cutoff (wider bandgap), and achieve a QE of ~40% at 
130nm.°? KBr, another wide-bandgap material, has also been used.?? 

Because of their reactivity in air, many UV PCs are sealed in vacuum tube 
enclosures along with their electron multiplication components (MCP, dynodes, or 
EBCCD). Even so, as mentioned above, quantum efficiency is only in the 20-50% 
range. A UV detector with high QE across this entire range of wavelengths, which 
could withstand air exposure, would produce tremendous advances in both QE and 
stability of UV instruments. 

GaN and its ternary AlGaN alloys have received increased attention for their 
applications as photocathodes. Because the EA of GaN is close in value to its 
bandgap, effective NEA is difficult to achieve without cesiation. High QE (>70%) 
in the 100-300-nm range is achieved by depositing Cs or a combination of Cs and 
CsO2 on the surface of p-type GaN.*4 This performance surpasses that of other
photocathode materials at mid-UV wavelengths. 

However, the use of Cs or Cs oxide still compromises the intrinsic surface sta- 
bility of GaN. Furthermore, due to additional electron scattering in the disordered 
Cs layer, cesiated designs may not produce the highest QE possible with GaN PCs. 
For these reasons, work has been done to develop non-cesiated designs for III-nitride
PCs.°° 

Figure 16 shows recent results for a non-cesiated N-polar GaN PC. High QE 
is observed, although the light source was ex vacuo which prevented obtaining 
data below 200nm. Near 200nm, the QE is competitive with cesiated GaN PCs. 
These promising results indicate that, by optimizing doping levels and incorporating 
AlGaN into the design, even higher QE values are achievable. 

Figure 16 also demonstrates the large intrinsic out-of-band rejection provided 
by wide-bandgap semiconductors. At least five orders of magnitude of rejection is 
possible using these materials, without the use of external filters. Such filters can 
also be employed if even greater rejection is required. 

It is also important to note that GaN, together with AlGaN, enables tunability
of response, including the long-wavelength cutoff. The EA of AlGaN is smaller than 
that of GaN, with higher Al fractions producing smaller EA values, and the bandgap 
of AlGaN increases with Al fraction. Thus, AlGaN with Al fractions around 30% 
can achieve NEA. This enables flexibility in PC design; the lower EA provides an 
easier path to NEA, while the bandgap tunability provides the capability to tune 
the photocathode response with wavelength. Solar blindness (cutoff at 280 nm) can 
also be obtained by using AlGaN with ~30% Al. 


5.2. Avalanche photodiodes 


Photocathode-based devices perform very well for high-QE, low-noise UV measure- 
ments, but in addition to their stability challenges they are bulky and require high 
Fig. 16. Visible/UV emission for a non-cesiated N-polar p-GaN PC. Data are plotted as external
quantum efficiency (electrons out per photon incident). Data below 200 nm were not obtained
because UV light below 200 nm is absorbed in air.


voltage. Solid-state detectors using GaN and AlGaN are also being developed as
a compact, lower-voltage alternative. One promising technology is the avalanche 
photodiode, which can be implemented using III-nitrides.°°°* APDs have high 
internal electron gains produced in a high-field region within the device. Photons 
are absorbed in an absorption region, producing electron-hole pairs. These are sepa- 
rated by the electric field produced by an applied voltage. If the field is high enough, 
the electrons and holes can undergo avalanche multiplication, with resulting gains 
in excess of 10°. Because of this large gain, APDs can be used for photon-counting 
applications. 

APDs fabricated from III-N materials are insensitive to longer wavelength light
just as III-N PCs are, due to their wide bandgaps. If the absorption region is com- 
posed of AlGaN with an Al fraction >30%, the detectors can be made solar-blind. 
Although absorption and multiplication can occur within the same intrinsic region 
of the device, most GaN APDs consist of p-i-n-i-n structures that are designed to 
have separate regions for light absorption and avalanche multiplication, allowing 
separate optimization of each region. 

Although III-nitrides are promising for APD photon-counting UV detectors,
due to their wide bandgaps, chemical robustness, and radiation tolerance, challenges 
remain that limit their performance. These include the low activation efficiency of 


30 S. Nikzad et al. 


p-dopants, high dislocation densities, and sidewall leakage. These lead to increased 
dark current, microplasmas, and premature breakdown.°? Current work to mitigate 
these challenges includes improved material growth methods, sidewall passivation 
techniques, and device designs that engineer the electric field at device boundaries. 


6. Outlook of Solid-State UV Photon Counting Detectors 
Technologies 


As technological capabilities continue to advance, new devices emerge to address the 
expanding needs of UV science. Next-generation UV missions require more sensitive 
detectors with higher temporal and spatial resolution, lower noise, and greater long- 
term stability. New technologies will mature and will infuse into future missions to 
enable new science. 

Several new technologies have either already proven their capabilities for UV 
detection, or have shown promise for UV applications in the near future. The Quanta 
Image Sensor (QIS), briefly mentioned earlier, takes advantage of advancing device 
lithography capabilities such as smaller feature size and 3D stacking. These sensors 
divide image pixels into separate sub-pixel detectors (“jots”), each of which can 
photon-count.!® Each jot has small enough sense capacitance that a single electron 
produces a large voltage to overcome other sources of noise. The individual jot 
values are then digitally integrated to produce sub-electron read noise per pixel. 
The ability of a solid-state imager to photon-count without gain, such as from 
avalanche multiplication, is significant. Avalanche processes have their own noise 
sources from statistical fluctuations of the avalanche process and from dark counts 
due to the high applied voltages. 
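The link between small sense capacitance and a large single-electron signal follows from V = q/C. The Python sketch below evaluates this conversion gain and the resulting input-referred noise in electrons; the capacitance and voltage-noise values are assumed for illustration only and are not taken from any particular QIS device, and the function names are hypothetical.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs


def conversion_gain_uv_per_electron(sense_capacitance_ff):
    """Voltage step per electron (microvolts) on a sense node of given capacitance (fF)."""
    capacitance_f = sense_capacitance_ff * 1e-15
    return ELEMENTARY_CHARGE_C / capacitance_f * 1e6


def input_referred_noise_electrons(voltage_noise_uv, sense_capacitance_ff):
    """Readout voltage noise expressed in electrons at the sense node."""
    return voltage_noise_uv / conversion_gain_uv_per_electron(sense_capacitance_ff)


# Assumed example: the same 100 uV rms readout noise on different sense nodes.
for c_ff in (5.0, 1.0, 0.4):
    gain = conversion_gain_uv_per_electron(c_ff)
    noise = input_referred_noise_electrons(100.0, c_ff)
    print(f"C = {c_ff} fF: {gain:.0f} uV/e-, noise = {noise:.2f} e- rms")
```

Under these assumptions, shrinking the sense capacitance is what moves a fixed voltage noise into the sub-electron regime without any avalanche gain.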

The new field of metamaterials and metasurfaces also holds great promise for 
enhancing UV detection capabilities. Metamaterials are artificial structures with 
deeply sub-wavelength periodic features. The sub-wavelength nature of the features 
enables fabrication of structures with effective optical properties not available in 
natural materials.©° For instance, the effective permittivity and permeability can 
be controlled by varying the size, shape, and distribution of features in the material, 
without changing the material composition itself. Generally, metamaterials rely on 
the resonance properties of the small features under illumination. These resonances 
can be, for instance, plasmonic in metallic materials®! or Mie resonances in dielectric 
materials.°? Combining metals and dielectrics provides a rich palette for designing 
a desired optical response. 

Although fabrication of a 3D structure with periodic features (such as inclu- 
sions or voids) distributed throughout the volume can be challenging, many of the 
advantages for optical manipulation can be achieved by the 2D analog of meta- 
materials known as “metasurfaces,” in which periodic sub-wavelength features are 
fabricated only on the surface of the material.®:®4 A wide range of applications have 
been demonstrated for metasurfaces, such as “perfect” flat lenses that surpass the 
diffraction limit, ultrathin filter layers,°° directional wavelength-selective reflec-
tors,” and polarization-sensitive structures.®* 

Most of the work in metamaterials and metasurfaces has been done at longer
wavelengths, where the fabrication requirements are less severe. At UV wavelengths,
fabrication of deep sub-wavelength structures is more challenging. The effective
response of these structures depends on the uniformity of features in both size
and shape, as a spread in these parameters tends to “wash out” the desired optical
response. UV applications also face the challenge of fewer suitable constituent
materials.©? Metals must have higher plasma frequencies to operate effectively, and
parasitic energy-loss mechanisms come into play for larger exciting photon energies.
However, as nanofabrication methods continue to evolve, the ability to apply
metamaterial concepts to UV detection is expected to increase substantially.


Microwave kinetic inductance detectors (MKIDs) are superconducting detectors
that can measure the energy and arrival times of individual photons with essen- 
tially no false counts. They excel at low flux, narrow field of view, low resolution 
integral field spectroscopy where every photon is vital, a common occurrence 
in modern astronomy. In this chapter, we will cover the operating principles of 
MKIDs and explore their applications. 


1. Operating Principles 


Low temperature detectors (LTDs) operating at temperatures on the order of 
100 mK are currently the preferred technology for astronomical observations over 
large parts of the electromagnetic spectrum, especially in the far-infrared through 
millimeter (0.1-3 mm),!° X-ray,4 and gamma-ray” wavelength bands. In the important
UV, optical, and near-IR (UVOIR, 0.1-5 µm) wavelength range, detector technologies
based on semiconductors, backed by large investment from both industrial
and military customers, have resulted in excellent detectors for astronomy. These
detectors have large formats, high quantum efficiency, and low readout noise.® They 
are, however, limited by the large band gap of the semiconductor which restricts the 
maximum detectable wavelength (1.1eV for silicon), and by thermal noise sources 
from their relatively high (~100 K) operating temperatures. LTDs allow the use of 
superconductors with gap parameters over 1000 times lower than semiconductors. 
This difference allows a jump in capabilities. A superconducting detector can count 
single photons with no false counts while determining the energy (to several percent 
or better) and arrival time (to a microsecond) of the photon. LTDs also tend to 
have much broader wavelength coverage since the photon energy is always much
greater than the gap energy. While a CCD is limited to about 0.35-1 µm, LTDs for
the UVOIR are in principle sensitive from less than 0.1 µm in the UV to greater
than 5 µm. The 0.1 µm short wavelength cutoff is mainly dictated by the use of
a transmissive microlens array, and design changes can enable sensitivity through
the UV and X-ray.’ The long wavelength limit is determined by a combination
of sensitivity to low energy single photons and the high count rates from thermal
blackbody radiation. While LTDs are sensitive to longer wavelengths (up to the gap
energy of the superconducting absorber, ~60 GHz for T_c = 800 mK), they usually
operate as bolometric detectors in this wavelength range.*?
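The quoted gap frequency and the comparison with semiconductor band gaps can be checked with a few lines of arithmetic. The Python sketch below assumes the gap relation Δ ≈ 1.72 k_B T_c given later in this chapter and takes the pair-breaking threshold as 2Δ (the energy needed to break one Cooper pair); the function names are hypothetical.

```python
K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant, eV/K
H_EV_S = 4.135667696e-15        # Planck constant, eV*s
SILICON_GAP_EV = 1.1            # value quoted in the text


def gap_energy_ev(t_c_kelvin, coefficient=1.72):
    """Superconducting gap parameter, assuming gap = coefficient * k_B * T_c."""
    return coefficient * K_B_EV_PER_K * t_c_kelvin


def pair_breaking_frequency_hz(t_c_kelvin):
    """Photon frequency whose energy equals 2 * gap (one broken Cooper pair)."""
    return 2.0 * gap_energy_ev(t_c_kelvin) / H_EV_S


t_c = 0.8  # kelvin
print(f"gap = {gap_energy_ev(t_c) * 1e3:.3f} meV")
print(f"pair-breaking frequency = {pair_breaking_frequency_hz(t_c) / 1e9:.0f} GHz")
print(f"silicon gap / superconducting gap = {SILICON_GAP_EV / gap_energy_ev(t_c):.0f}")
```

With these assumptions the threshold comes out in the high-50s of GHz, in line with the ~60 GHz quoted above, and the silicon gap is several thousand times larger than the superconducting gap.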

Superconducting UVOIR detectors have been pursued in the past primarily
with two technologies, superconducting tunnel junctions (STJs)10,11 and transition
edge sensors (TESs).12,13 While both of these technologies produced functional
detectors, they are limited to single pixels or small arrays due to the lack of a
credible strategy for wiring and multiplexing large numbers of detectors, although
there is significant ongoing work on larger TES multiplexers,14,15 especially for long
wavelength applications.

Microwave kinetic inductance detectors (MKIDs)16-18 are a newer cryogenic
detector technology that can be easily multiplexed into large arrays. The
“microwave” in MKIDs comes from their use of frequency domain multiplexing!®
at microwave frequencies (0.1-20 GHz), which allows thousands of pixels to be read
out over a single microwave cable. The lumped element? optical and near-IR (OIR)
MKID arrays that we have developed have significant advantages over semiconductor
detectors. They can count individual photons with no false counts and determine
the energy and arrival time of every photon with good quantum efficiency. Their
physical pixel size and maximum count rate are well matched with large telescopes.
These capabilities enable powerful new astrophysical instruments usable from the
ground and space.


1.1. The surface impedance of superconductors 


MKIDs work on the principle that incident photons change the surface impedance 
of a superconductor through the kinetic inductance effect.24 The kinetic inductance 
effect occurs because energy can be stored in the supercurrent (the flow of Cooper 
Pairs) of a superconductor. Reversing the direction of the supercurrent requires 
extracting the kinetic energy stored in it, which yields an extra inductance term 
in addition to the familiar geometric inductance. More information on the kinetic 
inductance effect can be found in Refs. 18 and 22. The magnitude of the change 
in surface impedance depends on the number of Cooper Pairs broken by incident 
photons, and hence is proportional to the amount of energy deposited in the super- 
conductor. This change can be accurately measured by placing a superconducting 
inductor in a lithographed resonator, as shown in Fig. 1. 
Fig. 1. Left: The basic operation of an MKID, from Ref. 16. (a) Photons with energy hν are
absorbed in a superconducting film, producing a number of excitations, called quasiparticles. (b) To 
sensitively measure these quasiparticles, the film is placed in a high frequency planar resonant 
circuit. The amplitude and phase of a microwave excitation signal sent through the resonator 
are shown in panels (c) and (d), respectively. The change in the surface impedance of the film 
following a photon absorption event pushes the resonance to lower frequency and changes its 
amplitude. If the detector (resonator) is excited with a constant on-resonance microwave signal, 
the energy of the absorbed photon can be determined by measuring the degree of phase and 
amplitude shift. Right: An example of frequency domain multiplexing (FDM) of MKIDs. Each 
MKID is a superconducting resonator tuned to a different resonant frequency by changing the 
resonator design. 


1.2. Theoretical limits on energy resolution 


The primary theoretical limitation on the spectral resolution is from the intrinsic
quasiparticle creation statistics during the photon absorption event. The energy
from the photon can end up in two places, the quasiparticle system and the phonon
system. These systems interact, allowing energy exchange between the two, which
reduces the statistical fluctuation from Poisson by the Fano factor F, typically
assumed to be 0.2.7? The spectral resolution of the detector, R = λ/FWHM(λ) =
E/FWHM(E), can be written as

$$R = \frac{1}{2.355}\sqrt{\frac{\eta h\nu}{F\Delta}},$$

where η is the efficiency of conversion of energy into quasiparticles (typically
assumed to be 0.57),?74 hν is the photon energy, F is the Fano factor, and Δ is the
energy gap. The energy gap depends on the superconducting transition temperature
(T_c) of the inductor, Δ ≈ 1.72 k_B T_c, and we typically operate at a base
temperature of T_c/8 to reduce the number of thermally generated quasiparticles.
Going to lower T_c, and hence lower operating temperature, improves the
theoretical R. Operating at 100 mK yields a theoretical spectral resolution of
R ≈ 100 at 400 nm. Previous research with Superconducting Tunnel Junctions
(STJs) with superconducting absorbers has shown that superconducting absorbers
can approach the Fano limit.2>-?7
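The Fano-limited resolution can be evaluated numerically from the quantities defined above. The Python sketch below assumes the limit takes the usual form R = (1/2.355) sqrt(η hν / (F Δ)), with the factor 2.355 converting a Gaussian standard deviation to FWHM, and uses the values of η, F, and the gap coefficient quoted in the text; the function name is hypothetical.

```python
import math

K_B_EV_PER_K = 8.617333262e-5   # Boltzmann constant, eV/K
HC_EV_NM = 1239.841984          # h*c in eV*nm


def fano_limited_resolution(wavelength_nm, t_c_kelvin, eta=0.57, fano=0.2):
    """Fano-limited R = E/FWHM(E), assuming gap = 1.72 * k_B * T_c."""
    photon_ev = HC_EV_NM / wavelength_nm
    gap_ev = 1.72 * K_B_EV_PER_K * t_c_kelvin
    return math.sqrt(eta * photon_ev / (fano * gap_ev)) / 2.355


# T_c = 800 mK corresponds to a base temperature of T_c/8 = 100 mK.
for t_c in (0.8, 0.4, 0.2):
    r = fano_limited_resolution(400.0, t_c)
    print(f"T_c = {t_c * 1e3:.0f} mK: R at 400 nm ~ {r:.0f}")
```

Under these assumptions a T_c of 800 mK gives a value a little above 100 at 400 nm, the same order as the figure quoted above, and R grows as the inverse square root of T_c.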


1.3. Multiplexing


A MKID, with the equivalent circuit shown in Fig. 1, is essentially a microwave 
resonator with a resonant frequency f0 in the GHz range. It has nearly perfect
transmission away from the resonant frequency f0, but acts as a short to ground on
resonance, reflecting most of the power back at the source. To read out a MKID a
microwave probe signal, typically a simple sine wave, is tuned to the resonant
frequency f0 of the resonator.
Photons imprint their signature as changes in phase and amplitude of this probe 
signal. Since the quality factor Q of the resonators is high and their microwave 
transmission off resonance is nearly perfect, multiplexing can be accomplished by 
tuning each pixel to a different resonant frequency with lithography during device 
fabrication. A comb of probe signals can be sent into the device, and room tempera- 
ture electronics can recover the changes in amplitude and phase without significant 
cross-talk.?8 
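The reason high-Q resonators multiplex cleanly can be seen from a simple transmission model. The Python sketch below uses a common idealized expression for a resonator shunting a feedline, S21 = 1 - (Q/Q_c)/(1 + 2jQx) with x = (f - f0)/f0, and approximates several resonators on one line by multiplying their individual responses (reflections between resonators are ignored). The quality factors, frequencies, and function names are illustrative assumptions, not parameters of the arrays described here.

```python
def notch_s21(freq_hz, f0_hz, q_total, q_coupling):
    """Idealized forward transmission past one shunt-coupled resonator."""
    x = (freq_hz - f0_hz) / f0_hz           # fractional detuning
    return 1.0 - (q_total / q_coupling) / (1.0 + 2j * q_total * x)


def comb_s21(freq_hz, f0_list_hz, q_total, q_coupling):
    """Approximate transmission past several resonators sharing one feedline."""
    s21 = 1.0 + 0.0j
    for f0 in f0_list_hz:
        s21 *= notch_s21(freq_hz, f0, q_total, q_coupling)
    return s21


# Assumed example: two resonators spaced by 2 MHz near 5 GHz.
resonances = [5.000e9, 5.002e9]
q, q_c = 2.0e4, 2.5e4

on_resonance = abs(comb_s21(resonances[0], resonances, q, q_c))
neighbor_only = abs(notch_s21(resonances[0], resonances[1], q, q_c))
print(f"|S21| at first resonance: {on_resonance:.3f}")
print(f"|S21| of the neighbor alone at that frequency: {neighbor_only:.4f}")
```

In this model the neighboring resonator is nearly transparent at the probe frequency, so each probe tone responds almost entirely to its own pixel.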


1.4. Noise 


While the fundamental spectral resolution of a MKID is set by the Fano Limit 
(Section 1.2), in practice other sources of noise tend to limit performance. The 
two sources of noise that dominate the phase readout of optical photons are noise 
from the microwave HEMT amplifier, and noise from two-level systems (TLSs)?9 3° 
near the superconductor-substrate interface. In modern UVOIR MKIDs"" these two 
noise sources are of similar magnitude, with HEMT noise usually slightly dominating 
in the 3-60kHz band which contains most of the photon signal. 

Both sources of noise described below depend on the microwave power being 
used to read out the resonator. The physics behind the maximum readout power, 
and the possibility of reading out MKIDs in the nonlinear regime, has been 
explored.?!_ Most UVOIR MKID work has focused on the linear regime, where 
the MKIDs are biased at the maximum microwave power before bifurcation of the 
resonance in the IQ plane. 


1.4.1. HEMT noise 


The noise temperature of modern HEMT amplifiers in the 4-8 GHz band is ~2 K. 
This noise temperature, along with the microwave readout power of the resonator 
and the common assumption that Q_i >> Q_c, predicts the added white phase noise
from the HEMT amplifier. This noise floor sets the maximum readout noise expected
from the readout chain, which is usually set by the performance of the
analog-to-digital converters (ADCs). Figure 2 shows a plot of the required ADC performance as
a function of microwave readout power and number of readout tones. HEMT noise 
typically exhibits a 1/f component below 1 Hz, but this low frequency drift is not 
an important factor for photon counting MKIDs. 
Fig. 2. The required ADC phase noise floor in a one second integration (also known as noise
spectral density) for a given readout power per resonator and number of channels. The latest
UVOIR MKID electronics, described in Section 2.1, use a 2-GSPS Analog Devices AD9625 with
a noise spectral density of roughly -145 dBc/Hz. This means that at a typical resonator input
power of -90 dBm this ADC can read out roughly 5000 resonators, assuming random excitation
tone phases. For more information, see Ref. 28. 


1.4.2. Two-level system noise 


TLS noise occurs because there are states, primarily located at the
superconductor-substrate interface, that can tunnel between two stable states by absorbing a
microwave photon from the resonator. TLS noise typically presents as a “pink
noise”, with a spectral index of -0.5 with a single pole rolloff at the resonator
bandwidth f0/2Q_m, where f0 is the resonant frequency and Q_m is the measured Q
(sometimes written as Q_r). Larger capacitors are shown to reduce TLS noise significantly
by moving more of the electromagnetic field away from the surface region, but
many devices are space constrained by the pixel pitch required for optical telescopes. 
Improvements in the quality of the superconductor—substrate interface could lead 
to dramatic improvements in TLS noise. Like HEMT noise, TLS noise goes down 
with the square root of increasing microwave readout power, so resonators with 
higher readout powers are highly desirable. TLS noise has been shown to only affect
the phase of a resonator,?° so amplitude based readouts bypass this noise source.
However, in most MKIDs the amplitude signal is at least a factor of 4 smaller than 
the phase signal, so a phase readout is usually used. This situation may change if 
quantum limited amplifiers become available.” 
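Two of the scalings in this section are easy to put into numbers: the resonator bandwidth f0/(2Q_m) at which the TLS spectrum rolls off, and the inverse square-root dependence of the noise on readout power. The Python sketch below is illustrative only; the resonant frequency and Q are assumed values, the function names are hypothetical, and the power scaling is applied to whichever noise measure the square-root statement in the text refers to.

```python
def resonator_bandwidth_hz(f0_hz, q_measured):
    """Single-pole rolloff frequency of the resonator response, f0 / (2 * Q_m)."""
    return f0_hz / (2.0 * q_measured)


def relative_noise(power_ratio):
    """Noise relative to its starting value after scaling readout power by power_ratio,
    assuming the noise falls as the inverse square root of power."""
    return power_ratio ** -0.5


# Assumed example: a 5 GHz resonator with a measured Q of 20,000.
print(f"rolloff at {resonator_bandwidth_hz(5.0e9, 2.0e4) / 1e3:.0f} kHz")

# Quadrupling the readout power halves the noise under this scaling.
print(f"noise after 4x power: {relative_noise(4.0):.2f} of original")
```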


1.5. Array design


MKIDs are very versatile, as essentially any resonator with a superconducting
inductor will function as a MKID. In 2008, we decided to pursue a lumped element
resonator design.?° The resonator itself consists of a 60-nm thick*? sub-stoichiometric
titanium nitride (TiN$_x$) film*+ with the nitrogen content tuned with $x < 1$ such
that the superconducting transition temperature $T_c$ is about 800 mK. Due to the
long penetration depth of these films (~1000 nm) the surface inductance is a high
25 pH/square, allowing a very compact resonator fitting in a 222 × 222 µm square,
as shown in Fig. 3. A square microlens array with a 92% optical fill factor is used to
focus light onto the photosensitive inductor. The pixel pitch is easily controlled, with
pitches between 75 and 500 µm relatively easy to achieve. The quasiparticle lifetime in
our TiN films is 50-100 µs. This sets the pulse decay time, allowing a maximum
count rate of ~2500 cts/pixel/s before problems arise in separating pulses.
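
To see why count rates of this order start to cause trouble, one can estimate how often
a second photon lands on top of an earlier pulse. The sketch below assumes Poisson
photon arrivals and, as a simplifying assumption that is not taken from the text, treats
two pulses as overlapping when they arrive within one decay time of each other.

```python
import math

def pileup_fraction(count_rate_hz, window_s):
    """Probability that another photon arrives within window_s of a given photon,
    assuming Poisson arrivals at count_rate_hz."""
    return 1.0 - math.exp(-count_rate_hz * window_s)

# Overlap fraction at 2500 cts/pixel/s for the two ends of the 50-100 us lifetime range.
for decay_time_s in (50e-6, 100e-6):
    fraction = pileup_fraction(2500.0, decay_time_s)
    print(f"decay time {decay_time_s * 1e6:.0f} us: {fraction:.1%} of photons overlap")
```

Under these assumptions roughly 12% to 22% of photons are followed by another photon
within one decay time, so pulse separation becomes a significant part of the analysis.
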
TiN has excellent microwave quality factors with $Q_i$ > 10°, but the requirement
to push the $T_c$ down to 800 mK to get good sensitivity put us on a very steep slope of
$T_c$ versus nitrogen content. This resulted in a film with dramatic variations in $T_c$,
which caused large changes in the surface inductance across the wafer. This pushed
resonator frequencies around uncontrollably. When two resonators have resonant
frequencies that are too close together, they collide in frequency, and only one


Fig. 3. Left: A microscope image of a portion of the 2024-pixel MKID array used at the Palomar
200-inch telescope. A microlens focuses the light onto the 40 × 40 µm inductor, resulting in an
optical fill factor >90%. Various features of interest are labeled. Right: A zoomed-in view of the
array in the left panel.
of the resonators can be read out. This leads to dead pixels. TiN also required a
Si substrate (many attempts to deposit on sapphire all had very poor $Q_i$), and
at optical wavelengths stray photons would be absorbed in the silicon, creating free
electrons that could interact with pixels to create temporary phase fluctuations that
resulted in undesirable hot pixel behavior.?°

Potentially even worse, STM measurements of TiN®® show that the gap is non-uniform
down to 20 nm scales even in stoichiometric TiN films, with ~15% variations
on 50 nm scales. This effect is likely significantly enhanced in our sub-stoichiometric
films. This may not be a problem in a sub-mm device where quasiparticles are generated
uniformly across a large inductor area, but in a UVOIR detector the photon interacts
at a single random location inside the inductor, and the short diffusion length (believed
to be just several microns or less) means that diffusion cannot smooth out this
absorption-location-dependent response. This means that if a photon is absorbed
in an area with a lower than average gap the signal would be larger than expected.
Indeed, we believe we have seen this geometric effect in TiN. Our best TiN detectors
showed a spectral resolution of R ~ 10 at 405 nm, but the pulse shape and measured
noise spectrum indicated we should be getting R ~ 20. We believe this discrepancy
results from this effect.

These issues caused us to look for another superconducting material that would
be stoichiometric, very uniform, and would work on an insulating substrate like
sapphire. During the last several years we have developed a new material, platinum
silicide (PtSi), that has led to the best UVOIR MKID arrays ever produced with
20,000 pixels, >90% of the pixels functional, R = E/ΔE ~ 8 at 1 µm, and a quantum
efficiency of ~35%. In the last several months, we have had success developing
anti-reflection coatings that appear to boost the quantum efficiency (QE) of our PtSi
films above 70%, and future optimization should increase this number further. These
state-of-the-art MKID arrays are discussed in detail in a recent paper in Optics
Express.>"


1.5.1. Feedlines, heat sinking, and frequency coding 


Three major issues must be solved before scaling single pixel MKIDs up to large
arrays. First, a ground plane with excellent electrical continuity is required to ensure
a consistent coupling quality factor $Q_c$ and reduce resonator crosstalk. This requires
frequent feedline crossovers. While some groups have pursued air bridges®® for larger
sub-mm MKIDs, the smaller UVOIR MKIDs typically connect the Nb ground planes
of the coplanar waveguide (CPW) feedline with a 2-µm Nb bar that is insulated
from the Nb center conductor by a thin SiO2 layer. These feedline crossovers are
used at least every several hundred microns.

Heat sinking of a MKID array is an important issue due to the poor thermal
conductivity of most materials at 100 mK. The first large MKID arrays used sparse
aluminum wirebonds and polymer adhesives. This worked fine without external
illumination, but even low levels of light hitting the detector caused a significant and
persistent temperature rise with time constants of at least 30 s. Replacing the sparse
aluminum wire bonds with a dense array of gold bond wires attached to gold wire
bond pads on the chip eliminated the heating problems.

Another major challenge of designing large MKID arrays is the propensity of
resonators in close proximity, both physically and in frequency space, to interact.°°
This interaction is easily predictable with an EM simulation package like SONNET.
The interaction strength can be controlled through careful resonator design. This
is shown in Fig. 3, where the double wire meander of the inductor was designed
to drastically reduce the interaction strength over a more conventional single wire
meander. Even with this design, nearest neighbor resonators are spaced at least
50 MHz apart. It is best not to put resonators that are close in frequency physically
too far apart, however, as large scale gradients in the superconducting film can cause
resonator collisions. As a compromise, we use an algorithm that starts with a smooth
pattern of resonance frequencies and then exchanges resonator positions randomly
until the nearest neighbors are separated by an acceptable frequency distance. One
consequence of this frequency coding is that resonant frequency and pixel location
are not directly related. To overcome this, we create a beam map by sweeping bars
of light across the array both vertically and horizontally. This allows us to locate
the MKIDs on a Cartesian grid.
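
The frequency coding step described above can be sketched in a few lines of Python.
This is a minimal illustration under stated assumptions, not the production algorithm:
the function names, the raster-ordered starting pattern, the 4-connected definition of
"nearest neighbor", and the rule of undoing swaps that increase the number of offending
pixels are all choices made for this example. It also swaps with partners anywhere on
the array, so it ignores the preference for keeping resonators that are close in
frequency physically close together.

```python
import numpy as np

def conflict_mask(freqs_mhz, min_sep_mhz):
    """Flag pixels that have a 4-connected neighbour closer than min_sep_mhz."""
    bad = np.zeros(freqs_mhz.shape, dtype=bool)
    horiz = np.abs(np.diff(freqs_mhz, axis=1)) < min_sep_mhz   # left/right pairs
    vert = np.abs(np.diff(freqs_mhz, axis=0)) < min_sep_mhz    # up/down pairs
    bad[:, :-1] |= horiz
    bad[:, 1:] |= horiz
    bad[:-1, :] |= vert
    bad[1:, :] |= vert
    return bad

def assign_frequencies(n_rows, n_cols, f_start_mhz, spacing_mhz, min_sep_mhz,
                       seed=0, max_swaps=200_000):
    """Start from a smooth frequency pattern, then swap pixels at random until
    no nearest neighbours are closer than min_sep_mhz in frequency."""
    rng = np.random.default_rng(seed)
    # Smooth starting pattern: frequency increases in raster order.
    freqs = f_start_mhz + spacing_mhz * np.arange(n_rows * n_cols, dtype=float)
    freqs = freqs.reshape(n_rows, n_cols)
    for _ in range(max_swaps):
        bad = conflict_mask(freqs, min_sep_mhz)
        n_bad = int(bad.sum())
        if n_bad == 0:
            return freqs
        # Pick one offending pixel and a random partner, then exchange them.
        r1, c1 = np.argwhere(bad)[rng.integers(n_bad)]
        r2, c2 = rng.integers(n_rows), rng.integers(n_cols)
        freqs[r1, c1], freqs[r2, c2] = freqs[r2, c2], freqs[r1, c1]
        # Undo the exchange if it increased the number of offending pixels.
        if conflict_mask(freqs, min_sep_mhz).sum() > n_bad:
            freqs[r1, c1], freqs[r2, c2] = freqs[r2, c2], freqs[r1, c1]
    raise RuntimeError("no acceptable layout found within max_swaps")

# Example: a 20 x 20 sub-array, 2 MHz design spacing, 50 MHz neighbour separation.
layout = assign_frequencies(20, 20, 4000.0, 2.0, 50.0, seed=1)
```

Because the result is a permutation of the original frequency list, the mapping from
resonant frequency to pixel position has to be recovered afterwards, which is the role
of the beam map.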


1.5.2. Cryogenic wiring 


MKIDs require wide bandwidth microwave signals to be sent from room temperature 
to the MKIDs at 100 mK, then amplified and returned to room temperature. This is 
a challenge, especially for adiabatic demagnetization refrigerators (ADRs) with low 
cooling power. Most low temperature systems use NbTi coaxial cables with SMA 
connectors to get the signals to and from 4K, and stainless steel coaxes to move 
between 4K and 300K. As MKID array sizes grow, the size and heat load of these 
cables becomes a serious design problem. 

One solution is switching from coaxial cables to flexible planar cables using
microstrip, CPW, or stripline transmission lines. These interconnects, pioneered at
UCSB,*° use NbTi on Kapton microstrip cables with Corning Gilbert G3PO blind
mate connectors to move signals from 100 mK to 4 K, Low Noise Factory 4-8 GHz
HEMT amplifiers with G3PO inputs and outputs, and copper on Ultralam stripline
flex cables to move the signals from 4 K to 300 K. Figure 4 shows an image of the
NbTi flex cables.


Fig. 4. A 5-conductor NbTi on Kapton microstrip flex cable with G3PO connectors, made at 
UCSB. 


2. Digital Readouts for MKIDs


The primary benefit of MKIDs over other superconducting detector technologies
is the built-in frequency domain multiplexing and the fact that all the complex
electronics are located at room temperature. This allows MKIDs to benefit from
the rapid advances in ADCs and field programmable gate arrays (FPGAs) driven
by the communications industry, allowing pixel count to scale with Moore’s Law. To
read out a MKID, one must first generate a comb of frequencies with each sine wave
tuned to the ideal frequency and power to properly bias the corresponding resonator.
This comb is sent through the MKID, then digitized and sent to an FPGA. Inside
the FPGA the individual frequencies are isolated from each other, and changes in
the amplitude and phase imprinted by photons hitting the MKID are optimally
filtered, searched for triggers, packaged into 64-bit packets, and sent over Ethernet
to a data acquisition computer?® for further processing.?°
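
The comb generation and channelization steps can be illustrated with a small NumPy
sketch. It is a toy model under several assumptions that are not from the text: the
signal is real-valued and sampled directly at 2 GSPS, tones sit on exact FFT bins so
that they are mutually orthogonal over the window, every tone has unit amplitude, and
the low-pass filter is a plain average. The function names are hypothetical. It also
shows why random excitation phases are used: they keep the peak of the summed comb well
below the peak obtained when all tones are aligned.

```python
import numpy as np

def make_comb(freqs_hz, phases_rad, fs_hz, n_samples):
    """Sum of unit-amplitude tones, one per resonator."""
    t = np.arange(n_samples) / fs_hz
    tones = np.cos(2 * np.pi * np.outer(freqs_hz, t) + phases_rad[:, None])
    return t, tones.sum(axis=0)

def channelize(signal, t, f_hz):
    """Digital down-conversion of one tone: mix to DC, then average (a crude low-pass)."""
    return 2.0 * np.mean(signal * np.exp(-2j * np.pi * f_hz * t))

rng = np.random.default_rng(0)
fs_hz, n_samples, n_tones = 2.0e9, 4096, 64
bins = rng.choice(np.arange(1, n_samples // 2), size=n_tones, replace=False)
freqs_hz = bins * fs_hz / n_samples             # tones on exact FFT bins
phases = rng.uniform(-np.pi, np.pi, n_tones)    # random excitation phases

t, comb = make_comb(freqs_hz, phases, fs_hz, n_samples)
iq = np.array([channelize(comb, t, f) for f in freqs_hz])
recovered_phase = np.angle(iq)                  # per-resonator phase estimate

# Crest factor comparison: aligned phases pile up at t = 0, random phases do not.
_, aligned = make_comb(freqs_hz, np.zeros(n_tones), fs_hz, n_samples)
peak_random, peak_aligned = np.abs(comb).max(), np.abs(aligned).max()
```

In a real readout the per-channel phase stream would then be optimally filtered and
searched for photon pulses; here the recovered phases simply reproduce the injected ones.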


2.1. Gen2 readout electronics 


The second generation of MKID readout electronics, developed in collaboration
between UCSB and Fermilab, was commissioned at the Palomar 200-inch telescope
in July 2016. These electronics featured three boards working together:
a ROACH2 for signal processing, a custom ADC/DAC board featuring 2 GSPS
ADCs and DACs, and an IF board to translate signals from baseband into the
4-8.5 GHz range. Figure 5 shows the readout block diagram and a picture of the
assembled boards. Two sets of boards are loaded onto a cartridge, and power
splitters are used to drive a single feedline so that each cartridge can read out up to
2048 resonators in the 4-8.5 GHz band. Five of these cartridges are installed into a
single glycol-cooled readout crate, enabling a complete readout for 10 kpix in 10U
(17.5 inch) height in a standard 19-inch rack.

Fig. 5. (a) A block diagram of the second generation MKID readout electronics. (b) The
assembled readout electronics with relevant parts labelled. Both figures reprinted with
permission from Ref. 41.


3. Current Array Performance


3.1. Spectral resolution 


The typical spectral resolution for the TiN MKIDs used in ARCONS*? is R ~ 8
at 400 nm, declining linearly with increasing wavelength as expected. The median
spectral resolution is further degraded from R ~ 8 with an analog readout to R ~ 6
with the full ARCONS digital readouts due to limitations in the number of
coefficients (taps) in the programmable optimal filter in our digital electronics. The
second generation readout electronics reduce this artificial degradation through
the use of improved optimal filters.

Recent work at UCSB*® has probed the noise sources in PtSi arrays?” using
a parametric amplifier.44 This work shows that the noise of the HEMT amplifier
is limiting the spectral resolution of these MKIDs to R ≈ 7.5 at 980 nm. With
the parametric amplifier, this improved to R = 9.6 at 980 nm. However, when we
calculate the expected spectral resolution with the parametric amplifier we expect
to get R ≈ 25 at 980 nm. We believe this excess spectral broadening is due to
either geometric effects based on photon absorption location in the PtSi inductor,
or hot phonon escape from the PtSi into the substrate. Both of these mechanisms are
discussed in detail in the appendix of Ref. 43. We expect future iterations of the PtSi
MKID arrays to eliminate these noise sources through changes to the fabrication
process, enabling these higher resolutions in MKID instruments with parametric
amplifiers. These numbers are still far below the theoretical limits in Section 1.2,
which leaves plenty of room for improvement.
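
A simple way to gauge the size of the unexplained broadening is to assume that
independent contributions to the energy width add in quadrature, so that 1/R² terms
add. This quadrature assumption and the function name below are ours, introduced only
for illustration; the three input values are the ones quoted above for 980 nm.

```python
import math

def quadrature_remainder(r_total, r_known):
    """Resolving power of the remaining broadening term, assuming that independent
    contributions to the line width add in quadrature (1/R^2 terms add)."""
    return 1.0 / math.sqrt(1.0 / r_total**2 - 1.0 / r_known**2)

# Excess broadening with the parametric amplifier: measured 9.6 versus expected 25.
r_excess = quadrature_remainder(9.6, 25.0)

# Extra broadening attributable to the HEMT: 7.5 with HEMT versus 9.6 with paramp.
r_hemt_extra = quadrature_remainder(7.5, 9.6)

print(f"excess term alone: R ~ {r_excess:.1f}")
print(f"additional HEMT term alone: R ~ {r_hemt_extra:.1f}")
```

Under this assumption the excess term on its own corresponds to R of about 10, so it,
rather than amplifier noise, dominates once the parametric amplifier is in use.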


3.2. Yield 


The yield of MKIDs is of urgent concern, as missing pixels make accurate photometry
difficult, especially for rapidly time variable sources like compact binaries and
transiting exoplanets. In one of our typical PtSi arrays, the resonator frequency
has a dispersion of roughly 8 MHz from the value set in lithography (taking into
account a uniform global offset). The resonators are supposed to be spaced every
2 MHz, and each needs 100 kHz of bandwidth on either side of the resonator for
the readout. When two (or more) resonators fall within ~250 kHz of each other, we
can only read out one resonator. This limit of 250 kHz is imposed by the desire to
track the fast rise time of each pulse. The large dispersion and small mean spacing
mean the resonators are essentially randomly placed in frequency, so the bandwidth
required for each resonator (250 kHz) and the spacing (2 MHz) essentially account for
the observed yield of 90%. While placing the resonators close together physically
would somewhat mitigate this dispersion, there are electromagnetic crosstalk issues
that arise if resonators that are physically close together are also closely spaced in
the frequency domain.*? In practice, we use an algorithm to ensure nearest physical
neighbor pixels are roughly 50-100 MHz apart in frequency. To improve the yield the
mean dispersion must be reduced below 1 MHz (or the frequency spacing increased,
at the cost of a more expensive readout) so that the resonators are not randomly
placed. Significant progress likely requires a dedicated production fabrication facility
with improved tools for thin film deposition and etch uniformity.
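
The link between random frequency placement and a yield near 90% can be checked with a
short Monte Carlo sketch. The model is an assumption made for this example: resonator
frequencies are drawn uniformly at random with a 2 MHz mean spacing, and resonators are
scanned in frequency order, keeping one only if it lies at least 250 kHz above the last
kept resonator. For random (Poisson-like) placement this rule has the closed-form yield
s/(s + d), with s the mean spacing and d the minimum gap.

```python
import numpy as np

def simulate_yield(n_res, mean_spacing_mhz, min_gap_mhz, seed=0):
    """Fraction of randomly placed resonators that can be read out."""
    rng = np.random.default_rng(seed)
    freqs = np.sort(rng.uniform(0.0, n_res * mean_spacing_mhz, n_res))
    kept = 0
    last_kept = -np.inf
    for f in freqs:
        if f - last_kept >= min_gap_mhz:   # far enough from the last usable resonator
            kept += 1
            last_kept = f
    return kept / n_res

def analytic_yield(mean_spacing_mhz, min_gap_mhz):
    """Closed-form yield for the same keep/drop rule with random placement."""
    return mean_spacing_mhz / (mean_spacing_mhz + min_gap_mhz)

mc_yield = simulate_yield(100_000, 2.0, 0.25, seed=1)
print(f"Monte Carlo yield: {mc_yield:.3f}, analytic: {analytic_yield(2.0, 0.25):.3f}")
```

Both estimates come out close to 0.89, consistent with the observed yield, and the
analytic form shows directly how the yield improves if the spacing is increased.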


3.3. Quantum efficiency 


The expected quantum efficiency of a bare MKID arises from absorption in the
metal film that makes up the inductor, shown for PtSi as the red lines in Fig. 6. We
can improve this quantum efficiency by applying anti-reflection (AR) coatings. AR
coatings lower the reflectance of an optical surface by creating destructive
interference for the reflected light and constructive interference for the transmitted light.
The performance of optical coatings is sensitive to layer thickness, and simulations can be
carried out to tune the thicknesses of the different layers to optimize the absorption
into the detector. Common materials for AR coatings are SiO2 (n = 1.45) and
Ta2O5 (n = 2.16) because their optical parameters are nearly constant over a large
wavelength range, which allows nearly perfect impedance matching. The absorption
of the PtSi (60 nm)/SiO2/Ta2O5 multi-layer is optimized with software (TFCalc) in
the 400-1400 nm band by tuning the thicknesses of the two oxides. The best results,
with thicknesses of 98 nm and 49 nm for the SiO2 and Ta2O5 respectively, are shown
in Fig. 6, showing an improvement in QE of nearly a factor of 2 to roughly 75%
across the 400-1400 nm wavelength range.
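
The kind of simulation referred to above can be sketched with the standard
characteristic-matrix method for thin films at normal incidence. This is not the TFCalc
model used for Fig. 6. In particular, the complex index assigned to PtSi below is a
made-up placeholder (the text gives no value), the sapphire index of 1.77 and the layer
order seen by the incoming light (air, Ta2O5, SiO2, PtSi, sapphire) are assumptions, and
dispersion is ignored, so the numbers it produces should not be compared with the
measured curves.

```python
import numpy as np

def stack_absorption(wavelength_nm, layers, n_incident=1.0, n_substrate=1.77):
    """Normal-incidence absorption of a thin-film stack (characteristic-matrix method).

    layers: list of (index, thickness_nm), ordered from the incident side.
    Absorbing layers use the convention N = n - 1j*k.
    """
    m = np.eye(2, dtype=complex)
    for n_layer, d_nm in layers:
        delta = 2.0 * np.pi * n_layer * d_nm / wavelength_nm   # phase thickness
        m = m @ np.array([[np.cos(delta), 1j * np.sin(delta) / n_layer],
                          [1j * n_layer * np.sin(delta), np.cos(delta)]])
    b, c = m @ np.array([1.0, n_substrate])
    reflectance = abs((n_incident * b - c) / (n_incident * b + c)) ** 2
    transmittance = 4.0 * n_incident * n_substrate / abs(n_incident * b + c) ** 2
    return 1.0 - reflectance - transmittance

N_PTSI = 2.5 - 3.0j   # PLACEHOLDER index for illustration only, not a measured value

bare = stack_absorption(800.0, [(N_PTSI, 60.0)])
coated = stack_absorption(800.0, [(2.16, 49.0), (1.45, 98.0), (N_PTSI, 60.0)])
print(f"bare film: {bare:.2f}, coated film: {coated:.2f} (placeholder optical constants)")
```

An optimizer would wrap this function, scan the two oxide thicknesses, and maximize the
band-averaged absorption using measured optical constants for each material.
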
Fig. 6. Absorption of a PtSi film deposited on a sapphire substrate in the 400-1400 nm wavelength
range (red, lower pair of curves) and absorption of the same film coated with the SiO2/Ta2O5
bi-layer (blue, upper pair of curves). Dashed: simulations, Solid: measurements. (See electronic
edition for a color version of this figure.)

Experiments at UCSB have shown that AR coatings, when applied only to the
photosensitive inductor (using a liftoff stencil), do not affect inductor quality factor
or TLS noise.


4. MKID Applications 


MKIDs are particularly useful in situations where fast photon counting without read
noise is required over modest fields of view and at lower count rates, roughly below
2500 cts/pixel/s in current MKID instruments. The applications to astronomy are
broad; only a few of the concepts currently under active exploration are detailed
below.


4.1. Exoplanet direct imaging 


Direct imaging of exoplanets*’ is one of the most technically difficult approaches 
to studying exoplanets, but holds immense promise for not just detecting but char- 
acterizing planets around the nearest stars. Ambitious instruments at the world’s 
largest telescopes have been built to carry out this science: The Gemini Planet 
Imager (GPI),*° 4” SPHERE at VLT,*8 SCExAO at Subaru,*? >! and the P1640°? 
and Stellar Double Coronagraph (SDC) at Palomar.®* All of these instruments are 
currently limited to star-planet contrast ratios of ~10~°® by a background of speckles 
from uncontrolled scattered and diffracted light. The most problematic speckles are
short-timescale (<1 s) atmospheric speckles that are beyond the capability of
the adaptive optics (AO) system to remove, likely due to latency or non-common
path aberrations. Two instruments are under development that directly address
this problem. The NSF has funded DARKNESS, a fast photon-counting, energy- 
resolving integral field spectrograph (IFS) based on microwave kinetic inductance 
detectors (MKIDs)!® 17:4 for use at the Palomar 200-inch telescope’s SDC (see 
Fig. 7). The Japanese government has funded MEC, a similar instrument for the 
Subaru Telescope’s SCExAO system. DARKNESS was commissioned in July 2016, 
and MEC was commissioned in 2018. 

These MKID integral field spectrographs (IFSs) are capable of producing an 
image cube several thousand times a second without the read noise that dominates 
semiconductor-based IFSs read out at high speed. This unique capability will allow 
us to use the MKIDs as fast focal plane wavefront sensors, eliminating non-common 
path issues and allowing active speckle suppression®” of atmospheric speckles for the 
first time. The time domain information is particularly powerful, allowing stochas- 
tic speckle discrimination (SSD) to reach below the classical photon noise limit 
in differentiating exoplanets from speckles.®° The contrast enabled by SSD and 
active speckle nulling at the 2λ/D inner working angle (IWA) provided by SCExAO’s
PIAA®® coronagraph will allow detection of reflected light planets for the first time. 
Demonstration of this contrast would allow us to propose the definitive survey of 

Fig. 7. Photographs of the DARKNESS cryostat and internal structures. (a) The DARKNESS
cryostat hanging in its lab frame, including a mock P1640 IFS cradle. (b) Internal view of the
cryostat with the 300, 77, and 4 K shells removed. The ADR unit is attached at the bottom of the
4 K plate and is connected to the 100 mK detector package with flexible copper straps.
(c) Periscope structure for use with the P1640 coronagraph. (d) A zoomed-in view of the MKID
detector package inside its mounting structure, including input signal cables on the right side
and HEMT amplifiers on the output side. The magnetic shield, 1 K baffle, and signal output
cables have been removed for clarity. The 1-K stage sits atop three carbon fiber supports with
the 100-mK stage hanging from it on three Vespel SCP-5050 supports.


planets around nearby stars with a discovery space that will be unmatched until 
30-m telescopes are online in the middle of the next decade. 

Proving this contrast ratio is possible from the ground will be vital for direct 
imaging instruments designed for 30-m class telescopes. A 30-m telescope with a 
~10-® contrast ratio will be able to image Earth-mass planets in the habitable 
zones of nearby M-dwarfs,°’ and the low resolution spectroscopy may be able to 
detect the signatures of life in their atmospheres.°® 


4.2. Massively multiplexed multi-object spectrometer


In a conventional multi-object spectrograph, like DEIMOS on Keck,°? a mask is 
inserted at the focal plane so that light from the targets may pass through the slits 
(or apertures), but background sky and other nearby source photons are blocked to 
reduce sky noise and contamination. After passing through the mask, a dispersive 
element such as a diffraction grating or prism is used to spread the light as a 
function of wavelength on a detector. In a SuperMOS, we use the same mask-based 
approach to reduce sky background and contamination from other sources, but 
require no dispersive element, instead using the intrinsic spectral resolution of the 
MKID detectors, as shown in Fig. 8. Since each MKID pixel provides moderate 
spectral resolution the focal plane is used much more efficiently, yielding a simple 
and compact system. 

An example of this type of instrument is the Giga-z®° concept. It is enabled
both by the inherent energy resolution and especially by the large pixel counts
possible with the MKIDs. It is currently envisioned as an instrument for the Cassegrain
or Nasmyth focus of a dedicated 4-m class telescope. In order to cover 20,000 square
degrees in a reasonable amount of time, we use a one square degree focal plane. This
square degree field of view is divided among the 100,000 detectors, each fed by a
macropixel covering 10″ × 10″ of the sky. Galaxy number counts in J band to 24.5th
magnitude®! ensure that most macropixels (80-100%) will contain a galaxy at each


Fig. 8. An artist’s impression of the operational principle behind Giga-z. 

pointing. A mask cut using pre-existing LSST (or earlier Dark Energy Survey)
imaging allows the light from one celestial source per macropixel to fall onto a 
square microlens array (99% fill factor), focusing the light onto the corresponding 
large plate scale MKID located directly below. Depending on the specific camera 
design, a reimaging system incorporating demagnification would likely be required 
between the mask and the microlens array. 

A two-year survey with the Giga-z instrument would be capable of returning
30-60 spectral channels on nearly 2 billion galaxies down to $m_J$ ~ 24.5. The wider
wavelength coverage dramatically reduces the number of catastrophic failures in
photo-z determination. Giga-z would vastly improve the science return of LSST.
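
The headline numbers of the concept follow from simple bookkeeping, sketched below.
The calculation assumes non-overlapping pointings that each observe one source per
filled macropixel; the function names are introduced here for illustration.

```python
ARCSEC2_PER_DEG2 = 3600.0 ** 2

def macropixel_side_arcsec(field_deg2, n_detectors):
    """Side of a square macropixel when the field is split evenly among detectors."""
    return (field_deg2 * ARCSEC2_PER_DEG2 / n_detectors) ** 0.5

def survey_galaxies(area_deg2, field_deg2, n_detectors, fill_fraction):
    """Number of galaxies observed, one per filled macropixel per pointing."""
    pointings = area_deg2 / field_deg2
    return pointings * n_detectors * fill_fraction

side = macropixel_side_arcsec(1.0, 100_000)
low = survey_galaxies(20_000, 1.0, 100_000, 0.8)
high = survey_galaxies(20_000, 1.0, 100_000, 1.0)
print(f"macropixel side ~ {side:.1f} arcsec, galaxies ~ {low:.1e} to {high:.1e}")
```

An even split of one square degree among 100,000 detectors gives macropixels about 11
arcseconds on a side, in line with the quoted 10″ × 10″, and 20,000 pointings at 80-100%
occupancy yield 1.6 to 2 billion galaxies.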


4.3. Seeing-limited integral field spectrometer 


Time domain astronomy, including both cadence observations and rapidly variable
sources, is an increasingly large part of astronomy. This trend will only increase
with surveys like ZTF and LSST. Sources include transients like gamma-ray burst
afterglows, supernovae, tidal disruptions, EM follow-up of gravitational wave
detections, and Kuiper Belt Object (KBO) occultations, as well as more predictable
time-variable sources like planet transits, optical pulsars, and compact binaries.

Seeing-limited imagers based on MKIDs, like ARCONS at Palomar and the
proposed KRAKENS at Keck, are ideal instruments for time domain measurements.
KRAKENS will be a MKID-based, low resolution, high sensitivity integral
field unit (IFU) with 32,400 (180 × 180) pixels, a 45 × 45 arcsecond field of
view, spectral resolution R = λ/Δλ ≈ 25 at 0.4 µm, and an extraordinarily wide
0.32-1.35 µm bandwidth. The relatively large FOV will place stable calibration stars
on the array for accurate photometry, as well as enabling observations of larger
objects like nearby galaxies and galaxy clusters. KRAKENS is planned as a facility
instrument at Keck, with a full software pipeline inherited from ARCONS. More
information is available in the KRAKENS science case.”


4.4. High resolution echelle order sorter 


It was recognized over a decade ago that an energy-resolving detector can act as 
an order sorter for a high resolution echelle spectrometer, obviating the need for a 
cross disperser and resulting in a compact and efficient system.®? Preliminary design 
studies based around this principle have been performed for MKIDs; an example can 
be found in Ref. 64. This technique requires the orders to be sufficiently separated 
in photon wavelength so that the chances are low of mistaking a photon in one 
order for another order. Since the resulting spectrogram is linear, it is relatively 
straightforward to make a system that disperses the light from many fibers, creating 
a compact high resolution read noise-free multiobject spectrometer. For example, 
a system with 800,000 (8000 x 100) MKID pixels could simultaneously take 100 
spectra with R > 75,000. 
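
The order-sorting idea can be illustrated with a minimal Python sketch. At a fixed
position along the echelle dispersion direction the grating equation fixes the product
of order number and wavelength, so the photon must have one of the wavelengths
m_lambda/m. The MKID's coarse wavelength estimate then selects the order. The function
name, the example product of 4000 nm, and the range of orders are hypothetical values
chosen for illustration.

```python
def sort_order(m_lambda_nm, measured_nm, orders):
    """Return (order, wavelength_nm) for the order whose wavelength m_lambda_nm / m
    lies closest to the MKID's own wavelength estimate."""
    best = min(orders, key=lambda m: abs(m_lambda_nm / m - measured_nm))
    return best, m_lambda_nm / best

# A pixel where m * lambda = 4000 nm sees 1000 nm in order 4, 800 nm in order 5, etc.
order, wavelength_nm = sort_order(4000.0, measured_nm=820.0, orders=range(3, 11))
print(order, wavelength_nm)
```

Adjacent orders m and m + 1 differ in wavelength by a fraction of roughly 1/m, so the
detector's resolving power must comfortably exceed the highest order used for
misidentifications to be rare.
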
This kind of instrument would be especially useful in the search for molecules
in the spectra of planets around nearby stars, using the cross-correlation
technique® behind a coronagraph.°® The combination of the rejection of the extreme
adaptive optics, advanced coronagraphs with small inner working angles, and the
cross-correlation technique could potentially allow extreme contrast ($\sim 10^{-10}$ after
post processing) for detection of Earth analogues from 30-m class ground-based
telescopes.


5. Future Directions 


MKIDs for UVOIR astronomy are just scratching the surface of their potential
in terms of per-pixel performance and the complexity and power of the instruments
they are integrated into. MKIDs are rather straightforward to fabricate in
large arrays as they are internally self-similar, lending themselves to standard
photolithography with an optical stepper, as shown in Fig. 3. As array sizes grow, the
challenges associated with reading out the arrays increase. A megapixel array will
need ~250 microwave interconnects (125 feedlines) to bring signals between
300 K and 100 mK with low heat load. To accomplish this, new compact cryogenic
wiring solutions are needed. Reading out MKIDs requires complex room temperature
electronics, and constantly improving digital electronics using cutting edge
ADCs and FPGAs will continue to reduce the size, cost, and power consumption of
the readouts.
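
The interconnect count quoted above can be reproduced with a line of arithmetic. The
sketch assumes two interconnects (input and output) per feedline; the figure of 8000
resonators per feedline is inferred from the 125 feedlines quoted for a megapixel array,
and 2048 is the per-feedline capacity of the Gen2 readout described in Section 2.1.

```python
import math

def feedlines_needed(n_pixels, resonators_per_feedline):
    """Number of feedlines required to read out n_pixels resonators."""
    return math.ceil(n_pixels / resonators_per_feedline)

for per_feedline in (8000, 2048):
    n_feed = feedlines_needed(1_000_000, per_feedline)
    print(f"{per_feedline} resonators/feedline: {n_feed} feedlines, "
          f"{2 * n_feed} interconnects")
```

At today's 2048 resonators per feedline a megapixel array would need nearly four times
as many interconnects, which is why both denser multiplexing and more compact cryogenic
wiring matter.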

Megapixel MKID arrays will be extraordinarily useful detectors for future
astronomical instruments from the ground and from space, and will likely find use in other
scientific disciplines where every photon matters.


Spectral filters are components that allow astronomical instruments to isolate a
range of wavelengths for observation. Their widest use is in photometry/imaging,
where they are used to divide the spectrum into specific bands for measuring
band-to-band intensity differences or the strength of particular emission lines, such as
Hα, all of which can provide important astrophysical quantities associated with
the object of interest. But they are also used in grating spectrographs to separate 
orders, and in spectrographs and cameras to divert different wavelength ranges to 
different detectors. Some filters work by selective absorption and transmittance: 
these are the colored glass filters. Others work by selective reflection and trans- 
mittance: these are usually referred to as interference filters. The first filters were 
made from colored glasses, some of which have been available since historic times, 
but these days, interference filters, made through computer design of multiple thin 
film layers applied to glass or crystalline substrates, are increasingly being used. 
In this review we will discuss the historical development and uses of filters in 
astronomy, the range of colored glass filters, the dimensions available, and their 
advantages and disadvantages. We then outline the great advances in thin film 
technology, and its applicability to a range of astronomical projects involving wide 
field imaging. We will also mention the major industry players currently involved 
in astronomical filter manufacture. 


1. Introduction


The earliest astronomical observations were done using the observer’s eye as the
detector. However, the introduction of more sensitive detectors with different
wavelength responses over the last century¹ enabled different parts of the
electromagnetic spectrum to be measured. To make use of these advances, so-called
standard photometric systems² were developed based on bandpasses defined by colored
glass or gelatin filters combined with the overall detector response and atmospheric
transmission. Because the methodology was to use the fluxes of a selected group of
stars themselves as standard candles (rather than laboratory sources), there was a
tendency to continue to use existing photometric systems (made with older detectors)
even when advances in detector technology were made. Hence, the historical
broadband systems, such as UBV, VRI, and JHKL, persisted, and a lot of effort has
gone into matching the old passbands by modifying the defining colored glass filters
for use with modern detector responses. Interference filters were introduced in the
1950s to provide narrower passbands to target specific absorption features in stars
in an effort to improve the precision of stellar parameters, but the technological
restrictions to small dimensions in the early days of interference filter manufacture
precluded their use in wide-field imaging. The advent of wide-field telescopes and
cameras, the development of CCD mosaics to cover the large fields, and the increasing
interest in the structure of the universe through mapping the 3D distribution
of galaxies led to new broadband photometric systems, such as the SDSS u’g’r’i’z’
system,³ that can be used to measure the recession velocities of galaxies through
photometric redshifts. In this technique, a strong absorption or emission feature in
the spectrum causes an appreciable change in the magnitude of a galaxy as the feature
moves into a different passband filter depending on the galaxy’s radial velocity.
The filter requirements for these ongoing and future digital sky surveys in terms
of passband uniformity, stability and filter size are very challenging, especially as
surveys are extending to more distant galaxies and observations out to the 1000-nm
region.


2. Colored Glass Filters 


The first filters used in astronomy were colored glass (Schott or Corning) or colored
gelatin (Kodak). These were used with photographic emulsions to define B, V, R,
and I bands. Colored glass filters are now available from Schott (Germany and
USA), Hoya (Japan), Yinxing (China), LZOS (Russia) and PG&F (Ukraine).
More details of these companies are given in Appendix A. These companies’ products
are broadly similar, and they provide cross-indices of each other’s products. However,
production line processing and commercial imperatives dictate more restrictive
dimensions and fewer kinds of filters than in the past. The Schott range is still
the widest and they are able to provide large dimension colored glass filters for
astronomy (up to 800 × 800 mm for longpass filters) upon special request.*°° LZOS
and PG&F still offer blanks up to 400 × 400 mm in size, although as for Schott,
exact dimensions depend on the glass type and specific requirements as to the glass
quality, such as low wavefront distortion and high homogeneity. In the following
discussion, we will use Schott names.*


* https://www.schott.com/en-us/products/optical-filter-glass

Fig. 1. Transmittance of Schott optical filter glasses. Figure from Schott Catalogue, used with
permission, https://www.schott.com/en-us/products/optical-filter-glass/technical-details


Colored glasses® are produced in one of two ways, either by ionic coloration
or by absorption and scattering from a suspension of colloidal particles (such as
Cd, Te, Se) that are produced in the glass and controlled in size by heat treatment
after an essentially colorless glass is made. The ionic glasses are made by dissolving
particular salts (such as cobalt or nickel oxide) in the glass. In Fig. 1, some
transmittance curves of these glass filters are shown.


2.1. Bandpass filters 


These are the violet (UG), blue (BG) and green (VG) filters. The ionic coloration
of the UG (violet) and BG (blue) series of glasses produces a spectral transmittance
curve resembling a bell curve of half-width between 100 nm and 200 nm, but all of
these glasses also transmit red light beyond 700 nm; thus, the violet glasses
appear purple to the eye. When used with detectors of broad spectral sensitivity, such
as CCDs, the red leak must be blocked. There is a special family of blue bandpass filters,
such as S8612, BG39, etc., that do not have red leaks below 1000 nm. They serve as
excellent red-leak blockers for CCD observations.
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2.2. Longpass edge filters 


These are the very useful series of sharp edge (or short wave cut-off) filters, the 
N-WG, GG, OG and RG series which absorb light blueward of a quite sharply 
defined wavelength. The N-WG provide cut-offs between 300nm and 320nm. To 
the eye, the N-WG glasses are essentially colorless. The GG and OG filters are 
made with short wave cut-offs ranging from 400nm to 700nm in steps of about 
20nm. The GG series go from colorless through light yellow to dark yellow while 
the OG are orange-red. The RG series are rose to ruby in color and provide a 
few shortwave cut-off options between 700nm and 950nm. In principle, a glass with any 
cut-on wavelength could be made, but historical and commercial factors determine 
the current set of cut-on wavelengths. 


2.3. Shortpass, far red absorbing filters 


The KG series transmit light between about 350nm and 700nm and absorb light 
redward of that limit. (One use for them is as heat absorbing filters in projectors.) 
These glasses (and S8612, BG18/39/40) have long wave cut-offs that fall more 
slowly (over about 250nm) than the short wave cut-offs (over about 50nm) of the 
longpass filters, thus producing an asymmetrical spectral transmittance curve when 
used together to define a bandpass. Some of the BG shortpass filters do transmit 
light again in the near-IR, between 1000nm and 3000nm. The Schott catalogue 
provides transmissions for their filters out to 5000 nm. 


2.4. Combination colored glass filters 


The bandpass glasses define only a few broad bandpasses in the ultraviolet, blue 
or green. However, by combining two or three colored glass filters, a wider range of 
passbands can be made. The Johnson B band was derived by excluding ultraviolet 
light from a blue Corning filter (like BG12) with GG385. The Cousins Rc band 
was derived by combining a longpass GG590 filter with a KG3 filter. The UBV 
system was originally devised with a photomultiplier tube (1P21) with little red 
response so the red leaks of the ultraviolet and blue glass filters were normally of 
no consequence, but when red sensitive detectors were used, the red leaks needed 
to be blocked. The special nearly colorless bandpass glasses S8612 or BG39/40 
(see Fig. 1) are ideal as red leak blockers. Furthermore, with a judicious selection of 
glasses and thicknesses, it is possible to derive combination glass filters to replicate 
the UBVRI system with CCDs.⁷ However, great flexibility in passband derivation 
comes from combining one of the longpass edge filters with a specially designed 
shortpass (longwave cut-off) edge interference filter (see Section 4.2 and Fig. 3). An 
added advantage is that the shortpass coating can be applied to the colored glass 
substrate. The techniques of handling and glueing large glass combination filters 
have been described in, e.g., Ref. 8. 
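
The way a combination passband follows from the chosen glasses and thicknesses can be sketched numerically. To first order, the transmittance of a cemented stack is the product of the internal transmittances of its components, each rescaled from a catalogue reference thickness to the actual thickness with Bouguer's law, times a factor for reflection at the two outer surfaces. The function names, the 0.92 reflection factor, and the tabulated curves below are illustrative assumptions, not Schott catalogue data.

```python
def scale_internal_transmittance(tau_ref, d_ref_mm, d_mm):
    """Rescale internal transmittance from the reference thickness to d_mm.

    Bouguer's law: absorption is exponential in path length, so
    tau(d) = tau_ref ** (d / d_ref).
    """
    return tau_ref ** (d_mm / d_ref_mm)


def combined_transmittance(components, reflection_factor=0.92):
    """Transmittance curve of a cemented stack of colored glasses.

    components: list of (tau_ref_curve, d_ref_mm, d_mm) tuples, one per glass,
                all sampled on the same wavelength grid.
    reflection_factor: assumed loss at the two outer air-glass surfaces
                       (roughly 4% each for uncoated glass).
    """
    n_points = len(components[0][0])
    total = [reflection_factor] * n_points
    for tau_ref_curve, d_ref_mm, d_mm in components:
        for i, tau_ref in enumerate(tau_ref_curve):
            total[i] *= scale_internal_transmittance(tau_ref, d_ref_mm, d_mm)
    return total


# Hypothetical internal transmittances at 1 mm thickness (NOT catalogue data).
wavelengths_nm = [400, 450, 500, 550, 600, 650, 700]
blue_bandpass = [0.90, 0.95, 0.85, 0.50, 0.10, 0.02, 0.30]  # red leak at 700 nm
longpass_edge = [0.00, 0.02, 0.60, 0.98, 0.99, 0.99, 0.99]

# 2 mm of the bandpass glass cemented to 3 mm of the longpass glass.
band = combined_transmittance([(blue_bandpass, 1.0, 2.0),
                               (longpass_edge, 1.0, 3.0)])
for wl, t in zip(wavelengths_nm, band):
    print(f"{wl} nm: {t:.3f}")
```

In this toy example the red leak of the bandpass glass survives the combination, which is why a separate red-leak blocker is needed with red-sensitive detectors.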
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2.5. Summary of pros and cons of colored glass filters 
2.5.1. Pros 
The advantages of colored glass filters are as follows: 


• Excellent quality control of different filter batches: reproducible transmissivities. 

• Transmitted wavelength does not change with incident angle. 

• Can combine a bandpass glass filter and a longpass glass filter to define a narrower 
band, and glue to a clear substrate to match a design filter thickness. 

• Wide range of coatings can be applied (e.g., anti-reflection, shortpass or longpass). 

• Maximum stock glass sizes up to 165 x 165 mm; however, Schott can provide larger 
sizes up to 800 x 800 mm upon request for some glasses. Sizes up to 400 x 400 mm 
are available from LZOS and PG&F. However, exact dimensions depend on the 
glass type and specific requirements as to the glass quality. 

• Cheaper than interference filters. 


2.5.2. Cons 
On the other hand, colored glass filters have the following disadvantages: 


• The palette of the bandpass (BG, UG) glasses is very small and restricted. 

• The red cut-offs of the BG and KG bandpass filters are shallow compared to the 
steep cut-ons of the glass longpass filters. 

• Some glasses, e.g., UG2 and BG12, are no longer made and the range of glasses 
offered by the various manufacturers is shrinking. 

• The recommended BG and UG passband filters are also often very thin (~1 mm), 
increasing the difficulty of polishing large sizes. However, Schott has recently 
introduced BG64 and UG2A, which can be used where thicker dimensions are 
desirable. 

• Most of the bandpass glasses have a high concentration of ionic salts in the glass, 
making them brittle and hard to polish as well as more likely to tarnish in air 
unless overcoated. 

• Some vendors are reluctant to coat colored bandpass glasses or put epoxy glued 
filters into their coating plants. 

• The cut-on wavelength edge of the colored glass longpass (colloidal) GG, OG, 
RG filters can irreversibly shift to the red when exposed to high temperatures 
(>250°C) during the polishing or coating processes. 


3. Interference Filters and Thin-Film Optical Coatings 


Interference filters work by selectively transmitting light between specific wave- 
length limits and reflecting other wavelengths. They are made by coating an optical 
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or infrared substrate with multiple thin layers of dielectric (or metallic) materials 
having different refractive indices. The interference effects between the incident and 
reflected light at the thin-film boundaries define the shape and wavelength limits of 
the transmitted and reflected beams. There are two excellent reference books.⁹,¹⁰ 
Interference filters have been used in astronomy since the 1950s. These early fil- 
ters were generally restricted to small diameters (<50mm) due to existing coating 
plant limitations, and could not be used in the ultraviolet because of the opacity of 
the silver films that were often used. However, through substantial investment from 
industry, as well as interest from large telescope consortia, significant advances have 
been made in computer aided multilayer design and computer-controlled thin film 
deposition equipment, which has enabled a wide range of custom-made interference 
filters of exceptional quality, size, and wavelength coverage. For discussion purposes, 
interference filters can be separated into shortpass and longpass edge filters, and 
bandpass filters. 


3.1. Bandpass filters 


Bandpass interference filters are made by coating a glass or fused silica substrate 
with alternating layers of λ/4 reflective and λ/2 transmitting dielectric materials 
to make a single or multiple Fabry-Perot interferometer. The spacing between the 
layers is designed to enhance the transmittance at the wavelength of interest. By 
varying the reflectivities, the number of layers, and the number of Fabry-Perot 
cavities, the shape and width of the bandpass can be altered. The more cavities, 
the squarer the bandpass is. 
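
The dependence of bandpass shape on reflectivity and number of cavities can be illustrated with a minimal sketch. The first two functions use the textbook Airy formula for an ideal, lossless single cavity (phase change on reflection is ignored). The third is a commonly used idealized approximation for coupled multi-cavity filters, included here only to show the trend toward a squarer profile; it is an assumption of this sketch, not a thin-film calculation, and all numerical values are hypothetical.

```python
import math


def airy_transmission(wavelength_nm, lambda0_nm, reflectivity, order=1):
    """Ideal lossless single-cavity Fabry-Perot transmission (Airy function)."""
    coefficient_of_finesse = 4.0 * reflectivity / (1.0 - reflectivity) ** 2
    # Half of the round-trip phase; a multiple of pi at the design wavelength.
    phase = math.pi * order * lambda0_nm / wavelength_nm
    return 1.0 / (1.0 + coefficient_of_finesse * math.sin(phase) ** 2)


def airy_fwhm_nm(lambda0_nm, reflectivity, order=1):
    """Approximate FWHM of the Airy peak: lambda0 / (order * finesse)."""
    finesse = math.pi * math.sqrt(reflectivity) / (1.0 - reflectivity)
    return lambda0_nm / (order * finesse)


def multicavity_shape(wavelength_nm, lambda0_nm, fwhm_nm, cavities):
    """Idealized (Butterworth-like) profile for a filter with several cavities."""
    x = 2.0 * (wavelength_nm - lambda0_nm) / fwhm_nm
    return 1.0 / (1.0 + x ** (2 * cavities))


lambda0 = 656.0                      # nm, hypothetical design wavelength
fwhm = airy_fwhm_nm(lambda0, 0.90)   # hypothetical mirror reflectivity of 90%
print(f"single-cavity FWHM: {fwhm:.1f} nm")
for n_cav in (1, 2, 3):
    core = multicavity_shape(lambda0 + 0.25 * fwhm, lambda0, fwhm, n_cav)
    wing = multicavity_shape(lambda0 + 1.00 * fwhm, lambda0, fwhm, n_cav)
    print(f"{n_cav} cavities: T(core)={core:.3f}  T(wing)={wing:.4f}")
```

Raising the mirror reflectivity narrows the peak, while adding cavities flattens the top and steepens the sides at fixed width.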

Bandpass interference filters also transmit at wavelengths away from the specified 
central wavelength. In Fig. 2, the transmission of an actual Hα bandpass filter 
design with a 12-nm width is shown. The extraneous transmissions shortward of 
about 570nm and longward of 750nm must be blocked. This can be achieved by 
Fig. 2. Transmission of a 12-nm wide Hα filter design. 
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Fig. 3. Transmittance of design shortpass edge filter and RG630 longpass filter (dashed line). 


adding a longpass and a shortpass filter, which can be either colored glass filters 
or multilayer edge filters or a combination of both, as shown in Fig. 3, where a 
shortpass coated design is combined with an RG630 glass filter, thus isolating the 
12-nm Hα bandpass. The transmission beyond 1100nm was not a problem because 
the CCD was not sensitive at this and longer wavelengths. 

In the past, interference filters were often produced by sandwiching the band- 
pass filter with the two blocking filters and because of the hygroscopic nature of 
the coatings, the edges were sealed with epoxy. Modern interference filters are often 
made with hard coatings that are less affected by the environment, but as a result 
the coatings can stress and bend a large substrate. Such distortions are alleviated 
by placing another coating, such as the shortpass filter coatings, on the opposite 
side of the substrate to the bandpass coating. 

The bandpass of an interference filter shifts blueward as the incident angle devi- 
ates from the normal. This can be useful as it enables a blueward tuning of the filter 
to be done. Typically a 5-degree tilt shifts the central wavelength of a 656-nm filter 
by about 0.7nm while a 10-degree tilt shifts it by 2.7nm. As most imaging is done 
with a converging beam, the bandpass will broaden and shift blueward depending 
on the f/ratio of the telescope and position in the field. It is necessary to take this 
into account when designing narrow band imaging filters. Temperature changes will 
also shift the bandpass, as the distance between the layers changes. Typical optical 
interference filters at room temperature show shifts of about 0.02 nm per °C.ᵇ This 
is about 6 times less than a typical glass longpass filter, such as GG395 or OG590. 
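
The tilt and temperature shifts quoted above can be reproduced with the standard small-angle model for an interference filter, λ(θ) = λ₀ √(1 − sin²θ / n_eff²). The effective index n_eff = 1.9 used below is an assumption chosen because it roughly reproduces the 0.7 nm and 2.7 nm shifts quoted in the text; real filters have design-dependent values. The helper for the cone half-angle of a converging beam is likewise only a geometric sketch.

```python
import math


def tilted_wavelength_nm(lambda0_nm, tilt_deg, n_eff=1.9):
    """Central wavelength of an interference filter tilted by tilt_deg.

    n_eff is the effective refractive index of the coating stack
    (assumed value, not taken from a specific filter).
    """
    s = math.sin(math.radians(tilt_deg)) / n_eff
    return lambda0_nm * math.sqrt(1.0 - s * s)


def thermal_shift_nm(delta_t_c, coeff_nm_per_c=0.02):
    """Passband shift for a temperature change, using the ~0.02 nm/°C figure."""
    return coeff_nm_per_c * delta_t_c


def cone_half_angle_deg(f_ratio):
    """Half-angle of the converging beam for a given focal ratio."""
    return math.degrees(math.atan(1.0 / (2.0 * f_ratio)))


lambda0 = 656.0
for tilt in (5.0, 10.0):
    shift = lambda0 - tilted_wavelength_nm(lambda0, tilt)
    print(f"tilt {tilt:4.1f} deg: blueward shift {shift:.2f} nm")

# Marginal rays of a fast beam see the filter at a large angle.
edge = cone_half_angle_deg(2.0)
print(f"f/2 beam: marginal rays at {edge:.1f} deg, "
      f"shift {lambda0 - tilted_wavelength_nm(lambda0, edge):.2f} nm")
print(f"20 °C temperature drop: {thermal_shift_nm(-20.0):+.2f} nm")
```

Because the shift grows roughly as the square of the angle, the marginal rays of a fast beam dominate the broadening, which is why narrow-band filters are specified for a particular f/ratio.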

Square bandpasses are desirable for emission line photometry so that the 
response is uniform across the field (changing angle) and for a range of radial 


ᵇ https://marketplace.idexop.com/Frontend/PDFs/interference_filter_coatings.pdf 
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velocities. But for some astronomical purposes, such as broadband continuum pho- 
tometry, square bandpasses are less desirable as they produce too great a response 
difference as a particular spectral feature moves in and out of the band due to radial 
velocity differences or chemical and physical differences between objects. Similarly, 
square bandpasses can also result in big differences when a filter is replaced by 
another whose bandpass is not precisely the same.¹¹ For that reason, combination 
glass filters, or glass and interference shortpass edge filters, have been preferred for 
continuum photometry. However, rapid advances in coating technology, including 
rugate filter technology (see below), now enable arbitrary shaped bandpasses with 
high throughput to be made. It is possible that in the future, such interference filters 
could increasingly replace colored glass filters in astronomy. 


3.2. Edge filters 


Unlike bandpass filters, edge filters do not contain cavities. Generally they consist of 
quarter wave and modified quarter wave layers that produce an “edge” in wavelength 
space as the filter changes from reflection to transmission. Figure 3 shows an example 
of a typical short pass filter. Shortpass edge filters are often used with longpass glass 
filters to define astronomical photometric bandpasses. Most of the Sloan Digital Sky 
Survey filter passbands? and the Skymapper filter passbands® were produced in this 
way. Shortpass and longpass edge filters are also often used to block the out-of-band 
transmission of narrow-bandpass interference filters. 


3.2.1. Dichroic mirrors 


Dichroic mirrors are longpass (or shortpass) edge filters, usually used at an angle of 
45°. They are used in astronomy to divert different colored light from a telescope 
into different instruments: for example, to reflect IR light to an IR instrument and 
transmit the optical light to a guider. In optical spectroscopy they are normally 
used to reflect light blueward of some limit into one spectrograph and transmit the 
remainder to others. Figure 4 shows the transmission of three dichroic mirrors used 
in the Wide Field Spectrograph (WiFeS) double-beam spectrograph¹² on the ANU 
2.3m telescope. 


3.3. Rugate filters 


A rugate filter is a thin film coating in which the refractive index of the coating 
varies continuously through its thickness, rather than in steps, as in the traditional 
coatings. The gradient index profile can be fabricated in a number of ways includ- 
ing co-deposition or modulation of film density. The simplest example is a structure 
with a sinusoidal oscillation of the refractive index, leading to reflection in some 
narrow wavelength region. In transmission, one obtains a notch filter, which blocks 
some limited wavelength range, while in reflection one obtains a bandpass filter. By 
controlling the refractive index variation, any arbitrary transmittance profile may 
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Fig. 4. Transmission of WiFeS dichroics. The reflectivity in the blue is the complement of these 
curves. 


be designed, for example, by combining various sinusoids using Fourier theory.¹³ 
Multiple passbands can be produced, for instance to transmit two emission lines of 
interest, or to reject certain wavelengths, such as the strongest night sky emission 
lines or intense laser lines for safety goggles. A promising sensitivity improvement of 
a factor of 3-5 seems possible in J band imaging with a 5-band J filter,¹⁴ but manufacturing 
limitations make it impossible to make a filter with a sufficient number of 
layers or notches to suppress a much larger number of OH lines with rugate filter 
techniques.¹⁵ 
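
The notch produced by a sinusoidal index profile can be checked with a minimal characteristic-matrix calculation, in which the continuous profile is approximated by many thin homogeneous sublayers. This is a sketch under stated assumptions: normal incidence, no absorption or dispersion, and hypothetical values for the mean index (1.7), index amplitude (0.05), number of periods (40), and substrate index (1.5).

```python
import math


def rugate_reflectance(wavelength_nm, lambda0_nm=1000.0, n_avg=1.7, n_amp=0.05,
                       periods=40, sublayers=20, n_in=1.0, n_sub=1.5):
    """Reflectance of a sinusoidal-index (rugate) coating at normal incidence."""
    # One index period has an optical thickness of half the notch wavelength.
    period_nm = lambda0_nm / (2.0 * n_avg)
    dz = period_nm / sublayers

    # Accumulate the product of the sublayer characteristic matrices,
    # starting from the identity, beginning at the incident side.
    m11, m12, m21, m22 = 1 + 0j, 0j, 0j, 1 + 0j
    for k in range(periods * sublayers):
        z = (k + 0.5) * dz
        n = n_avg + n_amp * math.sin(2.0 * math.pi * z / period_nm)
        delta = 2.0 * math.pi * n * dz / wavelength_nm
        c, s = math.cos(delta), math.sin(delta)
        a11, a12, a21, a22 = c, 1j * s / n, 1j * n * s, c
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)

    # Field amplitudes at the front surface for a substrate of index n_sub.
    b = m11 + m12 * n_sub
    c_out = m21 + m22 * n_sub
    r = (n_in * b - c_out) / (n_in * b + c_out)
    return abs(r) ** 2


for wl in (800.0, 950.0, 1000.0, 1050.0, 1300.0):
    print(f"{wl:6.0f} nm: R = {rugate_reflectance(wl):.3f}")
```

The coating reflects strongly only near the wavelength matched to the index period and transmits elsewhere, which is the notch behaviour described above; summing several sinusoids of different period would give several notches.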


3.4. Linear variable filter 


These are novel filters in which the dielectric multilayers are applied with linearly 
variable thickness across the substrate, resulting in a filter where the center wave- 
length of the filter shifts linearly across its length. Such filters are not used much 
yet in astronomy but have important uses in laboratory spectroscopy and satellite 
imaging of the Earth.¹⁶ 


3.5. Anti-reflection coatings 


Typical glass surfaces reflect about 4% of the normally incident visible and far-red 
light. Silicon and germanium reflect over 60%. To avoid light loss and problems 
associated with reflected stray light from bright stars and the sky, anti-reflection 
coatings are usually applied to all optics including the filters. Multilayer coatings 
can be designed to produce extremely low reflectivities over a wide wavelength range 
but for filters, a single layer coating designed for the central wavelength of the filter 


64 M. Bessell & G. Bloxham 


is usually adequate. The most effective single layer is a λ/4 thickness of material 
whose refractive index is the square root of that of the substrate. 
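
The figures above follow from the normal-incidence Fresnel formulae, as the following sketch shows. It evaluates the reflectance of a bare surface and of a surface carrying a single quarter-wave layer at its design wavelength. The substrate index of 1.52 and the coating index of 1.38 (a typical value for MgF2) are illustrative assumptions.

```python
import math


def bare_reflectance(n_sub, n_in=1.0):
    """Normal-incidence Fresnel reflectance of an uncoated surface."""
    return ((n_sub - n_in) / (n_sub + n_in)) ** 2


def quarter_wave_reflectance(n_coat, n_sub, n_in=1.0):
    """Reflectance at the design wavelength with one quarter-wave layer."""
    num = n_in * n_sub - n_coat ** 2
    den = n_in * n_sub + n_coat ** 2
    return (num / den) ** 2


def quarter_wave_thickness_nm(design_wavelength_nm, n_coat):
    """Physical thickness of a layer whose optical thickness is lambda/4."""
    return design_wavelength_nm / (4.0 * n_coat)


n_glass = 1.52   # assumed crown-glass-like substrate
n_mgf2 = 1.38    # typical value for MgF2 (assumption)

print(f"bare glass:        R = {bare_reflectance(n_glass):.4f}")
print(f"ideal coating:     n = {math.sqrt(n_glass):.3f}, "
      f"R = {quarter_wave_reflectance(math.sqrt(n_glass), n_glass):.4f}")
print(f"MgF2-like coating: R = {quarter_wave_reflectance(n_mgf2, n_glass):.4f}, "
      f"thickness at 550 nm = {quarter_wave_thickness_nm(550.0, n_mgf2):.1f} nm")
```

The ideal index (about 1.23 for this substrate) is lower than that of common durable coating materials, so a practical single layer reduces the reflection to roughly 1% rather than zero.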


3.6. Neutral density filters 


Schott make a series of colored glass neutral density (NG) filters to attenuate the 
overall light flux (see Fig. 1). These absorptive filters are not very neutral, espe- 
cially at high densities, and do not transmit in the ultraviolet. Much better are 
the reflective inconel-coated fused silica filters offered by various vendors such as 
Melles Griot, Thorlabs, Newport and Edmund Optics. This metal—alloy coating has 
a transmitted spectral curve that is more uniform over a wide wavelength range, 
from 200nm to 2500nm, and the coating is very robust. The maximum stock size 
is 50 x 50mm or 50mm diameter and densities range from 0.1 to 3.0.ᶜ 
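
For reference, optical density D and transmission T are related by T = 10^(−D), and the densities of stacked filters add (neglecting inter-reflections between the filters, which is an assumption of this sketch). The short example below converts the quoted density range into transmission and into astronomical magnitudes of attenuation.

```python
def nd_transmission(density):
    """Fractional transmission of a neutral density filter of optical density D."""
    return 10.0 ** (-density)


def stacked_density(densities):
    """Total density of several ND filters in series (inter-reflections ignored)."""
    return sum(densities)


def attenuation_mag(density):
    """Attenuation in magnitudes: 2.5 * log10(1/T) = 2.5 * D."""
    return 2.5 * density


for d in (0.1, 1.0, 3.0):
    print(f"D = {d:3.1f}: T = {nd_transmission(d):.4f}, "
          f"{attenuation_mag(d):.2f} mag")

total = stacked_density([1.0, 2.0])
print(f"D = 1.0 + 2.0 stacked: T = {nd_transmission(total):.4f}")
```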


3.7. Deposition techniques 


It is of interest when discussing interference filters to note the various techniques 
used to produce the coatings, and the resulting differences in performance and 
stability.ᵈ,ᵉ A big issue with interference filters is the uniformity of the coating 
across the aperture. In general, the larger the coating plant the better the unifor- 
mity, although the geometry between the coating source and the substrate is also a 
limitation. 


3.7.1. Evaporative deposition 


Traditional evaporative coating takes place in a high-vacuum chamber where a low 
energy cloud of coating material is vaporised by an incandescent filament or an 
electron beam and condenses onto the rotating substrate(s). A quartz oscillator 
within the chamber is used to monitor the thickness of the coating. To aid adhesion 
of the coating, the substrates generally need to be heated to between 150°C and 
400°C. The resulting films are porous with a column-like microstructure and can 
expand or shrink, depending on whether the filter is used in air or under vacuum, 
shifting the transmitted passband by up to 1.5% of the wavelength. 


3.7.2. Ion-assisted deposition 


To make the films more dense and stable, they can be bombarded with high energy 
(100eV) oxygen and/or argon ions during the evaporative process. This is called 


ᶜ http://www.edmundoptics.com.au/optics/optical-filters/neutral-density-filters/uv-vis-neutral-density-nd-filters/2332/#f=categories_s—*C411*,productId_i—2332 
ᵈ http://www.opticsbalzers.com/en/284/Coating-Technologies.htm 
ᵉ http://www.photonics.com/EDU/Handbook.aspx?Tag=Materials%7cCoatings&AID=42399 
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ion-assisted deposition (IAD). The ion beam can also be used to clean the substrate 
before deposition. A big advantage is that the substrate need not be heated during 
deposition to improve adhesion. 


3.7.3. Ion beam sputtering 


Ion beam sputtering (IBS) is a room temperature deposition process under ongoing 
development. Sputtering occurs when a high energy ion beam collides with a target 
material and through a transfer of kinetic energy causes the ejection of the target 
(metal or oxide) atoms from the target surface. The ejected materials then travel 
and deposit on the substrate surface. The high energy of the sputtered species 
results in dense, stable films. The highly repeatable sputter actions at the target 
surface and the high energy of the released atoms make the sputtering process 
more controllable than evaporation, providing highly predictable thin-film optical 
characteristics. The films are also spectrally stable due to the absence of porosity. 
The rate of deposition is quite low. The biggest drawback of IBS is that it works 
only with a limited range of materials, typically metal oxides. Most of these are not 
transmissive below 250 nm or above about 5000 nm. 


3.7.4. Magnetron sputter deposition 


In magnetron sputtering, the target material is held at a negative voltage and 
immersed in a magnetic field. A plasma is created above the target, and ions are 
accelerated toward the material due to its negative potential. Ion collisions with 
the target cause material to be sputtered off. Secondary electrons are also emitted 
from the target; these are trapped by the magnetic field and serve to maintain the 
plasma through ionizing collisions. The rate of deposition is higher than with IBS 
but the coating uniformity and predictability are lower.¹⁷ However, “closed 
field” magnetron sputtering has minimized these limitations, at least for multiple 
small scale articles,¹⁸ and ongoing technical improvements are greatly improving 
the uniformity for large dimension filters. 


3.8. Summary of pros and cons of optical interference filters 
3.8.1. Pros 


Optical interference filters have the following advantages: 


• Can make very narrow filters or very broad filters. 

• Can have extremely high transmission. 

• Can make arbitrarily shaped passbands (rugate filter). 

• Can use shortpass edge coatings with glass longpass filters. 

• Coating technology is continuously evolving so quality will improve and costs 
could drop. 
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3.8.2. Cons 


On the other hand, optical interference filters have the following disadvantages: 


• Expensive compared to colored glass filters. 

• Still difficult to make uniform across a large aperture. 

• Passband changes with f/ratio of telescope and therefore must be tailored to a 
particular telescope/instrument. 

• Passband may change across field due to changing ray angle (depends on optical 
design). 

• Bandpass filters generally have steep edges (not necessarily good for continuum 
photometry). 

• May have unstable passbands unless hard, dense coatings (IAD or IBS) are used. 

• May be difficult to duplicate a specific filter in a later coating run, compared to 
replacing a colored glass filter. 


4. Infrared Filters 


The near-infrared (700-2500 nm) and the infrared (2500-15,000 nm) are wavelength 
regions with additional problems and challenges for instrumentation, detectors and 
filters compared with the optical wavelength regions mainly discussed in previous 
sections. At wavelengths redward of about 1600 nm, filters and optics generally need 
to be maintained in a cryogenic vacuum environment and common optical substrates 
such as BK7 and fused silica are no longer transparent. Crystalline materials, such 
as CaF2, MgF2 or sapphire, can be used in the optical and up to about 8000 nm, and 
Schott offers a range of infrared chalcogenide glasses (IRG22-27) for use between 
700 nm and 12,000 nm.ᶠ The ISP Optics Catalogueᵍ presents the properties of a range 
of useful near-IR and IR glasses and crystalline substrate material. The HgCdTe 
imaging sensors of choice require IR filters to have a very broad blocking range 
from 400 nm to 2700 nm, sometimes beyond 3000 nm. Different dielectric materials 
are usually used for the thin film coatings, and the manufacture of IR filters often 
involves guarded proprietary processes. The particular challenges of manufacturing 
near-IR filters are well described by the references in footnotes h and i. 

Excellent near-IR filters on fused silica substrates have been made with magnetron 
sputtering and ion assisted deposition technology by Asahi Spectraʲ for 
Mauna Kea Observatory, as shown in Fig. 5. 


ᶠ https://www.schott.com/en-us/products/ir-materials/product-variants 
ᵍ https://ispoptics.com/technical/optical-materials/ 
ʰ http://www.ifa.hawaii.edu/~tokunaga/filterSpecs.html 
ⁱ http://www.ast.cam.ac.uk/~optics/dazle/publications/5494-64-dazle.pdf 
ʲ http://www.asahi-spectra.com/opticalfilters/astronomy_ir.html 
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Fig. 5. Asahi Spectra’s realization of Mauna Kea Observatories (MKO) Y, J, H and Ks filters. 
Figure used with permission. 


5. X-ray Filters 


Photometric observations of the X-ray and EUV bands require filters of a very 
different kind because glass or crystalline substrates cannot be used. The Luxel 
Corporation manufactures ultra-thin plastic and metal-film foils coated with a variety 
of materials to define different passbands. Polyimide film with a thin coating 
of Al is currently the preferred substrate as it is strong and rejects UV and optical 
light.¹⁹,ᵏ 


Appendix A 


A.1. Colored glass filter manufacturers 


Schott: https://www.schott.com/advanced_optics/english/syn/advanced_optics/products/optical-components/optical-filters/optical-filter-glass/index.html 


Hoya: https://hoyaoptics.com/colored-glass-filters/ 


Yinxing: http://www.ygofg.com/products/ 


Yinxing, Schott and Hoya equivalences are listed in 
http://www.ygofg.com/Products/121.htm. 


ᵏ https://luxel.com/wp-content/uploads/2013/12/Luxel-SPIE-2012-Improved-IR-Blocking.pdf 


Schott provides an Optical Filter Glass Calculation Program (downloadable 
Excel spreadsheet) to design combination filter passbands: 
https://www.us.schott.com/advanced_optics/english/knowledge-center/technical-articles-and-tools/index.html. 

PGO offers an on-line transmission calculator for Schott filters at 
https://www.pgo-online.com/intl/schott-filter-calculator.html. 


A.2. Interference filter manufacturers 


There are many commercial vendors making interference filters and many more 
involved in thin-film coatings for industry. Small filters for astronomy are made 
by Edmund Optics, Optics Balzers, Omega Optical, Custom Scientific and others. 
The specialized manufacturers of optical and infrared filters for large telescopes and 
imagers are 


Materion: http://materion.com/Products/PrecisionOptics/PrecisionOpticalFilters.aspx 


Asahi Spectra: http://www.asahi-spectra.com/opticalfilters/astronomical_filters.html 


Schott: https://www.us.schott.com/advanced_optics/english/products/optical-components/optical-filters/interference-filters/index.html 


REOSC (a subsidiary of Safran) specializes in IR coatings: 
http://www.safran-electronics-defense.com/company/reosc 


CILAS: https://cilas.ariane.group/en/activities/optical-coatings/ 

Materion made the 800mm diameter curved filters for the LSST project 
in their large optics facility that can coat filters up to 1.4m in diameter: 
http://materion.com/~/media/Files/PDFs/Precision-Optics/Astronomical%20Filters-11-13-final.pdf 

Asahi Spectra has made filters up to 620mm in diameter for wide-field imagers 
in Arizona, Hawaii and Chile (http://www.asahi-spectra.com/opticalfilters/large_filters.html), 
and near-IR JHKs filters for Mauna Kea Observatories. 

Schott has described some of their techniques and history of interference filter 
manufacture.²⁰ Schott Suisse SA is making the very challenging 53 overlapping 
narrow-band (FWHM = 1.45nm) 101.7 x 96.5mm filters for the 3°-diameter field 
of view Observatorio Astrofisico de Javalambre 2.55 m telescope in Spain; details of 
the filter construction are also provided.²¹,²² 

Two specialized manufacturers of optical linear variable filters are 


Schott: http://www.us.schott.com/advanced_optics/english/download/schott-veril-may-2013-us.pdf 


Delta Optical: http://www.deltaopticalthinfilm.com/launch-of-linear-variable-bandpass-filters-for-hyperspectral-imaging/ 

Soft X-ray and EUV filters and materials are manufactured by the Luxel Corporation: 
https://luxel.com/products/filters/standard-filters/. 
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Ultraviolet (UV), visible, and near-infrared (NIR) astronomical instruments 
primarily rely on the use of dispersive elements to conduct spectroscopic obser- 
vations. In this chapter, we review the current use of dispersive elements in 
astronomy, including prisms, grisms, and plane gratings, with a particular focus 
on areas of new technology developments such as silicon immersion gratings and 
volume-phase holographic (VPH) gratings. 


1. Introduction 


Spectroscopy, the study of dispersed light, is a cornerstone of observation and analy- 
sis of astrophysical targets. While some spectroscopy is achieved using interferome- 
try, particularly at sub-millimeter and radio wavelengths, the majority of ultraviolet 
(UV), visible, and near-infrared (NIR) astronomical instruments make use of dis- 
persing optics in transmission and reflection for spectroscopic observations. These 
range from the continued application of optics such as prisms that have been used 
to study light for hundreds of years to the ongoing and cutting-edge development 
of new types of dispersive elements with improved performance and targeted appli- 
cations. In this chapter, we will review the dispersive elements used in astronomical 
instrumentation, beginning with an examination of prisms and grisms, and moving 
to reflection gratings and volume-phase holographic (VPH) gratings. We will pay 
special attention to new research and development, most notably in the form of 
silicon immersion gratings and VPH gratings. 
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2. Prisms and Grisms 


Prisms make use of the wavelength-dependent refraction of light at a transmissive 
boundary between surfaces of different refractive indices to generate a spectrum. 
Since Newton’s Opticks of 1704,¹ prisms have been used to study the spectrum 
of visible light and characterize its properties. Prisms are used in astrophysics in 
a number of different applications including spectroscopy, polarization, and image 
slicing. Dispersive prisms are typically triangular pieces of glass in which the beam 
passes through two faces of the optic set at an angle to each other, with the beam 
inside the prism parallel to its base. The limiting resolution of the prism is then 
set by its base width (alternately, the apex angle) and the change in the index of 
refraction of the prism material as a function of wavelength; the limiting resolution 
can be increased by combining prisms in series.² In contrast to diffraction gratings, 
prisms have the advantage of high throughput and low scatter over the full 
spectral band, the absence of overlapping orders, and straightforward design and 
implementation characteristics. On the other hand, prisms are largely limited to low 
dispersion applications, have varying dispersion across their operating waveband 
(although this can be mitigated by using multiple prisms), and become heavier and 
more challenging to mount stably in larger formats.³ 
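
The dependence of the limiting resolution on base width and material dispersion can be made concrete with the classical result R = b |dn/dλ| for a fully illuminated prism at minimum deviation, where b is the base width. The sketch below assumes a fused silica prism and uses the widely quoted Malitson (1965) Sellmeier coefficients; the 50 mm base width is a hypothetical value, and the derivative is taken numerically.

```python
import math

# Sellmeier coefficients for fused silica (Malitson 1965), wavelength in micrometres.
SELLMEIER_B = (0.6961663, 0.4079426, 0.8974794)
SELLMEIER_C_UM = (0.0684043, 0.1162414, 9.896161)


def n_fused_silica(wavelength_um):
    """Refractive index of fused silica from the Sellmeier equation."""
    lam2 = wavelength_um ** 2
    s = sum(b * lam2 / (lam2 - c ** 2)
            for b, c in zip(SELLMEIER_B, SELLMEIER_C_UM))
    return math.sqrt(1.0 + s)


def prism_resolving_power(base_mm, wavelength_um, step_um=1e-4):
    """Limiting resolving power R = b * |dn/dlambda| (central difference)."""
    dn_dlam = (n_fused_silica(wavelength_um + step_um)
               - n_fused_silica(wavelength_um - step_um)) / (2.0 * step_um)
    return base_mm * 1000.0 * abs(dn_dlam)   # base converted from mm to um


for lam_um in (0.4, 0.6, 1.0):
    print(f"{lam_um:.1f} um: n = {n_fused_silica(lam_um):.4f}, "
          f"R(50 mm base) = {prism_resolving_power(50.0, lam_um):.0f}")
```

The rapid fall of R toward the red reflects the varying dispersion noted above, and doubling the total base width (for instance with two identical prisms in series) doubles the limiting resolving power in this model.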

Grisms are manufactured by replicating a transmission grating on one of the 
faces of a prism. They disperse the light but maintain the in-line direction of 
the beam, with the beam angle deviation from the grating diffraction corrected 
by the prism refraction.⁴ Grisms also display fewer aberrations than a transmission 
grating used alone when placed in a converging beam.⁴ As such, grisms are 
well-suited to provide a low dispersion spectroscopy capability in an instrument 
otherwise designed for imaging, often achieved by simply adding grisms to the 
pre-existing instrument filter wheel. However, when using grisms in the through 
beam configuration, multiple orders, including zero order, are imaged in the focal 
plane, which can lead to overlap and confusion between spectra from different tar- 
gets in the field, particularly when used for slitless spectroscopy. 
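
The in-line property of a grism can be expressed with the standard grism equation: for light entering normal to the prism entrance face, with the grating on the inclined exit face, the wavelength that emerges undeviated in order m satisfies m λ ρ = (n − 1) sin A, where ρ is the groove density and A the prism apex angle. The sketch below assumes the prism and grating resin share the same index n; all numerical values are hypothetical.

```python
import math


def undeviated_wavelength_nm(n_prism, apex_deg, grooves_per_mm, order=1):
    """Wavelength passing straight through a grism: m*lambda*rho = (n-1)*sin(A)."""
    lam_mm = (n_prism - 1.0) * math.sin(math.radians(apex_deg)) / (order * grooves_per_mm)
    return lam_mm * 1.0e6   # mm -> nm


def apex_angle_deg(n_prism, wavelength_nm, grooves_per_mm, order=1):
    """Apex angle needed to make a chosen wavelength undeviated."""
    s = order * grooves_per_mm * wavelength_nm * 1.0e-6 / (n_prism - 1.0)
    return math.degrees(math.asin(s))


# Hypothetical grism: n = 1.5 glass, 20 degree apex angle, 300 grooves/mm.
print(f"undeviated wavelength: {undeviated_wavelength_nm(1.5, 20.0, 300.0):.1f} nm")
print(f"apex angle for 656 nm: {apex_angle_deg(1.5, 656.0, 300.0):.2f} deg")
```

Higher dispersion (larger ρ) at a fixed central wavelength requires a larger apex angle or a higher-index prism material.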

Because of their simplicity and versatility, prisms and grisms continue to be 
used regularly in astronomical instruments, both ground-based and space-based. 
These include the use of simple, single optics as well as more sophisticated 
combinations of different glasses to increase dispersion, maintain a constant disper- 
sion across the active waveband, and minimize beam deviation. Prisms are often 
used as cross-dispersers for order sorting in moderate-to-high resolution, cross- 
dispersed echelle spectrographs: examples currently in use include the optical 
spectrograph MIKE on the Magellan telescope;⁵ the GNIRS NIR instrument at Gemini 
Observatory;⁶ the visible-NIR spectrograph XSHOOTER at the Very Large Telescope 
(VLT);⁷ and the Keck and Magellan echelettes ESI, MagE, and FIRE;⁸⁻¹⁰ 


ᵃ Richardson Gratings Technical Note 5: https://www.gratinglab.com/Information/Technical_Notes/TechNote5.aspx 
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among others. Such prisms are often used in a double-pass configuration to increase 
dispersion and order separation. With a combination of prisms of different materials 
(e.g., the use of ZnSe and fused silica or Ge and LiF prisms for the NIR 
and mid-IR channels, respectively, of the Infrared Telescope Facility’s (IRTF) SpeX 
instrument¹¹), the spectrographs can be designed to have nearly constant order 
separation across their wavebands. The rotation angles of the prisms can be adjusted 
for desired instrument performance, such as aligning spectra along detector rows or 
adjusting the rotation angle between two prisms to control the anamorphic beam 
distortion.” !? 

Prisms are also used alone for high efficiency, low dispersion spectroscopy. The 
Magellan PRIMUS survey, for example, deployed a specially-developed, three-glass 
prism in the IMACS instrument to conduct a low resolution (R < 100) spectroscopic 
redshift survey of faint galaxies.¹³ Combined with custom slit masks, the 
survey was able to obtain spectra of ~2500 objects with each pointing. The prism 
delivered 90% throughput and a spectral resolving power of R = 500-30 over 
its 4000-10,000 A waveband. The aforementioned ESI echelette spectrograph also 
includes a high efficiency prism-only mode. On the Hubble Space Telescope (HST), 
both the STIS and ACS instruments contain prisms for low-resolution UV spec- 
troscopy.!*:!5 Instruments using coronagraphs and extreme adaptive optics designs 
to enable direct imaging of faint objects around bright stars are also using prisms as 
dispersers. These include GPI at Gemini Observatory and SPHERE at VLT. The 
GPI Integral Field Spectrograph uses an air-spaced (BaF2/S-FTM16 glasses) zero 
deflection spectral prism to obtain R ~ 45 spectroscopy in the H-band (GPI also 
has a Wollaston prism for polarization measurements).'? In SPHERE, the Infrared 
Dual Imaging Spectrograph channel uses a double-material prism to provide R ~ 50 
spectroscopy, while the Infrared Field Spectrograph channel uses Amici Prisms!®
to disperse at constant resolution and zero beam deviation across the 0.95-1.7 μm
waveband.!7!8

Grisms are widely used to provide a spectroscopic capability in a number of 
imaging instruments. Both the ACS and WFC3 cameras on HST have grisms in 
their UV/optical and NIR channels.>:!9:?° The WFC3 grisms have proven to be 
a popular observing mode, maintaining >12% of the prime HST orbits through 
Cycle 23.21 The Swift UVOT imager also has two UV grisms that see regular 
use for observations including time-series spectroscopy of supernovae and gamma- 
ray bursts, and observations of comets, among other sources.?*’?% In ground-based 
instruments, grisms are used in Keck LRIS and VLT VIMOS, among others.?4 In 
both these instruments, the grisms can be paired with custom slit masks to support 
multi-object spectroscopy without spectral overlap (for a careful selection of slitlets). 
The Subaru FOCAS and VLT FORS and VIMOS instruments also use “holographic
grisms” in which VPH gratings (see Section 4) are cemented between two glass prisms
to maintain the in-line beam.?°:?° Similarly, Nakajima et al.2’ have developed VPH
grisms for NIR observations with recent prototypes extending to K-band (2.2 μm)
coverage.
A recent review by Deen et al.?8 provides a detailed summary of design consid- 
erations for grism spectroscopy with a particular focus on grisms fabricated using 
silicon lithography techniques for NIR applications. They also note that with the use 
of high index materials such as silicon and cross-dispersed echelette designs, grisms 
can now readily support resolving powers up to R ~ 10^4, considerably higher than
the R ~ 1000 limit of the past. 

As mentioned previously, because of the presence of multiple orders on the 
image for grism observations taken in a slitless spectroscopic mode, confusion and 
overlap of spectra from multiple targets can occur. The wavelength calibration of 
the spectra also depends on the position of the direct image in the field. As a result, 
the science return of grism observations often relies heavily on the existence of 
good post-processing software tools to extract and calibrate the spectra. For WFC3 
grism observations, for example, spectral extraction and calibration are supported 
by software programs that also support combination of spectra from observations 
taken at multiple telescope roll angles (to mitigate for source overlap).?9:3° 

Looking ahead, prisms and grisms continue to be used for ground and space 
instruments. On the European Space Agency’s (ESA’s) Euclid dark energy mission, 
the NISP NIR instrument will have a grism to obtain low resolution spectroscopy 
and redshifts for millions of target galaxies.*! For the James Webb Space Telescope 
(JWST), the MIRI and NIRSpec instruments contain prisms while the NIRCam
and NIRISS imagers include grisms.°2-°° In MIRI, a double prism composed of 
Ge and ZnS maintains the beam direction for use in the filter wheel; the brittle 
materials and stringent angle tolerance for the prisms required iterative machining 
and polishing procedures including the use of ultrasonic contouring machines.*° 
Similarly, the development of the NIR grisms for JWST takes advantage of R&D 
improvements in chemical micromachining of silicon and diamond machining of 
brittle ZnSe for larger devices.2”°* For the Extremely Large Telescopes (ELTs) 
under development, the G-CLEF spectrograph for the Giant Magellan Telescope
(GMT) will use a grism assembly for its blue arm cross-disperser consisting of a
VPH grating coupled with two fused silica prisms.°?

Prisms and grisms are reliable, high throughput performance optics for low dis- 
persion spectroscopic applications. They have an established heritage, with many of 
the aforementioned instruments using designs that are decades-to-centuries old, yet 
when paired with careful selection of glasses and optic combinations, they continue 
to provide useful capabilities to support astronomical spectroscopy. 


3. Reflection Gratings 


For applications requiring higher dispersion than can be provided by prisms 
and grisms, UV/optical/NIR spectrographs commonly employ diffraction gratings. 
Diffraction gratings make use of the wavelength-dependent diffraction of light and
interference patterns from multiple, equally spaced grooves (or apertures for
transmissive gratings) to disperse the beam.4?:4! The grating equation describes the
performance of gratings of a given groove spacing used in planar configuration

$$\sin\alpha + \sin\beta = \frac{m\lambda}{d}, \tag{1}$$

where α is the angle of incidence, β is the angle of diffraction, m is the spectral
order, λ is the wavelength diffracted in the direction β, and d is the groove spacing.
The grating dispersion is found by differentiating the grating equation with respect
to wavelength and is a function of the groove spacing, the angle of diffraction, and
the spectral order

$$\frac{\mathrm{d}\lambda}{\mathrm{d}\beta} = \frac{d\cos\beta}{m}. \tag{2}$$

The maximum resolving power a grating can deliver is R = λ/Δλ = mN, where m is
the spectral order and N is the number of illuminated grooves. As a result, higher
spectral resolution can be achieved by increasing the grating size and/or line density
or working in a higher order. Alternately, for a given combination of groove spacing
and spectral order, the dispersion and resolving power can be increased by working
at higher angles of beam incidence and diffraction. Diffraction gratings are typically
blazed by tilting the angles of the reflective grooves with respect to the incident
beam to concentrate the light into specific orders. The gratings are then used at
or near the Littrow configuration (in which the angles of incidence and diffraction
are equal and set at the blaze angle) to maximize efficiency. For more information
about the properties of diffraction gratings as used in astronomy, see Ref. 2.
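
The relations above can be sketched numerically. The following is a minimal Python illustration, assuming the sign convention of Eq. (1) with both angles measured from the grating normal; the function names and the 600 grooves/mm example grating are hypothetical and are not taken from any instrument discussed here.

```python
import math


def diffraction_angle_deg(wavelength_um, grooves_per_mm, order, incidence_deg):
    """Solve sin(alpha) + sin(beta) = m * lambda / d for beta, in degrees."""
    d_um = 1000.0 / grooves_per_mm  # groove spacing in microns
    s = order * wavelength_um / d_um - math.sin(math.radians(incidence_deg))
    if abs(s) > 1.0:
        raise ValueError("no propagating diffracted beam for these inputs")
    return math.degrees(math.asin(s))


def angular_dispersion_um_per_rad(grooves_per_mm, order, beta_deg):
    """d(lambda)/d(beta) = d * cos(beta) / m, in microns per radian."""
    d_um = 1000.0 / grooves_per_mm
    return d_um * math.cos(math.radians(beta_deg)) / order


def max_resolving_power(order, grooves_per_mm, illuminated_width_mm):
    """R = m * N, with N the number of illuminated grooves."""
    return abs(order) * grooves_per_mm * illuminated_width_mm


# Hypothetical example: 600 grooves/mm, first order, 0.5 um light, 10 deg incidence.
beta = diffraction_angle_deg(0.5, 600.0, 1, 10.0)
print(f"diffraction angle: {beta:.2f} deg")
print(f"dispersion: {angular_dispersion_um_per_rad(600.0, 1, beta):.3f} um/rad")
print(f"maximum R for 100 mm illuminated: {max_resolving_power(1, 600.0, 100.0):.0f}")
```

The resolving power returned here is the diffraction-limited maximum; a seeing-limited spectrograph with a finite slit will generally deliver less.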


3.1. Mechanically ruled gratings 


Mechanically ruled planar gratings have been the most commonly used dispersive 
elements in optical/NIR astronomical spectrographs, particularly in ground-based 
telescopes. Grating masters are created using diamond-etching to shape triangular 
grooves. The wear on the diamond tip limits such gratings to groove densities up 
to 5400 grooves/mm for 250 × 250 mm sizes or 1500 grooves/mm for 320 × 420 mm
gratings.” Fabrication of the masters is a slow and costly process, so replicated 
gratings are cast (in a resin layer on glass, overcoated with metal) from the masters 
for use. 

The types of plane gratings used in astronomy can be broadly divided (with 
inevitable overlap) into two classes: blazed gratings intended to operate in low order 
(sometimes designated echelettes, although modern parlance has shifted to using 
this term mainly for cross-dispersed systems working in a few low orders) and echelle 
gratings. The former generally have higher groove densities (a few hundred to a few 
thousand grooves/mm) and lower blaze angles (<20°) and are optimized for highest 
efficiency in lower spectral orders (often m = 1 or —1). Echelles are manufactured 
with coarse, deep grating rulings (e.g., 31.6, 79, and 316 grooves/mm are common
configurations) and are operated at high blaze angles (up to 76°). Echelle gratings
can support resolving powers up to 1,000,000 for a 10-inch grating.4? They operate
in high spectral orders (as low as m = 5 or up to m > 100), where the free spectral
range per order is small. Order separation for cross-dispersed designs is accomplished
using a second disperser: either another ruled grating, a prism, or a VPH grating.
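
To illustrate why echelles work in high orders with a small free spectral range, the sketch below evaluates the Littrow form of the grating equation, m = 2 d sin(θ_B) / λ, and the standard free spectral range relation Δλ = λ / m. The free spectral range formula is a textbook result that is assumed here rather than quoted from this chapter, and the function names are hypothetical.

```python
import math


def littrow_order(wavelength_um, grooves_per_mm, blaze_deg):
    """Order satisfying the grating equation in Littrow: m = 2 d sin(theta_B) / lambda."""
    d_um = 1000.0 / grooves_per_mm
    return 2.0 * d_um * math.sin(math.radians(blaze_deg)) / wavelength_um


def free_spectral_range_um(wavelength_um, order):
    """Wavelength interval covered by one order before the next overlaps: lambda / m."""
    return wavelength_um / order


# A 31.6 grooves/mm echelle at a 63.4 deg blaze angle, evaluated at 0.5 um.
m = littrow_order(0.5, 31.6, 63.4)
print(f"order near blaze: m ~ {m:.0f}")
print(f"free spectral range: {1000.0 * free_spectral_range_um(0.5, round(m)):.1f} nm")
```

For this coarse ruling the order at visible wavelengths comes out above 100, with each order spanning only a few nanometers, which is why a cross-disperser is needed to separate the orders.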

Mechanically ruled gratings continue to enjoy wide use in astronomical spec- 
trographs. They are most common in ground-based instruments, although STIS 
on HST has a full suite of custom gratings (flat ruled, parabolic focusing, and 
echelle) and JWST’s NIRSpec instrument has six blazed gratings made from gold-coated
Zerodur-0.°?:48 Mechanically ruled gratings are generally not an area of
active R&D, however, with many instruments relying on catalog, off-the-shelf items 
as low risk, long heritage devices with good performance characteristics. Virtually all 
of the visible waveband cross-dispersed echelle spectrographs planned or in opera- 
tion use catalog R2-R4 (63.4°-76° blaze angle) echelle gratings, for example. 4+ °° 
Catalog gratings continue to be used for low-resolution optical spectrographs as 
well, although many of the newer instruments have moved to VPH gratings instead 
(see Section 4). 
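
The "R-number" designation used for echelles is the tangent of the blaze angle, so the angles quoted in the text follow directly from the designation. A short sketch (the function name is hypothetical):

```python
import math


def blaze_angle_deg(r_number):
    """Blaze angle of an echelle designated R<r_number>, using tan(theta_B) = r_number."""
    return math.degrees(math.atan(r_number))


for r in (2, 3, 4, 10):
    print(f"R{r}: blaze angle = {blaze_angle_deg(r):.2f} deg")
```

This gives about 63.4° for R2, 71.57° for R3, 76° for R4, and 84° for R10, in line with the values quoted in this section.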

Some custom gratings are procured for specific needs. This is particularly com- 
mon for NIR spectrographs which generally use wide groove spacings and low spec- 
tral orders. The width of an order is set by the number of illuminated grooves 
in a grating. As NIR detector sizes have lagged those of optical-band CCDs, a 
lower groove density is better matched to the detector real estate available. Simi- 
lar considerations make use of echelle gratings operating at high blaze angles and 
many high orders more challenging in the NIR. The Keck MOSFIRE and IRTF 
SpeX instruments, for example, are both moderate resolution (R ~ 2000-3500) 
NIR spectrographs operating in low orders (m = 3-6 or 8). Both use custom
gratings with relatively coarse rulings (110.5 and 53.9 grooves/mm, respectively) 
and blaze angles < 22°.'':5! For comparison, Mieda et al.°? present the history of 
the diffraction grating development for the Keck OSIRIS NIR spectrograph, which 
employs a unique, very coarsely ruled (27.93 grooves/mm) grating with a shallow 
blaze angle (5.76°) operating in orders m = -3 to -6. Although the grating provides
high efficiency (78% at 1.3 μm), it proved difficult to accurately fabricate the
grating facets, eventually requiring production of three gratings (and a new vendor 
and fabrication method) before meeting specifications. The authors conclude that 
future instruments would be better served by using a range of more finely ruled 
gratings operating in first order (m = 1). 

Several high resolution NIR spectrographs have been built. The early instruments
such as CSHELL, PHOENIX, and CRIRES used catalog R2 (63.4° blaze angle) echelle
gratings with 31.6 lines/mm groove densities.°? °° Because of the limited detector
sizes at the time they were built (the arrays span 256, 1024, and 5500 pixels in the
dispersion direction for the three instruments, respectively), they did not operate
in a cross-dispersed configuration, instead recording only one order at a time. This
limited the simultaneous wavelength coverage and observing efficiency of the
instruments. The earliest instrument, CSHELL, which is still in operation, obtains
R = 40,000 over 1-5.5 μm. The simultaneous waveband is limited to
Δλ/λ = 2.4 × 10^-3. PHOENIX can obtain R = 65,000 spectroscopy with
1.7% continuous spectral coverage at 1 μm. The most powerful (and most recently
commissioned) spectrograph is CRIRES, which covered 0.95-5.2 μm at R = 10^5,
requiring 50 observing settings to cover the full waveband. CRIRES is currently
being converted into a fully cross-dispersed spectrograph using a focal plane
consisting of three Hawaii 2RG detectors (each 2048 × 2048 pixels); the new instrument,
CRIRES+, will have 10× the simultaneous wavelength coverage.°° The process of
retrofitting cross-dispersers into a pre-existing spectrograph has required dedicated
evaluation of catalog and custom designed gratings to find optics that will meet
the performance requirements (>65% average efficiency and >55% throughout) for
each waveband.°’ More recently, coarse (13.3 grooves/mm) echelles operating at
very high blaze angles (R5 and R6, ~ 80°) have become commercially available and
are being evaluated for new NIR high resolution spectrographs.°* °°

Mechanically ruled gratings with large groove spacings have been fabricated
for mid-infrared (>2.8 μm) applications where the more forgiving tolerances on
groove precision allow for machining of the required surfaces. A d = 125 μm, 26.75°
blaze angle grating was fabricated for the LEWIS spectrograph.®°! A prism
cross-disperser delivered orders 25-37 to the detector, resulting in full L-band coverage
at R = 1300. The TEXES mid-IR (5-25 μm) spectrograph also has a diamond-machined
grating with a 36 inch length (0.914 m) and a 0.3-in (0.762 cm) groove
spacing. It is very steeply blazed (R10, an 84° blaze angle) to maintain a compact
design. Its highest spectral resolution setting, a cross-dispersed mode that delivers
5-10 orders to the 256 × 256 array, covers λ/200 at R = 100,000.°?:°3 The EXES
instrument for the SOFIA Observatory operates from 4.5 to 28.3 μm at resolutions
up to R = 120,000 with 0.7% simultaneous waveband coverage, matched to a
1024 × 1024 Si:As detector. EXES uses a grating (designated as an echelon after
Michelson (1898),°?:®* who described a coarse grating constructed of stacked glass
plates) that was diamond machined in Al 6061 (Fig. 1) and an echelle
cross-disperser.®°

Fig. 1. The EXES echelon is a 36 × 40 inch (91 × 102 cm) grating with a 0.3-inch groove
spacing (0.131 grooves/mm). It is operated at an 84.2° incidence angle, reflecting off
grooves 0.03 inches (0.76 mm) in height. The echelon was diamond machined in aluminum.
Figure courtesy of M. Richter and the EXES instrument team.

As the design of instruments for ELTs is underway, the focus for reflection
grating development is on procuring large format grating mosaics and on stable
mount design. (An alternate development path, the use of immersion gratings to
reduce optics sizes, is discussed in Section 3.3.) This is particularly important for
high dispersion, visible band instruments that will operate most or all of the time in
a seeing-limited regime, where beam sizes at the grating can be very large: Ref. 66
notes that the echelle mosaic for a high resolution spectrograph on the 39-m E-ELT
telescope® would need to be 10 m in length if R = 100,000 were matched to a
1 arcsecond slit. Even with the proposed slicing in the image or pupil plane, the
R4 echelle mosaic is expected to be 1.2 m × 0.2 m in size. E-ELT HIRES recently
completed its Phase A design study.®” The first high resolution spectrograph on an
ELT to be built is the G-CLEF instrument for the 24-m Giant Magellan Telescope.®°
G-CLEF will employ pupil slicing using fibers: the high resolution R = 108,000
modes will use seven fibers to reimage the seven GMT® telescope mirror pairs to
maintain a beam diameter at the grating of 300 mm.°*° It will use a mosaic of three
300 × 400 mm commercial R4 echelle gratings. The mosaic requires 1 μm parallel
surface alignment and < 1 μm tip-tilt about the normal to each grating surface,
requiring development of adjustable shims with sub-micron tolerances to mount the
grating mosaic within specifications.”
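
The scale of the 10 m figure quoted for E-ELT can be reproduced with the standard slit-limited relation for a grating in Littrow, R φ D = 2 d_beam tan(θ_B), where φ is the slit width on the sky, D the telescope diameter, and d_beam the collimated beam diameter. This relation is a textbook result assumed for the sketch and is not written out in this chapter; an R4 echelle is also assumed, and the function names are hypothetical.

```python
import math

ARCSEC_IN_RAD = math.pi / (180.0 * 3600.0)


def littrow_beam_diameter_m(resolving_power, slit_arcsec, telescope_diameter_m, r_number):
    """Beam diameter needed for a slit-limited R in Littrow: R * phi * D / (2 tan(theta_B))."""
    phi_rad = slit_arcsec * ARCSEC_IN_RAD
    return resolving_power * phi_rad * telescope_diameter_m / (2.0 * r_number)


def grating_length_m(beam_diameter_m, r_number):
    """Illuminated grating length for a beam striking the grating at the blaze angle."""
    theta_b = math.atan(r_number)
    return beam_diameter_m / math.cos(theta_b)


# R = 100,000 matched to a 1 arcsecond slit on a 39 m telescope with an R4 echelle.
beam = littrow_beam_diameter_m(100_000, 1.0, 39.0, 4.0)
length = grating_length_m(beam, 4.0)
print(f"beam diameter: {beam:.2f} m, grating length: {length:.1f} m")
```

Under these assumptions the beam is over 2 m across and the grating is just under 10 m long, which is why image or pupil slicing is needed to bring the mosaic down to a buildable size.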


3.2. Holographically ruled reflection gratings 


In addition to mechanical ruling, plane diffraction gratings can be holographically 
ruled by etching an interference fringe field on a substrate coated with photoresist; 
the profile created is set by the exposure time on the photoresist. The photoresist is 
then chemically exposed to reveal the fringe pattern and create the grating master. 
Such gratings are known as holographic gratings. The interference pattern created 
by dual monochromatic light sources creates a pseudo-sinusoidal groove profile, 
while the use of a single beam reflected back upon itself results in triangular grooves. 
Triangular grooves can also be generated by ion-etching of a previously exposed 
mask at a variety of incident angles and exposure times; the ion etching process 
completely removes the photoresist and imparts the groove profile directly on the 
substrate. As the centers of adjacent fringes are separated by a distance d > λ/2,
high groove densities of up to several thousand fringes per millimeter can be
generated for a visible band recording light source, but low groove densities (below
a few hundred grooves/mm) are not readily fabricated.*?
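
The groove density limits can be illustrated with the standard two-beam interference relation d = λ / (2 sin θ), where θ is the half-angle between the two recording beams. This relation and the 0.4 μm recording wavelength below are assumptions chosen for illustration; the function names are hypothetical.

```python
import math


def fringe_spacing_um(recording_wavelength_um, half_angle_deg):
    """Fringe period for two beams crossing at a full angle of 2 * half_angle."""
    return recording_wavelength_um / (2.0 * math.sin(math.radians(half_angle_deg)))


def groove_density_per_mm(recording_wavelength_um, half_angle_deg):
    """Grooves per millimeter recorded by the interference pattern."""
    return 1000.0 / fringe_spacing_um(recording_wavelength_um, half_angle_deg)


def half_angle_for_density_deg(recording_wavelength_um, grooves_per_mm):
    """Half-angle between the beams needed to record a given groove density."""
    d_um = 1000.0 / grooves_per_mm
    return math.degrees(math.asin(recording_wavelength_um / (2.0 * d_um)))


# Counter-propagating beams (half-angle 90 deg) give the minimum spacing, lambda / 2.
print(f"max density at 0.4 um: {groove_density_per_mm(0.4, 90.0):.0f} grooves/mm")
# A coarse 100 grooves/mm pattern needs the beams to cross at a very shallow angle.
print(f"half-angle for 100 grooves/mm: {half_angle_for_density_deg(0.4, 100.0):.2f} deg")
```

The upper limit of several thousand grooves/mm follows from the λ/2 minimum spacing, while coarse patterns require beams crossing at only a degree or so, which is one plausible reason they are harder to record.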

There are relative advantages and disadvantages to holographic gratings in 
comparison to mechanically ruled gratings. Holographic gratings with sinusoidal 
grooves are not blazed and their efficiencies are set by the ratio of the groove depth 
to the groove spacing and the incident angle of the beam. Generally, they have 
lower efficiencies than blazed mechanically ruled gratings.”! Holographic gratings 
with triangular grooves can have higher efficiencies but the blaze wavelengths are 
often restricted to short wavelengths, 200-250 nm. On the other hand, because
holographic rulings are free of the irregularities introduced by mechanical ruling 
of grooves, they have much better scattered light performance than mechanically 
ruled gratings. Test data have shown scattered photon levels below 10~® (in scat- 
tered photons/emitted photons/nm).’? Spherical recording wavefronts can also be 
employed to create unequally spaced groove patterns and different curve shapes 
with holographic gratings. This can be utilized to provide focusing capabilities and 
aberration control, especially when combined with curved substrates, which are 
easier to fabricate with holographic etching rather than mechanical ruling.*? 

This combination of properties has resulted in holographic gratings being pri- 
marily used for UV spectrographs in sounding rocket, balloon, and space payloads. 
The ability to create variable groove patterns on curved substrates for focusing 
and aberration correction is particularly useful in the vacuum UV, where relatively 
low reflection efficiencies (<80%) necessitate minimization of the number of optics 
in a system. The low scattered light properties of holographic gratings also prove 
valuable to reduce noise from strong terrestrial airglow lines in the UV, particularly 
Lyα. Holographic gratings on curved substrates have been used in the last three
UV spectrographs fabricated: STIS and COS on HST and the spectrographs for the 
Far Ultraviolet Spectroscopic Explorer mission.*?: "3:74 For the most recent instru- 
ment, HST COS, the gratings were ion-etched to generate blazed groove profiles. 
The COS medium resolution holographic gratings delivered efficiencies of 40-55% 
(groove efficiencies of 50-69% and 80% reflectivity), scattered light performance 
of < 2 × 10^-5/Å, and aberration control, including correction for the spherical
aberration of HST, over their performance wavebands.” The grating surface profile 
of one of the COS FUV flight gratings is shown in Fig. 2. 

The COS gratings performed excellently in ground tests with one interesting 
exception: the moderate resolution holographic gratings for the NUV channel, which 
performed to specification after initial fabrication, were found to have sub-par effi- 
ciencies after application of the anti-reflection overcoats. Analysis revealed that the 
magnesium fluoride coating altered the photon wavelengths at the grating surface, 
moving the grating into a diffraction anomaly region that reduced the efficiency 
in one polarization mode.”“4 Although this can be addressed by redesigning the 
gratings, the expected launch schedule at the time precluded this route. Instead, one
grating was replaced with another with a lower line density, moving the anomaly out
of the waveband, and two gratings were flown with bare aluminum and no overcoat.
The latter have shown continual, slow degradation of performance over time that
is attributed to growth of an oxide layer that is recreating the diffraction anomaly
on the gratings by depositing a thin dielectric coating.

Fig. 2. The grating surface profile of a COS FUV grating. The groove profile was developed by
ion etching of a holographically-ruled grating master. The grating has a 3800 grooves mm^-1 line
density and a blaze angle of 14.3°.74

More recent work on sounding rocket platforms has focused on developing holo- 
graphic echelles using lithography-etching and e-beam etching techniques; to date, 
however, these methods have suffered from groove etching control issues and have 
not been able to match the efficiencies delivered by mechanically-ruled echelles.7® “7 


3.3. Immersion gratings 
The resolving power of a grating operated in a Littrow configuration is given by

$$R = \frac{\lambda}{\Delta\lambda} = \frac{2 n N d \sin\theta_B}{\lambda}, \tag{3}$$

where n is the index of refraction, N is the number of illuminated grooves, d is the
groove spacing, and θ_B is the blaze angle. Since the resolving power scales with
n, one can increase spectral resolution by dispersing the light in a medium with a
higher index of refraction than air. Conversely, a given resolution can be obtained
with a smaller grating size: the size of the illuminated grating surface (L = Nd)
can be reduced by n and the overall volume by n^3. The first demonstration of a
grating with diffraction in a medium other than air was performed by Fraunhofer,
who immersed a grating in oil and measured its behavior.*!
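
As a rough numerical illustration of Eq. (3), the sketch below compares the diffraction-limited resolving power of the same grating used in air and immersed in silicon. The 100 mm grating length, the R3 blaze angle, and the 2 μm wavelength are hypothetical example values, and n = 3.4 for silicon is taken as a constant although the real index varies with wavelength and temperature.

```python
import math


def littrow_resolving_power(index, grating_length_mm, blaze_deg, wavelength_um):
    """Eq. (3) with L = N d: R = 2 n L sin(theta_B) / lambda."""
    length_um = grating_length_mm * 1000.0
    return 2.0 * index * length_um * math.sin(math.radians(blaze_deg)) / wavelength_um


N_SILICON = 3.4  # assumed constant index for this sketch

r_air = littrow_resolving_power(1.0, 100.0, 71.57, 2.0)
r_si = littrow_resolving_power(N_SILICON, 100.0, 71.57, 2.0)
print(f"R in air: {r_air:.0f}, R immersed in Si: {r_si:.0f}")
print(f"gain in R: {r_si / r_air:.1f}x, ideal volume reduction: {N_SILICON ** 3:.0f}x")
```

These are theoretical maxima; the resolving power delivered on sky is usually set by the slit width rather than by the diffraction limit of the grating.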

At near-infrared (NIR) wavelengths, materials such as silicon and germanium 
become transparent and can be used as an immersion medium for diffraction grat- 
ings. With indices of refraction of n = 3.4 and 4.0, respectively, Si and Ge immersion 
gratings can be made to provide higher resolution spectrographs in a more compact 
size. Of equal importance is the issue of fabrication: as noted above, high precision 
gratings with coarsely spaced rulings (<31.6 lines/mm or d > 32 μm) are challenging
to fabricate mechanically, which makes increasing spectral resolution at longer
wavelengths more difficult, as R ∝ 1/λ. As a result, previous high resolution NIR
spectrographs using mechanically ruled R2 echelles were limited to simultaneous
wavelength coverage, in the case of CRIRES, for example, of 1/70 to 1/50 of the
central wavelength in each setting.®°”° However, with the production of lithographically
etched gratings that facilitate coarse groove spacings and a beam immersed
in silicon, echelle spectrographs with high resolution and broad wavelength grasp 
can be developed.®° ®? 

Silicon immersion gratings are fabricated using photolithography techniques. 
Monocrystalline silicon is etched using aqueous potassium hydroxide (KOH), taking 
advantage of the fact that the {100} planes are etched significantly faster than 
the {111}, selectively exposing the latter planes of the crystal. By laying down 
KOH-resistant masking material (silicon nitride) at the intersection of the {100} 
and {111} planes using UV photolithography and plasma etching, the anisotropic 
etching creates a groove pattern. If etched parallel to a (100) plane, the angle 
between the planes sets a blaze angle of 54.74°; rotating with respect to the (100) 
plane changes the blaze angle.8°-5% 
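
The 54.74° blaze angle is simply the angle between the (100) and (111) planes of a cubic crystal, which can be checked from the plane normals. The helper below is a hypothetical illustration and is valid for cubic lattices only, where the normal to the (hkl) plane is parallel to the [hkl] direction.

```python
import math


def angle_between_planes_deg(hkl_a, hkl_b):
    """Angle between two planes of a cubic crystal, given their Miller indices."""
    dot = sum(a * b for a, b in zip(hkl_a, hkl_b))
    norm_a = math.sqrt(sum(a * a for a in hkl_a))
    norm_b = math.sqrt(sum(b * b for b in hkl_b))
    return math.degrees(math.acos(dot / (norm_a * norm_b)))


print(f"(100) to (111): {angle_between_planes_deg((1, 0, 0), (1, 1, 1)):.2f} deg")
```

The result is arccos(1/√3), about 54.74°, matching the blaze angle quoted for a surface etched parallel to a (100) plane.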

The development of immersion gratings for astronomical spectroscopy has been 
the subject of dedicated R&D efforts. This has included developing and refining the 
grating fabrication processes, building construction and test facilities, and working 
to improve key parameters such as throughput, stray light performance, wavefront 
quality, and grating sizes.8° 5° Recent efforts have also focused on improving the 
masking quality to improve positional accuracy of grooves and increase groove den- 
sity. Improvements in the uniformity of the UV exposure dose in the contact mask 
photolithography patterning step have reduced ghosts to ~10~7.87 At the same 
time, efforts are underway to replace the UV contact mask lithography for some 
applications with direct electron beam lithography: the latter is more precise and 
can support line densities up to 500 lines/mm for high dispersion gratings operated 
in low order, but requires longer masking write times, increasing development cost 
and risk.’° Other efforts have focused on etching the gratings on Si wafers and sub- 
sequently bonding the wafer to a prism, taking advantage of industrial equipment 
for photolithography on wafers and the flexibility of decoupling grating and prism 
fabrication.®9: 9° 

Prototype germanium and GaAs (n = 3.4) immersion gratings for mid-IR applications
with 100-600 μm groove spacings have also been developed using a nano-precision
grinding/turning machine and the electrolytic in-process dressing (ELID)
grinding technique.®?9!9? The goal is to develop an instrument for the Subaru
Observatory that can obtain R = 200,000 spectroscopy at 10 μm. However, since
the machine time for the required 120 × 120 × 270 mm grating is several thousand
hours, new designs are being considered, including the use of an immersion grating
with oblique deep grooves (1 μm in width and 50-100 μm depth) in which the beam
will be reflected off the groove sidewalls. The machining time is only several hundred
hours for this design. A similar design may be extended to shorter wavelengths by
moving to ion-etching techniques.”

The use of immersion gratings is appealing both to increase spectral resolution 
and in applications where mass and volume are an issue: in space, on large-aperture 
telescopes, and for Cassegrain-mounted instruments, for example.5° 9:94 For NIR 
spectrographs, requirements on slit lengths to sample sky background, ideally cou- 
pled with nodding in the slit, drive the need for larger detectors to meet sampling 
requirements. Slit-limited spectral resolution for spectrographs operating in natural 
seeing increases the sizes of optics. Finally, the need for fully cryogenic instruments 
to limit thermal noise places further strain on developing large instruments. As a 
result, the factor of 3.4 increase in the product of slit width and resolving power, 
and the resulting advantage of up to an order of magnitude decrease in instru- 
ment volume, both provided by silicon immersion gratings, are particularly power- 
ful for enabling high resolution NIR spectroscopy with wide continuous waveband 
coverage.?° 

After nearly two decades of development effort, cross-dispersed spectrographs 
using immersion gratings have recently been commissioned on telescopes for sci- 
ence observations. A silicon immersion grating was deployed in late 2013 as part of 
the FIRST spectrograph for the Fairborn Observatory.°° The spectrograph delivers 
simultaneous H-band (1.4-1.8 μm) coverage at R = 50,000 in 59 spectral orders. The
data show that the grating meets its resolution and waveband coverage performance 
requirements, although there are ghosts present at the 0.7% level, attributed to lim- 
itations in the quality of the photomask when the grating was created. The spec- 
trograph is currently undergoing renovations to improve its performance, including 
development of a new immersion grating. The IGRINS spectrograph was commis- 
sioned at the Cassegrain focus of the 2.7-m telescope at McDonald Observatory in 
the spring of 2014. Using a silicon immersion grating and VPH cross-dispersers, 
IGRINS delivers simultaneous 1.45-2.45 μm spectroscopy at R = 40,000. The
immersion grating is an R3 (71.57°), 36.5 lines/mm grating with a 95 mm length
(Fig. 3).°” Lab tests showed that the grating delivers its performance requirements 
of >75% throughput on blaze. Ghosting at the level of 0.2% from periodic position 
errors © 5nm on the groove facets was seen, also within the performance specifi- 
cations. On the 2.7-m telescope, IGRINS can obtain a S/N = 100 in one hour 
on a K = 10.3 magnitude target. IGRINS has been used for more than 699 nights 
on sky, including visitor runs at the Discovery Channel Telescope and McDonald 
Observatory.?? 

Fig. 3. Scanning electron micrograph image shows the grooves on an R3 silicon immersion grating.
The regularity of the groove pattern and the low surface roughness minimize scattered light from
the grating. Image courtesy of D. Jaffe, C. Brooks, and the IGRINS instrument team.

Another instrument, iSHELL, has been built for the IRTF and has
been on the telescope since September 2016. It delivers R = 75,000 spectroscopy
over 1.1-5.3 μm using an R3 silicon immersion grating disperser and a 2048 × 2048
Hawaii 2RG detector.!°° A selection of cross-disperser gratings allows for coverage
in different sub-bands, increasing its simultaneous wavelength coverage by a factor
of 30-60 over its predecessor, CSHELL. In the mid-IR, the GIGMICS instrument
achieved first light on the 1.5-m telescope at Higashi-Hiroshima.!?! 1°? GIGMICS
uses a germanium immersion grating to provide R = 50,000 spectroscopy over the
8-13 μm waveband in eight settings. The grating has a 600-μm groove spacing with
a 68.75° blaze angle and is operated in very high orders, from 332 to 597. Echelettes
are used as the cross-dispersers. The VINROUGE spectrograph, which will use a
Ge immersion grating to obtain R = 80,000 spectroscopy from 2.0-5.5 μm, is also
under design with first light expected in 2019.19°

Immersion gratings will be a key technology for achieving high resolution NIR 
and mid-IR spectroscopy on the next generation of ELTs. Design efforts are cur- 
rently underway on two instruments using silicon immersion gratings: GMTNIRS 
for GMT and METIS on E-ELT. The GMTNIRS instrument is designed to cover 
1.12-5.3 μm at R = 60,000-80,000.9°: 9% The full waveband will be observed simultaneously
by five spectrographs, each using a silicon immersion grating as the primary
disperser. GMTNIRS will be fed by GMT’s adaptive optics (AO) system, with the
resolution matched to an 85 milliarcsecond entrance slit. The combination of an AO-fed
system with the use of immersion gratings keeps the overall instrument volume
small: the beam size in the NIR will be 25 mm and the whole instrument will fit
within a 2-m long cryostat. Current R&D efforts are focused on manufacturing the 
immersion gratings for the instrument. The JHK gratings will be similar to the 
IGRINS gratings and the LM gratings will be larger, roughly 125 mm in length.?? 

METIS is a first-generation instrument for E-ELT that is designed to provide
mid-IR diffraction-limited imaging, coronagraphy, and moderate-resolution long-slit
spectroscopy from 3 μm to 19 μm and R = 100,000 integral field spectroscopy
from 2.9 μm to 5.3 μm.¹⁰⁴ The IFU spectrometer will use a silicon immersion grating
echelle; a prism pre-disperser and different grating tilt settings will be used to select
different spectral orders. The grating specifications are challenging: a classic echelle 
would require a large device (400 x 140 mm) with a coarse ruling (14.4 lines/mm), a 
blaze angle of 65°, ghosting <10⁻⁴ and a system wavefront error below 100 nm RMS.
The use of a silicon immersion grating reduces the beam size by 50% and the spec- 
trograph volume by a factor of 4 while still meeting performance requirements.!° 
Designs using gratings etched into 150mm wafers with 50 lines/mm groove densities 
bonded to a silicon prism have been pursued to develop a demonstrator immersion 
grating for METIS.1°° 


4. VPH Gratings 


Holographic transmission gratings without any surface relief features can diffract 
light in one of two different ways, by altering either the amplitude or the phase of 
the incident wave. Because they work by attenuating light, amplitude gratings have 
very low theoretical efficiencies, but Kogelnik!®’ has shown that sufficiently thick 
phase gratings have theoretical first order efficiencies approaching 100%, neglecting 
surface reflection losses. Highly-efficient Volume Phase Holographic (VPH) gratings 
consist of a transparent medium with periodic modulations in refractive index that 
diffract light in a manner analogous to Bragg scattering off planes of a crystal. 
Shankoff'® first showed how to fabricate these in films of gelatin by adding a 
sensitizer, typically ammonium dichromate, cross-linking the gelatin by exposing 
it to laser-generated interference patterns, and converting the latent image thus 
recorded into refractive index modulations via processing in a series of dehydrating 
baths. The final result is a layer of hygroscopic gelatin with planes of different 
refractive index that diffract light. This is normally encapsulated between glass 
plates to protect it from degradation by water. Barden et al.' first recognized 
the potential for this technology in astronomy, and since that time many optical 
and infrared spectrographs have been retrofitted with VPH gratings,?? 110 11° or
designed de novo to employ them with full benefit.11® 17°

VPH gratings offer many advantages over ruled reflection or transmission
gratings, or even etched holographic gratings. They can have very high (>95%) peak
efficiencies, and, being optically recorded, they are free of the periodic errors that
cause ghosts in ruled gratings. Moreover, every grating is a “master”, and when
properly recorded and processed it can have very low scatter. Perhaps the most
attractive feature of VPH gratings is their capacity for very high line densities, 
because this allows increased resolving power without increased spectrograph pupil 
size (see discussion after Eq. (2)). For example, before the introduction of VPH grat- 
ings, a medium resolution spectrograph for a 4-m telescope typically had a pupil and 
grating size near 150mm to deliver the desired resolving power (R ~ 5000), while 
VPH spectrographs have been built for the SOAR!” and Blanco!?? 4-m telescopes 
with pupil sizes of 75mm and 68mm, respectively. This gain is compounded by the 
Littrow or near-Littrow operation of VPH gratings, which reduces the anamorphic 
magnification and eliminates the need for even larger camera apertures required to
accommodate high anamorphic factors. The cost savings of these two advantages 
together is tremendous, because modern astronomical spectrographs for ground- 
based telescopes often employ all-refracting cameras for the highest throughput. 
The large apertures and low f-numbers required for these cameras are primary cost 
drivers, as are instrument volume and mass, which scale approximately as pupil
diameter cubed.
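
As a rough illustration of the cubic scaling just described, the short sketch below compares the two VPH spectrograph pupils quoted above with the 150 mm pupil of a conventional design. The function name and the strict cube law are illustrative assumptions; real instruments depart from a pure power law.

```python
def relative_volume(pupil_mm: float, reference_pupil_mm: float = 150.0) -> float:
    """Volume (or mass) relative to a reference design, assuming V scales as pupil**3."""
    return (pupil_mm / reference_pupil_mm) ** 3


# Pupil sizes quoted in the text for the two 4-m telescope spectrographs.
for name, pupil in [("SOAR", 75.0), ("Blanco", 68.0)]:
    print(f"{name}: {relative_volume(pupil):.3f} of the 150 mm design volume")
```

Under this assumption, halving the pupil reduces the instrument volume by roughly an order of magnitude.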

There is a limit to the gain achievable from high line density and working angle,
because for angle of incidence, α, above 45° in air, assuming gratings with refractive
index around 1.5, polarization-dependent losses begin to compromise performance,
eventually resulting in complete loss of one polarization state and efficiencies below
50%. Thus, the main trade in VPH spectrograph design is between maximum working
angle and pupil size. Figure 4 shows the average maximum efficiency in unpolarized
light for a range of typical working angles.
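
The 45° limit can be translated into a line density with a small calculation. The sketch below assumes the usual Bragg relation for unslanted fringes used at Littrow in air, m·λ·ν = 2·sin(α), where ν is the line density; this relation and the function names are assumptions introduced for illustration, not formulas quoted from this section.

```python
import math


def bragg_angle_deg(wavelength_nm: float, lines_per_mm: float, order: int = 1) -> float:
    """Angle of incidence in air (degrees) satisfying m * lambda * nu = 2 * sin(alpha)."""
    s = order * wavelength_nm * 1e-6 * lines_per_mm / 2.0  # wavelength converted to mm
    if s > 1.0:
        raise ValueError("no Bragg solution for this wavelength and line density")
    return math.degrees(math.asin(s))


def max_line_density(wavelength_nm: float, alpha_max_deg: float = 45.0, order: int = 1) -> float:
    """Line density (lines/mm) at which the Bragg angle reaches alpha_max_deg."""
    return 2.0 * math.sin(math.radians(alpha_max_deg)) / (order * wavelength_nm * 1e-6)


print(f"Bragg angle, 1200 l/mm at 500 nm: {bragg_angle_deg(500.0, 1200.0):.2f} deg")
print(f"Line density reaching 45 deg at 500 nm: {max_line_density(500.0):.0f} l/mm")
```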

However, at specific higher angles whose exact value depends on the mean index 
of the diffracting medium, the polarizations realign in phase and overall efficiency is 
again very high.!?’ Gratings that work at these high angles are sometimes referred 
to as Dickson gratings, after their inventor Leroy Dickson.!?° Gratings of this type 
have been employed to give very high R at relatively modest pupil size in the 
APOGEE,!”? HERMES,!”? and CWI/KCWI?!2 18° spectrographs. 

The efficiency envelope of VPH gratings changes with incident angle, and this
can be exploited to tune a single grating to perform at peak efficiency over a range of
wavelengths. The envelope of peak performance over the range of working angle, α,
has sometimes been called the “superblaze” of a VPH grating.3! Fully exploiting
this feature of VPH gratings usually requires an articulated camera, so that the
camera-collimator angle can be changed. The tunable property of VPH gratings
can become a liability in multi-object spectrographs, where some slits or apertures
are not aligned with the field center. These off-axis slits can illuminate the grating at
significantly different angles from the field center, compromising the efficiency. This
problem is most acute for low line-density gratings that function at small α, because
the off-axis angle can be a significant fraction of α itself. For all gratings,
the change in response with field angle makes flux calibration difficult, and this has
driven multi-object VPH spectrograph design toward fiber-fed designs in which the
fiber outputs can be arranged into a pseudo-slit at a single field position in the
dispersion direction. However, when the slit reaches sufficient length in the spatial
direction, similar effects can occur. In general, rigorous coupled wave analysis of
grating performance for the entire range of contemplated working angles is required
as part of the spectrograph design process.

Fig. 4. The drop in VPH unpolarized efficiency maxima at large working angle due to
polarization-dependent losses. Surface reflection losses are not included.

A final feature to be aware of in the design and use of VPH spectrographs in 
collimated beams near Littrow is the appearance of “Littrow ghosts”.!8* These can 
occur when the otherwise faint reflection of the spectrum off the detector passes 
back through the camera and is reassembled by the grating acting in reflection to 
make an image of the slit that appears at β = α. After the explication of this
phenomenon by Ref. 132, it is now common to break this symmetry by recording 
the planes of equal index in VPH gratings at a slight tilt to the surface normal of 
the glass substrate. This does not eliminate the reassembled image, but if the tilt 
angle is chosen correctly it moves it off the detector. 

The introductory paper by Barden et al.'3! summarizes the achievable fabri- 
cation parameters for gratings made in dichromated gelatin. Gelatin is the most 
common material used for astronomical VPH gratings because of its high trans- 
parency from 300nm to 2700nm, and the high modulation in refractive index it 
is capable of achieving. Other materials have also been used, including polymer 
resin.!83 Typical gratings in gelatin range in thickness from 3 to 30 microns, and have
index modulations of Δn ~ 0.01-0.12 about a mean index of 1.4-1.45.

To achieve high grating efficiency in first order, the grating must meet two 
design conditions and one operating condition. The operating condition is that the 
incident light must strike the grating at an angle that meets the first order Bragg 
condition for the desired wavelength and grating line density. The design conditions 
can be expressed using simple formulas from Kogelnik that impose two different 
criteria. The first criterion is that the grating be thick enough to operate in the 
Bragg regime or preferably in the “Kogelnik limit”, in which nearly all of the light 
is either dispersed into first order or left in 0th order. Gratings thinner than this
operate in the Raman—Nath regime and spread light into multiple higher orders. 
Gratings for astronomy are typically built to maximize efficiency in first-order and 
operate in the Bragg regime. The boundary between these two regimes is not sharp,
but a rule of thumb employed by grating designers is!”

where D is the grating film thickness, n_g is the mean film index, d is the grating
period, and λ is the wavelength of light. From this condition it can be seen that
the difficult design space for VPH gratings is long periods in the blue or ultraviolet.
For example, to be efficient in first order at 350 nm, a 300 or even 600 line/mm
VPH grating would have to be impractically thick. The second design condition
comes from Kogelnik’s formula for maximum efficiency in transverse-electric (TE) 
polarization when operating at the Bragg condition 


$$\eta = \sin^2\!\left(\frac{\pi\,\Delta n_g\,D}{\lambda\,\cos(\alpha_{2B})}\right), \qquad (5)$$

where η is efficiency, Δn_g is the index modulation amplitude, and α_2B is the incident
angle that meets the Bragg condition for the chosen λ. This efficiency reaches its
first maximum at

$$\frac{\Delta n_g\,D}{\cos(\alpha)} \approx \frac{\lambda}{2}. \qquad (6)$$
This may be thought of as a “resonance condition” in which the projected modula- 
tion depth on the left-hand side equals one half wavelength of light at the intended 
operating wavelength. 

During fabrication of a VPH grating with fringes normal to the substrate,
there are three main parameters to control: the grating period, which is
established by the recording laser wavelength and the angle between the
interfering beams; the film thickness D, which is set by the film
coating process but changes during development and drying; and the size of the
index modulations, Δn_g, which is set by exposure energy and process parameters.
Equation (6) shows that for reasonably small working angles where cos(α) ≈ 1,
the “modulation-thickness” product, Δn_g D, should be about one half the intended
working wavelength to achieve the highest efficiency. For example, for a grating
intended to work at 500 nm in 5 micron thick film, the target Δn_g would be 0.05.
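
The relations in Eqs. (5) and (6) are simple enough to check numerically. The following is a minimal sketch that evaluates the TE efficiency of Eq. (5) and solves Eq. (6) for the index modulation; the function names are illustrative, and the angle argument is taken to be the same angle that appears in the cosine of Eqs. (5) and (6).

```python
import math


def kogelnik_te_efficiency(delta_n: float, thickness_um: float,
                           wavelength_um: float, alpha_deg: float = 0.0) -> float:
    """First-order TE efficiency at the Bragg condition, Eq. (5)."""
    arg = math.pi * delta_n * thickness_um / (
        wavelength_um * math.cos(math.radians(alpha_deg)))
    return math.sin(arg) ** 2


def resonant_delta_n(thickness_um: float, wavelength_um: float,
                     alpha_deg: float = 0.0) -> float:
    """Index modulation giving the first efficiency maximum, Eq. (6)."""
    return wavelength_um * math.cos(math.radians(alpha_deg)) / (2.0 * thickness_um)


# Worked example from the text: 500 nm light in a 5 micron film at small angle.
dn = resonant_delta_n(thickness_um=5.0, wavelength_um=0.5)
print(f"target delta_n = {dn:.3f}")
print(f"efficiency at that delta_n = {kogelnik_te_efficiency(dn, 5.0, 0.5):.3f}")
```

Because Eq. (5) neglects surface reflection and absorption, the unit efficiency it returns at resonance is an upper bound rather than a prediction for a real grating.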

However, as Barden et al. emphasized, Kogelnik also showed that the bandwidth 
of the efficiency envelope for a VPH grating scales as 1/D, so in practice, one 
always wants the thinnest grating possible that meets Eq. (6). Returning to the 
300 line/mm grating example at 350 nm, lowering Δn_g and raising D to meet the
condition expressed by Eq. (4) would result in a very narrow bandwidth grating 
that is undesirable for most astronomical applications. 
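
To make the thickness trade concrete, the sketch below assumes that the thickness rule of thumb takes the form of Kogelnik's Q parameter, Q = 2πλD/(n_g d²) ≳ 10, which is widely quoted in the VPH literature. Both the functional form and the threshold of 10 are assumptions made here for illustration and may differ from the criterion intended in Eq. (4). The required index modulation is then taken from Eq. (6) at small angle.

```python
import math


def min_thickness_um(wavelength_um: float, lines_per_mm: float,
                     mean_index: float = 1.4, q_min: float = 10.0) -> float:
    """Smallest film thickness D with Q = 2*pi*lambda*D / (n_g * d**2) >= q_min.

    The Q form and q_min = 10 are assumed for illustration.
    """
    period_um = 1000.0 / lines_per_mm
    return q_min * mean_index * period_um ** 2 / (2.0 * math.pi * wavelength_um)


def matching_delta_n(wavelength_um: float, thickness_um: float) -> float:
    """Index modulation from Eq. (6) with cos(alpha) ~ 1."""
    return wavelength_um / (2.0 * thickness_um)


for density in (300.0, 600.0, 1200.0):
    d_min = min_thickness_um(0.35, density)
    print(f"{density:6.0f} l/mm at 350 nm: D >= {d_min:6.1f} um, "
          f"delta_n ~ {matching_delta_n(0.35, d_min):.4f}")
```

Under these assumptions the required thickness grows as the square of the grating period, so the coarse gratings fall outside the typical gelatin thickness and modulation ranges quoted earlier, and the 1/D bandwidth scaling makes the thick solutions narrow-band.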

Users of VPH gratings should be aware of a number of manufacturing issues 
that prevent the gratings from operating as expected from theory. The modulations 
in refractive index that are typically modeled as sinusoidal and uniform throughout 
the depth of the film will actually change with depth owing to absorption of the laser 
light with depth. In addition, there may be parasitic recordings of various kinds, 
including those from light scattered off optics and holders in the recording appara- 
tus, which contribute to scatter in the final grating, and those from light reflected 
off the substrate itself, which generates a reflection grating with fringes parallel to 
the glass surface. Parasitic reflection gratings can reflect in a narrow band, result- 
ing in sharp dips in the response function that resemble spectral absorption lines. 
VPH manufacturers use a variety of strategies, from simple to complex, to reduce
the strength of parasitic holograms. For gratings with tilted fringes, the fringes 
can also curve during the development process, resulting in broader bandwidth but 
lowered peak efficiency. Likewise, variations in index modulation across the aper- 
ture of a grating, arising from non-uniformity in the recording beam intensity or in 
development baths, can compromise peak efficiency. It is advisable to test grating 
performance over the full aperture rather than at a single spot. Finally, the total 
wavefront error introduced by a grating will be no better than the quality of the 
wavefronts that produced the recording beams, and it can be a lot worse, so proper 
collimation and high surface quality recording optics are required. 

VPH gratings for astronomical instrumentation are an area of active develop- 
ment, with significant efforts focusing on making large size monolithic or mosaic 
gratings for ELT, GMT and TMT°® instruments. 9: 194-138 These will require very 
large recording optics and the associated developing and handling equipment for 
meter-scale substrates. The largest VPH gratings fabricated for astronomy thus far 
are for the APOGEE spectrograph, recorded as a three-segment mosaic by stepping 
and repeating exposures on a single monolithic substrate of 305 x 508 mm, and HER- 
MES, 565 x 240 mm in size with clear apertures up to 525 x 200 mm.!??:!?9 Another 
area of activity is the fabrication of large numbers of identical gratings for massively 
replicated spectrographs such as VIRUS.'°9 VPH gratings are often combined with 
prisms to control the deviation angle between incident and diffracted beams, as is 
required for retrofitting in an existing spectrograph, or for using gratings of different 
line density in a mechanically fixed camera-collimator angle spectrograph design. 
Owing to their high efficiencies and compact transmission format, VPH gratings 
have become the cross-disperser of choice for new echelle spectrographs.9” 124 140-144 
VPH gratings are also heavily used in instruments with low to moderate spectral 
resolution and MOS spectrographs.!4> 148 

One promising current area of research involves the development of spherically 
curved VPH transmission gratings, employed in single pass or double pass mode. 
Reference 149 has shown that these gratings can provide aberration corrections
in a manner analogous to Offner-style spectrographs (Ref. 150) employing convex ruled
gratings. Reference 151 has shown that it is possible to produce highly efficient
curved VPH gratings that can be combined with a single spherical mirror to pro- 
duce aberration-corrected spectral images. These “spherical transmission grating 
spectrographs” (STGS) will not easily accommodate off-axis light, and like Offner 
spectrographs, they do not naturally include significant focal reduction capability. 
However, they show promise in the area of fiber-fed spectrographs with very few 
optics and in the area of hyperspectral imaging. Designs are under active develop- 
ment for both applications. 
Atmospheric dispersion is a serious problem, particularly for high-resolution
imaging and astrometry, and must be corrected before the focal plane. In spectroscopy,
dispersion results in slit losses unless corrected before the slit or slitlets. This 
chapter describes the dispersion phenomenon, and design of atmospheric dis- 
persion correctors (ADCs) of various types, especially the Linear ADC and the 
crossed-Amici prisms ADC. A case study is included of an infrared ADC in a very 
high-resolution imaging system fed by adaptive optics. 


1. Introduction 


Refraction by the atmosphere is a well-known phenomenon, and differential refrac- 
tion or dispersion occurs because the refractive index of air changes with wavelength. 
Dispersion exists at all elevation angles except the zenith. At low elevations, it can be 
severe enough to cause easily seen effects such as the “green flash”, or bright stars at 
low elevation appearing to change color due to atmospheric scintillation. In practice, 
most astronomical observations are made at air masses less than 1.5-2.5 to avoid
severe atmospheric absorption in the UV, and poor seeing at most wavelengths, so 
dispersion effects are less apparent but still significant. 

Even at moderate air mass, atmospheric dispersion is problematic, as shown in 
Fig. 1. At approximately 30° elevation, dispersion across the UV-Vis range, 0.3 <
λ < 1 µm, amounts to over 5 arcseconds at sea level and ~3.7 arcseconds at 4200-m
altitude. This is many times the typical seeing disk, and results in poor imaging and 
severe slit losses for spectrographs at optical wavelengths. In the near IR, dispersion 
is generally negligible for ground-based seeing. However, with the advent of adaptive 
optics systems in the near IR, dispersion becomes problematic for AO-systems on 
large telescopes even across a single passband (Fig. 2). 

Fig. 1. Atmospheric dispersion in the UV-Vis for Mauna Kea (4050-m) under nominal conditions.

Fig. 2. Atmospheric dispersion across each band in the near infrared. While small compared to
ground-based seeing, dispersion across each band is significant compared to the diffraction-limited
images achievable with adaptive optics on large telescopes.

Atmospheric dispersion, if uncorrected, can clearly degrade image quality and
can severely impact astrometry, as the centroids of the image will shift depend- 
ing on the spectral energy distributions of the stars. For spectroscopy, the issue 
is severe, as parts of the image can fall off the spectrograph slit, or slitlets in the 
multi-object case; the missed light will depend on wavelength, so it not only affects 
the throughput but makes the throughput wavelength dependent. This can also 
happen with fibers in multi-object fiber spectrographs. With slits, the problem can 
be addressed by aligning the slit to the parallactic angle, assuming this is feasible 
for the science program. However, even in this case, dispersion correction improves
efficiency for slitlets: without it, the slitlets must be made longer (and hence fewer objects
can be observed) to allow all of the dispersed image to be sampled along the slitlet.
If the images are dispersion-corrected, they will be smaller and thus more slitlets 
can be accommodated. 

An atmospheric dispersion corrector or compensator (ADC) is designed to 
reduce the effects of atmosphere-induced dispersion to acceptable levels. There are 
two basic types, each with advantages and disadvantages. These will be explored in 
later sections, but first we consider dispersion models. 


2. Dispersion Models 


Atmospheric refraction, and thus dispersion, is always toward the zenith, i.e., at the 
parallactic angle, with shorter wavelengths refracted more toward higher elevation. 
Atmospheric refraction at elevations of interest is well-described by the analytic 
expression 


$$R = Z_t - Z_a = A\tan Z_a + B\tan^3 Z_a, \qquad (1)$$


where Z_a is the apparent zenith distance and Z_t is the true zenith distance in the
absence of atmosphere. The refraction constants A and B depend on the index of
refraction of the atmosphere. Under typical (sea level) conditions, refraction at
optical wavelengths is about 1 arcminute at Z = 45°, and over 0.5° at the horizon.
Note that there also exist more complicated models that do not use this analytic 
expression, rather doing numerical integrations, but the deviations from the analytic 
expression are typically negligible at Z < 75°. 
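
Equation (1) is straightforward to evaluate. The sketch below uses illustrative textbook-style constants for standard sea-level conditions (A ≈ 58.3 arcseconds, B ≈ −0.067 arcseconds); these values are assumptions chosen for illustration and are not taken from this chapter. In practice A and B would be computed from the site conditions with a refraction library.

```python
import math

# Assumed illustrative refraction constants for standard sea-level conditions.
A_ARCSEC = 58.294
B_ARCSEC = -0.0668


def refraction_arcsec(apparent_zenith_deg: float,
                      a: float = A_ARCSEC, b: float = B_ARCSEC) -> float:
    """Refraction R = A*tan(Z_a) + B*tan(Z_a)**3 in arcseconds, Eq. (1)."""
    t = math.tan(math.radians(apparent_zenith_deg))
    return a * t + b * t ** 3


for z in (30.0, 45.0, 60.0, 75.0):
    print(f"Z = {z:4.1f} deg: R = {refraction_arcsec(z):7.2f} arcsec")
```

With these constants the refraction at Z = 45° is close to one arcminute, consistent with the figure quoted above.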

As noted, the refraction constants A and B depend on the refractive index for 
the atmosphere, which in turn is dependent on barometric pressure, temperature, 
relative humidity, altitude and latitude, with pressure and temperature dominating. 
The most commonly used refractive index model is probably that coded in the 
Starlink AST library,! which is based on the model in Green.? For alternate models, 
see the review in Ref. 3. 

Atmospheric dispersion is then calculated from the difference in refraction at a 
given wavelength compared to a reference wavelength (usually chosen as the center 
of the wavelength range of interest). 
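
A minimal sketch of this calculation follows. It assumes an Edlén-type formula for the refractivity of dry air with a simple pressure and temperature scaling, of the kind commonly used for slit-loss estimates, together with the first-order approximation R ≈ (n − 1)·tan Z. The specific coefficients, the neglect of water vapor, and the Mauna Kea pressure and temperature used in the example are assumptions, not values from this chapter.

```python
import math

ARCSEC_PER_RAD = 206264.806


def air_refractivity(wavelength_um: float, pressure_mmhg: float = 760.0,
                     temp_c: float = 15.0) -> float:
    """Refractivity (n - 1) of dry air from an assumed Edlen-type formula."""
    s2 = 1.0 / wavelength_um ** 2
    n_minus_1 = (64.328 + 29498.1 / (146.0 - s2) + 255.4 / (41.0 - s2)) * 1e-6
    # Scale from the reference state (760 mmHg, 15 C) to the given conditions.
    scale = (pressure_mmhg * (1.0 + (1.049 - 0.0157 * temp_c) * 1e-6 * pressure_mmhg)
             / (720.883 * (1.0 + 0.003661 * temp_c)))
    return n_minus_1 * scale


def dispersion_arcsec(wavelength_um: float, reference_um: float, zenith_deg: float,
                      pressure_mmhg: float = 760.0, temp_c: float = 15.0) -> float:
    """Refraction at wavelength_um minus refraction at reference_um, in arcseconds."""
    dn = (air_refractivity(wavelength_um, pressure_mmhg, temp_c)
          - air_refractivity(reference_um, pressure_mmhg, temp_c))
    return ARCSEC_PER_RAD * dn * math.tan(math.radians(zenith_deg))


# 30 deg elevation (Z = 60 deg), 0.3 to 1.0 micron.
sea_level = dispersion_arcsec(0.3, 1.0, 60.0)
mauna_kea = dispersion_arcsec(0.3, 1.0, 60.0, pressure_mmhg=456.0, temp_c=2.0)
print(f"sea level: {sea_level:.2f} arcsec, assumed Mauna Kea conditions: {mauna_kea:.2f} arcsec")
```

The result is positive for wavelengths shorter than the reference, reflecting the stronger refraction of blue light, and is of the same order as the values quoted in the Introduction.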

Importantly, the resulting index variation depends most strongly on
pressure and temperature; for precision dispersion corrections (needed with adaptive
optics imagers on large telescopes), these quantities must be known to adequate
precision, such as ~3 mbar and ~1°C.

The standard formula works well from the atmospheric cutoff (λ ~ 310 nm) into
the near IR (λ ~ 2 µm). Beyond this, anomalous dispersion near molecular bands,
especially water bands, can become significant. These bands are strongly affected
by the amount of water vapor in the atmosphere, and the anomalous dispersion
dominates normal dispersion at even relatively low humidity. There is a model
by Mathar*> that calculates the total dispersion at wavelengths into the mid-IR.
This model is reported to be in good agreement with observation at wavelengths
8.3 < λ < 11.3 µm.®


3. Atmospheric Dispersion Correctors 


There are two ADC designs in common use: the linear or longitudinal ADC, which 
operates in converging beams, usually near the telescope focus; and the counter- 
rotating (Amici) prisms ADC, which operates near the pupil image in collimated 
light. 

All ADCs work by means of thin prisms, and it is worth keeping in mind that 
for thin prisms in vacuum or air, the refraction angle is approximately 


$$\theta = (n(\lambda) - 1)\cdot\alpha, \qquad (2)$$


where n(λ) is the refractive index as a function of wavelength, and α is the wedge
angle of the prism. The effect of thin prisms is not very sensitive to the prism 
orientation with respect to incoming light (“tilt”), so mechanically they do not 
have to be aligned or held at high precision. 

A pair of identical prisms can be used to make what is effectively a variable-wedge
prism. When the two prisms are aligned to maximize the refraction, the pair gives
essentially double the deflection and dispersion of each single prism. If the prisms
are now counter-rotated, the effective wedge angle is reduced, but without changing
the direction of refraction. When the prisms have been counter-rotated by 90° each,
i.e., they are 180° out of phase, the combined refraction and dispersion are nulled;
at this point the combined prisms act as a plate of glass rather than a prism.
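
The behavior of the counter-rotating pair follows from adding the two thin-prism deflections of Eq. (2) as vectors. The sketch below is illustrative only: the function names and the sample index and wedge angle are assumptions, and the small-angle thin-prism approximation is used throughout.

```python
import math


def thin_prism_deviation_deg(index: float, wedge_deg: float) -> float:
    """Thin-prism refraction angle, Eq. (2), in degrees."""
    return (index - 1.0) * wedge_deg


def pair_deviation_deg(index: float, wedge_deg: float, rotation_deg: float):
    """Net deflection of two identical prisms rotated by +rotation and -rotation.

    Returns (along, across): components parallel and perpendicular to the
    direction of maximum refraction.
    """
    theta = thin_prism_deviation_deg(index, wedge_deg)
    phi = math.radians(rotation_deg)
    along = theta * math.cos(phi) + theta * math.cos(-phi)
    across = theta * math.sin(phi) + theta * math.sin(-phi)
    return along, across


for rot in (0.0, 30.0, 60.0, 90.0):
    along, across = pair_deviation_deg(1.5, 1.0, rot)
    print(f"rotation {rot:4.1f} deg: along = {along:.3f} deg, across = {across:.3f} deg")
```

The perpendicular components cancel at every rotation, so the direction of refraction is unchanged while its magnitude varies as 2θ·cos(rotation), from double the single-prism value down to zero at 90°.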

Zero-deviation prisms (ZDPs) are compound prisms designed to give large dis- 
persion but no deflection at a specified wavelength. An example of this is the Amici 
prism (not to be confused with the “Amici roof prism”) which is a compound prism 
consisting of one material with large dispersion and the second of low dispersion. 
The first prism refracts the light strongly as well as dispersing it by a large amount; 
the second prism refracts the light by the same amount but in the opposite direction, 
and is chosen to have only small effect on the dispersion. The result is a dispersing 
element that does not bend the light at the specific zero-deviation wavelength, but 
does disperse it. ZDPs are preferred whenever possible as their use can avoid many
serious design problems caused by a deflected beam, which in essence is a deflected 
optical axis. Without ZDPs, the counter-rotating prism described above would pro- 
duce a variable deflection of the optical axis, complicating the mechanical design of 
an instrument employing it; with ZDPs, the optical axis is essentially stationary. 

A simple ADC might be a thin prism placed at a variable distance immediately 
before the focal surface of the telescope. The light entering the prism is dispersed 
by the atmosphere. The amount of dispersion correction by the prism is set by the 
distance between the prism and the focal surface: as the prism moves forward from 
the focal plane, the dispersed monochromatic images at the focal plane are displaced 
by different amounts, and if the dispersion curve of the glass is a decent match to the 
atmospheric dispersion curve, there will be a distance where the images at different 
wavelengths align at the same location at the focal surface. In practice, however, 
this simple ADC has serious flaws, notably: 


• angular change of the optical axis by the refraction angle;

• severe packaging issues (right in front of the focal surface when nulled);

• variable aberrations across the field, and a tilt in the focal surface caused by light
from different field points going through different amounts of glass;

• inability to fully null.


In short, this is not a practical system. The simple prism above could be replaced 
by counter-rotating prisms, but similar optical issues arise, especially because of 
different (and now variable) path-lengths through the glass, so that system is also 
impractical. 

We now turn our attention to practical designs. 


3.1. Linear ADC 


The longitudinal or linear ADC (LADC) design’ was introduced in 1996. The LADC 
works in image space, in a converging beam, where displacement of rays results
in displacement of images. The concept is simple: it consists of two matched thin 
prisms oriented 180° with respect to each other. The first prism refracts the light, 
with different wavelengths being refracted slightly differently, and the second prism 
completely cancels the refraction of the first prism. Meanwhile the images at differ- 
ent wavelengths have been displaced by differing amounts, controlled by the prism 
separation, and thus can be recombined, that is, re-stacked on top of each other, 
at the focus. The concept is shown in Fig. 3. When there is no prism separation, 
the system is nulled and the prism pair acts like a thin parallel plate. For a given 
optical material, the maximum correction is a product of the maximum separation, 
or “stroke”, and the prism wedge angles. Because the LADC works in converging 
light, it can be placed anywhere in front of the focal plane, but for practical reasons 
it is placed immediately before the focal surface. 

Fig. 3. Schematic view of a LADC. The two prisms are identical with the only operational
parameter being the separation, d. In the top panel, there is no prism separation and the system is
nulled (no correction). As the prisms separate, the correction increases (bottom panel, where the
dashed rays are bluer than the solid rays). The angle of refraction of the first prism is wavelength
dependent, and the second prism exactly cancels the refraction of the first prism. However, the
optical axis of the telescope is effectively displaced by an amount d·(n − 1)·α, resulting in a
shift of the focal surface.


The LADC is typically used for correcting across large areas such as telescope 
focal planes, and is particularly useful for feeding multi-object slit or multi-fiber 
spectrographs where we want to place all the light into a slit or fiber end whose 
size is comparable to the seeing disk. Therefore, these systems tend to use thin 
prisms slightly larger than the telescope FOV. For example, the Keck-I Cassegrain 
ADC has prisms slightly over 1-m clear aperture. Only one UV-transparent optical 
material, fused silica, can be manufactured in the required blank size for such large 
prisms. Fortunately, the dispersion curve of fused silica is a decent match to the 
atmospheric dispersion. Residual dispersion is typically ~0.1″, which is acceptable
for most seeing-limited observations. 

Since the prisms are located very near the focus, wavefront errors can be rela- 
tively large without severely impacting image quality; thus, requirements on index 
inhomogeneity and flatness are easily met. As noted above, the thin prism tilt
accuracy is not a severe requirement, nor is clocking of the prisms; thus, the mechanical
tolerances for holding these large prisms are fairly easy to meet. 

In practice, LADCs have one major optical issue that is relatively easy to handle.
The primary effect of the first prism is a gross displacement of the image, while 
the dispersion displacement is a secondary effect. The second prism returns all the 
deflected rays to their initial angles, but they are displaced by an amount dependent 
on prism separation, effectively displacing the optical axis. (If the prisms could be 
ZDPs, this would not be a problem, but in general 1-m class ZDPs are currently not 
possible.) In fast modern telescopes, the focal surface is usually steeply curved, so 
for an instrument with fixed position, spatially-dependent changes in focus and non- 
telecentric rays result. If the focal surface is parabolic, the displacement causes a 
tilt of the focal surface relative to the original surface. The way around this problem 
is to displace the instrument by the same amount that the optical axis is displaced. 
While translating large instruments is not a trivial mechanical problem, doing so 
completely addresses this problem. Alternatively, if there are additional fold mirrors 
(for example, at Nasmyth focus), the optical axis can be translated back into place. 
The current design for the Wide Field Optical Spectrograph (WFOS) on TMT calls 
for a fold mirror so the instrument can be mounted vertically (to eliminate flexure 
from changing gravity vectors), and this fold mirror can be tilted in combination 
with the flat telescope tertiary to make this displacement optically rather than by 
mechanically translating the entire instrument. 

The largest optical aberration from the LADC is axial chromatic aberration 
due to the introduction of the glass of the thin prisms. Since every ray bundle sees 
the same total glass thickness, it is uniform across the field of view. This leads to 
a certain degree of defocus at most wavelengths, which will affect image quality; 
however, in a slow beam this is usually not a significant problem. Similarly, the 
focus will see a gross retardation by the amount 


$$\Delta = T \times \left(1 - \frac{1}{n(\lambda)}\right),$$


where Δ is the change in focus and T is the total prism thickness. The chromatic
aberration is inherent in this equation. 
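
The size of this effect can be estimated with a short calculation. The sketch below evaluates the focus shift for fused silica using the standard Malitson Sellmeier coefficients; the total glass thickness of 100 mm is a purely hypothetical value chosen for illustration, and the function names are likewise assumptions.

```python
import math

# Sellmeier coefficients for fused silica (Malitson), wavelength in microns.
_B = (0.6961663, 0.4079426, 0.8974794)
_C = (0.0684043, 0.1162414, 9.896161)


def fused_silica_index(wavelength_um: float) -> float:
    """Refractive index of fused silica from the Sellmeier equation."""
    w2 = wavelength_um ** 2
    return math.sqrt(1.0 + sum(b * w2 / (w2 - c * c) for b, c in zip(_B, _C)))


def focus_shift_mm(total_thickness_mm: float, wavelength_um: float) -> float:
    """Focus retardation Delta = T * (1 - 1/n(lambda))."""
    return total_thickness_mm * (1.0 - 1.0 / fused_silica_index(wavelength_um))


T_MM = 100.0  # hypothetical total prism thickness
blue = focus_shift_mm(T_MM, 0.31)
red = focus_shift_mm(T_MM, 1.1)
print(f"focus shift: {blue:.2f} mm at 0.31 um, {red:.2f} mm at 1.1 um")
print(f"axial chromatic focus difference: {blue - red:.2f} mm")
```

The gross shift is removed by refocusing; only the difference between wavelengths remains as axial chromatic aberration, which is why a slow beam tolerates it well.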

The next largest aberration is constant astigmatism. Placing a tilted plate in 
the beam will produce such astigmatism, so at null position this can be reduced 
to zero if the outer prism faces are perpendicular to the beam and the prism spac- 
ing is negligible. However, as the prisms separate, the astigmatism increases. This 
astigmatism can be controlled by slightly tilting the prisms, with the amount of tilt 
dependent on the prism separation. 

The final aberration of note is constant coma, and this may be the largest 
significant aberration if the constant astigmatism is controlled. 

Large telescopes are now often constructed with segmented or thin primary 
mirrors under active control. In these telescopes, the constant astigmatism and 
constant coma of the LADC may be controlled by introducing aberrations of the 
opposite sign in the primary. If the controlling wavefront sensor sits behind the 
ADC, these aberrations will be removed automatically. Of course, it is important to 
verify the stroke of the primary mirror actuators is large enough to accommodate
the added aberrations. 

The prisms need to be oriented for the parallactic angle, which is always parallel 
to the elevation vector. In modern alt-azimuth telescopes at Cassegrain focus, the 
parallactic angle is fixed, so the only required motion is the linear separation of 
the prisms, making such ADCs extremely simple from a mechanical perspective. At 
Nasmyth focus, however, the parallactic angle follows the minor axis of the tertiary 
and so rotates with elevation. Therefore, the entire ADC unit must rotate with 
elevation to track the parallactic angle. This may require a separate rotation stage, 
or the ADC could be fixed to the telescope so it rotates as the telescope changes 
elevation. 

Note that in the ideal LADC system, the first prism should remain on the
optical axis, while the second should be displaced to follow the displaced optical
axis. Neglecting to do this has no optical effect, but displacing the second prism
allows for the minimum prism size and is therefore worthwhile.


3.1.1. Example 


A good example of a LADC is the Keck-I Cassegrain ADC.⁸ This is designed to
correct dispersion at optical wavelengths at the Cassegrain focus across the entire
10 arcminute radius of the unvignetted field of view of the Keck Telescope. Because
the ADC is at Cassegrain focus, no rotation is required (the zenith is always up); the
only motion is a linear stage that separates the two fused silica prisms symmetrically
to maintain balance. The ADC prisms have a clear aperture of 1.03 m and a wedge angle
of 2.5°. The blanks were produced by Corning from grade 4F fused silica, and were
polished at Zygo to a flatness of 1 wave at 632 nm. The maximum stroke of the prisms
is 1.7 m, suitable for full correction to Z = 60° across 0.31 < λ < 1.1 µm. The peak-to-valley
residual dispersion is +0.11 arcseconds. 
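
The quoted Keck parameters can be checked for plausibility with a thin-prism estimate. The sketch below is not the design calculation. It assumes that the 1.7 m stroke equals the maximum prism separation, that each prism deviates the beam by (n − 1) × wedge angle, that the Cassegrain focal length is about 150 m (f/15 on a 10 m aperture), and that the Malitson Sellmeier coefficients describe the fused silica. All of these are assumptions introduced here.

```python
import math

# Sellmeier coefficients for fused silica (Malitson 1965), wavelength in microns.
_B = (0.6961663, 0.4079426, 0.8974794)
_C = (0.0684043, 0.1162414, 9.896161)


def n_fused_silica(wavelength_um):
    """Refractive index of fused silica from the Sellmeier equation."""
    lam2 = wavelength_um ** 2
    return math.sqrt(1.0 + sum(b * lam2 / (lam2 - c ** 2) for b, c in zip(_B, _C)))


WEDGE_RAD = math.radians(2.5)  # wedge angle quoted in the text
SEPARATION_M = 1.7             # maximum stroke, assumed equal to the separation
FOCAL_LENGTH_M = 150.0         # assumed f/15 Cassegrain focus of a 10 m telescope
ARCSEC_PER_RAD = 206264.806


def lateral_shift_m(wavelength_um):
    """Image shift produced by the separated prism pair (thin-prism approximation)."""
    deviation_rad = (n_fused_silica(wavelength_um) - 1.0) * WEDGE_RAD
    return SEPARATION_M * deviation_rad


# Differential shift between the blue and red ends of the corrected range.
differential_m = lateral_shift_m(0.31) - lateral_shift_m(1.1)
correction_arcsec = differential_m / FOCAL_LENGTH_M * ARCSEC_PER_RAD

print(f"differential shift: {differential_m * 1e3:.2f} mm")
print(f"equivalent dispersion on sky: {correction_arcsec:.2f} arcsec")
```

Under these assumptions the available correction comes out at a few arcseconds, the order of magnitude expected for atmospheric dispersion over this wavelength range at a zenith distance of 60° from a high-altitude site.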

Ideally, the low-resolution imaging spectrograph (LRIS) at the telescope focus 
should translate to follow the displaced optical axis, but since this ADC was added 
later, LRIS rotates on the original optical axis, leading to defocused images due 
to the effective tilting of the focal surface. As the instrument rotates to maintain 
constant position angle on the sky, the telescope focus must be adjusted for the 
off-axis LRIS field, but defocus across the LRIS field itself cannot be corrected. 


3.2. Counter-rotating prism ADC 


In collimated light, a pair of counter-rotating prisms can be used for an ADC that
has minimal impact on image quality. Simple prisms could perform the dispersion
correction task, but the optical path would then be deflected by amounts varying
with the correction, so in practice a pair of zero-deviation Amici prisms is used.
At the reference wavelength, the optical axis is unchanged. We refer to this ADC
type as a “Crossed-Amici ADC” (XAADC), although it is often [inaccurately] called
“crossed Risley prisms.” The design is shown schematically in Fig. 4.

Fig. 4. The XAADC layout, shown at maximum correction, with up toward the zenith. To lessen
the correction, the prisms are counter-rotated; at θ = 90°, the prisms cancel each other to null the
system.


This type of ADC can be used as a variable-wedge thin prism in simple imaging 
systems, such as at prime focus. It is compact and therefore fits well into the lim- 
ited space before the prime focus, often as part of a prime-focus corrector system. 
An example is the Prime Focus Camera⁹ on the Lick Observatory 3-m telescope.
However, finding matched pairs of glasses in large sizes tends to limit the overall size
and usefulness of these systems. This type of ADC is also finding use in narrow-field
spectrometers, such as the New Extreme Precision Doppler spectrometer¹⁰ on the
WIYN telescope and the MAROON-X spectrograph¹¹ on the Gemini-N telescope;
in these cases, the narrow field means the small prisms can be made of specialized 
glasses matched for precision dispersion correction. 

The XAADC is most usefully located in collimated light, such as found in 
re-imaging systems or AO systems where a pupil image is created. Recall that in 
collimated light, spatial positions are defined by angles. Since the rays from any 
spatial point are parallel, and the prisms consist of plane surfaces, all the rays from 
a given spatial point will be refracted at the exact same angle, preserving the parallel 
beam. As a result, this system is virtually aberration free. 

Using Amici prisms comprising two optical materials rather than one is obvi- 
ously more costly and complicated, but it means the compound prisms can be 
designed for a better match to the atmospheric dispersion curve, in addition to 
the advantage of zero-deviation prisms. In practice, this may mean the best-match 
optical materials are expensive and difficult to fabricate in large sizes. For this rea- 
son, among others, this type of ADC is usually located very near the pupil image 
to minimize required prism diameter. Since the XAADC is located near the pupil,
wavefront errors from the prisms will have maximum impact, so index homogeneity
and surface flatness have strict requirements.

Mechanically, the ADC consists of two cells, each holding an Amici prism, that
counter-rotate ±90° from maximum correction to the nulled position. Since the
ADC is usually located inside an instrument that is rotating to preserve position
angle on the sky, the direction of the parallactic angle will change, so the ADC as
a whole must also rotate to any angle. In practice, this means that independent
rotation of each Amici prism may be the preferred solution.
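
With independent rotation, the two prism angles follow directly from the required correction and the current parallactic direction. The sketch below assumes each prism contributes a dispersion vector of half the maximum correction that rotates with the prism, so that counter-rotation by ±θ leaves a net correction of (maximum) × cos θ along the parallactic direction. The function names and the angle convention are hypothetical, chosen for illustration.

```python
import math


def prism_angles(required, maximum, parallactic_deg):
    """Rotation angles (degrees) of the two prisms for a required correction.

    theta = 0 gives maximum correction; theta = 90 degrees nulls the system.
    """
    if maximum <= 0 or not 0.0 <= required <= maximum:
        raise ValueError("required correction must lie between 0 and maximum")
    theta = math.degrees(math.acos(required / maximum))
    return parallactic_deg + theta, parallactic_deg - theta


def net_dispersion(maximum, angle1_deg, angle2_deg):
    """Vector sum of the two prism dispersions, each of magnitude maximum / 2."""
    half = maximum / 2.0
    a1, a2 = math.radians(angle1_deg), math.radians(angle2_deg)
    return (half * (math.cos(a1) + math.cos(a2)),
            half * (math.sin(a1) + math.sin(a2)))


# Example: half of the maximum correction, parallactic direction at 30 degrees.
a1, a2 = prism_angles(required=0.5, maximum=1.0, parallactic_deg=30.0)
dx, dy = net_dispersion(1.0, a1, a2)
print(f"prism angles: {a1:.1f}, {a2:.1f} deg")
print(f"net correction: {math.hypot(dx, dy):.3f} at "
      f"{math.degrees(math.atan2(dy, dx)):.1f} deg")
```

The components perpendicular to the parallactic direction cancel by symmetry, which is why the pair acts as a single variable-strength prism.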


3.3. ADCs for precision astrometry 


In order to take full advantage of imaging with adaptive optics (AO) on large tele- 
scopes, particularly if the goal is precision astrometry, atmospheric dispersion must 
be corrected to an unprecedented degree. We require the ADC to correct diffraction- 
limited images without significant image degradation, and residual dispersion to be 
on the order of milli-arcseconds (mas). These requirements immediately limit us 
to the XAADC design, with the system housed in a cooled environment to control 
emissivity in the thermal infrared. The ADC must be in collimated light, so it could 
either be placed in a re-imaging instrument or in the AO system path. In addi- 
tion, many glasses are eliminated from the candidate materials because of poor IR 
transmission. 


3.3.1. Case study: ADC in the Infrared Imaging Spectrograph (IRIS) 


An example of this type is the ADC¹² in the Infrared Imaging Spectrograph (IRIS)¹³
under development for the Thirty-Meter Telescope, and we will use this case to
discuss various design aspects for the precision astrometry case. IRIS will be fed by
the facility adaptive optics system. The requirement for this ADC is a residual
dispersion across each passband of <1 mas, to permit precision astrometry approaching
10 micro-arcseconds (µas).

We initially studied whether the ADC could go into the facility AO system, 
but found it was impossible to obtain blanks of the required prism materials large 
enough for the beam size. This required the ADC to go into the cryogenically-cooled 
IRIS instrument. The pupil size in this system is ~90 mm, and even this relatively
small size was difficult to match. In the end, we identified six pairs of suitable 
glasses, only four of which could be obtained in the required sizes. 

At the level of precision required, distortion becomes a concern. The effects of 
a fixed distortion pattern can be modeled and removed, but a changing distortion 
pattern can lead to image blur on long exposures. Therefore, we spent consider- 
able time looking at the distortion introduced by the ADC. What we found was a 
complex distortion pattern that, fortunately, can be decomposed into two simple 
parts, illustrated in Fig. 5. We call these two terms “elevation distortion” (since it
is always in the elevation direction, i.e., aligned with the dispersion direction) and
“linear distortion” (since it varies linearly across the field). The properties of these
two components are summarized in Table 1.

Fig. 5. Elevation distortion (a) and linear distortion (b) for the IRIS ADC from Phillips et al.¹²
Both distortion patterns are for the maximum amplitude, occurring at prism angle 0° for the
elevation distortion and at prism angle 45° for linear distortion. The square represents the 32 × 32
arcsecond field at the detector. See text for discussion.


Table 1. Characteristics of distortion terms with prism rotation and distance from
field center, from Phillips et al.¹²

Distortion type         Scaling wrt θ        Scaling in field   Amplitude affected by
Elevation (y; x = 0)    cos(θ)               r²                 Glass
Linear-x                cos(θ) × sin(θ)      y                  Glass/tilts
Linear-y                −cos(θ) × sin(θ)     x                  Glass/tilts
Pupil displacement      sin(θ)               n/a                Glass/tilts


Elevation distortion is maximum when the prisms are aligned for full correction 
and is non-existent at the nulled position. It varies in amplitude with prism angle
as cos θ, and increases away from the center of the field as r². It appears to depend
primarily on the glasses in the prisms, and is minimized when the wedge angles of the
glasses in each Amici prism are similar, that is, when the average refractive indices
match well. Linear distortion is maximum when the prisms are 90° crossed (i.e.,
θ = 45°), and nonexistent at nulled and maximum correction, and it increases away
from the center of the field linearly with r. Note that the direction of displacement
depends on the specific location in the field. The amplitude of linear distortion
depends on glass type and specific tilt angles of the prisms with respect to the
optical axis. The typical maximum distortion here is of order 10 mas, but after the
modeled distortion is removed, residual distortion is on the order of 10 µas (rms)
across the field. 
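
The scalings summarized in Table 1 can be turned into a small model for exploring how the two distortion terms trade off with prism angle. The sketch below is only a schematic of those scalings. The amplitude coefficients `a_elev` and `a_lin` are hypothetical placeholders (in practice they depend on the glasses and prism tilts), and the field coordinates are in arbitrary units.

```python
import math


def distortion_terms(theta_deg, x, y, a_elev=1.0, a_lin=1.0):
    """Return (elevation, linear_x, linear_y) displacements at field point (x, y).

    Scalings follow Table 1: elevation ~ cos(theta) * r^2 (along y only),
    linear-x ~ cos(theta) sin(theta) * y, linear-y ~ -cos(theta) sin(theta) * x.
    """
    t = math.radians(theta_deg)
    r_squared = x * x + y * y
    elevation = a_elev * math.cos(t) * r_squared
    linear_x = a_lin * math.cos(t) * math.sin(t) * y
    linear_y = -a_lin * math.cos(t) * math.sin(t) * x
    return elevation, linear_x, linear_y


# Evaluate at a corner of a unit field for several prism angles.
for theta in (0, 30, 45, 60, 90):
    elev, lin_x, lin_y = distortion_terms(theta, 1.0, 1.0)
    print(f"theta = {theta:2d} deg: elevation = {elev:+.3f}, "
          f"linear = ({lin_x:+.3f}, {lin_y:+.3f})")
```

The output shows the elevation term falling from its maximum at θ = 0° to zero at the nulled position, while the linear term vanishes at both extremes and peaks at θ = 45°.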

In addition to distortion, there can be effects on image quality caused by a
displacement of the pupil image by the ADC. This displacement is non-existent
at maximum correction and greatest at nulled position. Why this occurs is easily
demonstrated in Fig. 6 by considering how the rays are displaced. At maximum
dispersion correction, symmetry dictates that any ray displacement introduced by
the first prism will be removed by the second prism. In contrast, at nulled position
any displacement by the first prism will be doubled by the second. Another way to
look at this is that the prisms of each glass type combine to make a tilted plate,
and the two tilted plates will introduce a displacement.

Fig. 6. Schematic diagram of the XAADC at maximum correction (a) and nulled (b), showing
displacement of the pupil image, maximum at the nulled position (b). Note that (a) is a side view,
whereas (b) is a top view. See text.


Table 2. Detailed properties for six candidate glass pairs for the IRIS ADC, from Phillips et al.¹²

                               S-LAH79   S-LAH71   S-TIM39   S-TIH11   S-PBH56   S-FTM16
                               S-FPM3    S-FPL51   S-BSM2    S-PHM52   S-BSM2    S-BAL42
Elevation amplitude (A1)       1.056     1.010     0.256     0.379     0.567     0.020
Linear amplitude (A2)          0.60      0.77      1.2       0.67      0.66      1.37
Pupil displacement (mm)        0.88      1.02      1.61      0.89      0.88      1.87
rms resid. dispersion (mas)    0.84      2.51      0.82      2.11      1.14      1.21
Available at diam ~140 mm?     No        Yes       Yes       Yes       No        Yes
                               No        Yes       Yes       Yes       Yes       Yes
Tot. transmission @2.0 µm      0.95      0.97      0.89      0.86      0.94      0.90
Tot. transmission @2.2 µm      0.89      0.95      0.75      0.75      0.83      0.72
Tot. transmission @2.4 µm      0.72      0.88      0.59      0.65      0.68      0.61

It is instructive to look at the candidate glass pairs for the IRIS ADC, shown
in Table 2. Note that the residual dispersion is across the entire wavelength range
0.84 < λ < 2.45 µm; the residual dispersion within each passband is significantly
smaller, <1 mas rms. We have calculated the relative level of the two distortion
components, as well as the pupil shift for each glass pair. Two glass pairs were
eliminated because one or both glasses could not be obtained in the required sizes to
produce a high-quality clear aperture of ~90 mm. The final down-select to S-FPL51
and S-LAH71 was based on the rather poor combined transmission at λ > 2 µm of
the other three pairs.
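
The down-select logic can be reproduced from the Table 2 entries. The sketch below pairs each glass in the first header row with the glass beneath it, takes availability from the "Available at diam ~140 mm?" row, and ranks the remaining pairs by their worst transmission at 2.0, 2.2 and 2.4 µm. The ranking criterion is a simplification assumed here, not necessarily the procedure actually followed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlassPair:
    glasses: tuple       # the two glasses forming the Amici prism
    available: bool      # "Available at diam ~140 mm?" row of Table 2
    transmission: tuple  # total transmission at 2.0, 2.2 and 2.4 microns


PAIRS = [
    GlassPair(("S-LAH79", "S-FPM3"), False, (0.95, 0.89, 0.72)),
    GlassPair(("S-LAH71", "S-FPL51"), True, (0.97, 0.95, 0.88)),
    GlassPair(("S-TIM39", "S-BSM2"), True, (0.89, 0.75, 0.59)),
    GlassPair(("S-TIH11", "S-PHM52"), True, (0.86, 0.75, 0.65)),
    GlassPair(("S-PBH56", "S-BSM2"), False, (0.94, 0.83, 0.68)),
    GlassPair(("S-FTM16", "S-BAL42"), True, (0.90, 0.72, 0.61)),
]


def down_select(pairs):
    """Drop unavailable pairs, then pick the best worst-case transmission."""
    candidates = [p for p in pairs if p.available]
    return max(candidates, key=lambda p: min(p.transmission))


best = down_select(PAIRS)
print("selected pair:", " / ".join(best.glasses))
print("transmission at 2.0, 2.2, 2.4 um:", best.transmission)
```

With these inputs the selection lands on S-LAH71 with S-FPL51, in agreement with the choice described above.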


3.4. Compensating lateral ADC 


There is one further type of ADC that is practical for some applications, particularly
imaging at the telescope focal surface: the compensating lateral ADC (CLADC).¹⁴ This
clever design uses the fact that two curved surfaces that are displaced laterally 
create, to first order, a wedge, whose variable opening angle is set by the amount 
of displacement and the curvature of the lenses. By using the lenses in a wide-field 
corrector in front of the focal surface and simply displacing one or more lenses lat- 
erally, this provides atmospheric dispersion correction free of charge: no additional 
optics, and also no additional surfaces for throughput losses. Of course, intention- 
ally de-centering the lenses will introduce some image degradation at the same time. 
However, successful designs of this type of ADC have been made that show good 
correction with acceptable loss of image quality. There will be spherical aberration 
variations across the focus, and possibly a slight tilt in the focal surface, but these 
will be minimal because the wedge (“prism”) has little thickness, since it lacks the 
glass required for structural support as in free-standing thin prisms. Aberrations 
can also be minimized by tilting the corrector elements.¹⁵ The details are dependent
on the actual materials and surfaces of the lens or lenses in question.

Another advantage of the CLADC is that the lens displacements may
be gravity-driven, giving a passive ADC.¹⁶


4. Other Considerations 


A major concern is the ADC’s effect on the throughput of an instrument. Each 
optical surface in the ADC is a source of reflection losses. The simple prisms in a 
typical LADC add four surfaces to the light path. In a LADC with zero-deviation
prisms, or in the XAADC, this becomes eight additional surfaces. At 1% loss per
surface, this reduces the throughput by about 4% or 8%, respectively. At optical wavelengths, the
elements in the Amici prisms can be coupled with fluid or grease to avoid reflection 
losses from the four additional surfaces, but in the cryogenic IR, this is not an 
option. The only relief to these reflection losses is excellent anti-reflection (AR) 
coatings, which typically must operate over a broad wavelength range. 
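
The reflection-loss arithmetic compounds per surface, which the following sketch makes explicit. The 1% per-surface loss is the illustrative figure used above; the function name is hypothetical.

```python
def surface_throughput(n_surfaces, loss_per_surface=0.01):
    """Fraction of light transmitted through n uncoupled air-glass surfaces."""
    return (1.0 - loss_per_surface) ** n_surfaces


for label, n in (("LADC with simple prisms", 4),
                 ("zero-deviation LADC or XAADC", 8)):
    t = surface_throughput(n)
    print(f"{label}: {n} surfaces, throughput {t:.3f}, loss {100 * (1 - t):.1f}%")
```

Because the losses multiply, the totals are slightly below the simple sums of 4% and 8%; the difference grows quickly if the per-surface loss is larger than 1%.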

The author has had good success using broadband AR coatings based on silica
sol-gel deposited over a MgF₂ film to broaden the passband.¹⁷ The MgF₂ and a
bond material (usually Al₂O₃) are first vacuum deposited, and then the sol-gel is
applied with spin coating or dip coating, or (ideally) meniscus coating. The sol-gel
should be hardened to produce a sufficiently robust, durable surface. Treating with
a hydrophobic material like hexamethyldisilazane (HMDS) is recommended.

Additionally, internal transmission losses for some glasses can have important
ramifications on throughput, particularly in the infrared (see Table 2 for examples
of this).

Another design consideration concerns ghost images created by the plane sur- 
faces of the ADC prisms. This is particularly so for the parallel surfaces in the LADC 
prisms, especially when the prism separation is small. As the prisms separate, the 
converging beam of the ghost travels a larger distance and the ghosts quickly become 
de-focused, meaning lower intensity but larger area. In the XAADC, light reflected 
from parallel surfaces will be focused into an image, possibly on the actual image or 
(more likely) slightly displaced, depending on how well aligned the prisms are. The
ghost images will be most problematic when the prisms are nulled, as in that
configuration all the surfaces in the prisms are parallel.
Ghost images in the XAADC will always be in focus, but will move as the ADC 
correction changes, and so can be identified by image blurring or by movement 
across multiple exposures. 

There is also the issue of guiding during observations to consider. If the guider is 
located behind the ADC, it will automatically track any image motion introduced 
by the ADC. If not, especially for precision astrometry, care must be exercised 
to make all the necessary offsets, as well as to provide the same ADC correction 
to the guider(s) so that the guider images are not sensitive to the spectral energy 
distribution of the guide stars in a way that is different from the science instrument.


Over the last few decades, the use of photonic fibers and waveguides in astronomical
instrumentation has grown substantially. This paper provides a summary of
the driving motivations for this growth. We also provide a description of the key 
devices and concepts that include photonic lanterns, fiber Bragg gratings (FBGs), 
waveguide Bragg gratings, hexabundles, mode scrambling fibers, integrated beam 
combiners and nullers, ring resonators, and integrated spectrographs. In the years 
ahead, as emerging photonic fields grow in maturity, we expect to see a regu- 
lar sequence of such new techniques transitioning through lab demonstration, 
on-telescope demonstration, and onto new science results. 


1. Introduction 


The importance of optical fibers in astronomy grew from the realization in the
1980s that multi-mode (MM) fibers provided a means of efficiently transporting
light from the telescope focal plane to the slit of a spectrograph that could be placed
off-telescope.¹ The spectrograph could thus be larger, more complicated, or more
tightly controlled (due to lack of flexure or varying environmental conditions), and
positioning systems could allow for many objects to be observed simultaneously with
efficient use of the detector array area.² Later (e.g., Ref. 3), fibers were demonstrated
to allow spatially-resolved spectroscopy in the form of integral field units.
Single-mode fibers also provided for light transport in optical interferometry.⁴

The widespread use of fibers in astronomy catalyzed the emergence of
“astrophotonics”, the implementation of photonic technologies in astronomical
instrumentation incorporating functions beyond simply light transport through
guiding.⁵ Over the last decade there have been many examples of astrophotonic
devices demonstrated in the laboratory and on-sky: photonic lanterns for
efficient beam conversion,⁶ broadband multi-notch fiber Bragg gratings (FBGs) for
suppression of atmospheric OH,⁷ integrated arrayed-waveguide gratings for
spectroscopy,⁸ integrated optical beam combiners in optical interferometry,⁹ integrated
pupil remappers for aperture mask interferometry,¹⁰ and laser frequency combs for
high precision spectroscopy.¹¹,¹²

In the following sections, we explain: the motivations for photonic fibers and 
waveguides in astronomy, which are used as filters, to enhance modularity, and 
increase observation precision; the theory behind the photonic lantern, a key com- 
ponent for translating between single-mode (SM) and multi-mode (MM) states; the 
operation and fabrication of FBGs, for filtering unwanted atmospheric emission; 
the use of photonic components for spectrograph calibration and mode control; and 
the theory and application of integrated waveguides. 


2. Motivations 


2.1. Coupling 


A ground-based telescope produces an image of a point source in its focal plane
that has an angular width that scales inversely with the telescope diameter, D,
or, when atmospheric turbulence dominates, with Fried’s parameter, r₀, which
characterizes the strength of the turbulence.ᵃ When injecting light from the telescope
into an optical fiber, the number of fiber modes that are supported is either 1 (in the
diffraction-limited regime) or scales with the square of the ratio D/r₀. This holds
independent of any focal ratio adjusting optics that may be placed in the focal plane
to modify the physical scale of the point spread function. For large telescopes or poor
seeing sites, the number of modes can be very large, e.g., for D = 8 m with a seeing of
1″, there are 4000 modes supported. Even for small apertures, long wavelengths, or
operation with adaptive optics systems, it is rare to achieve the diffraction limit; in
practice, there are usually at least a few modes to consider under these conditions.

These factors have provided the motivation for the development of a mode
exchange device that allows conversion from a single MM input to multiple
single-mode outputs. In such a device, called a photonic lantern,⁶ each of the SM
outputs can be fed directly into SM equipment (whether fiber filters or spectrographs).
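
The mode count can be estimated with a short calculation. The sketch below is an order-of-magnitude estimate only, under assumptions that are not stated in the text: a step-index fiber whose core subtends the seeing disk and whose numerical aperture matches the telescope beam, so that the normalized frequency is V = πθD/(2λ), together with the large-V approximation of V²/4 spatial modes per polarization. The observing wavelength of 0.5 µm is also an assumption.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def modes_for_seeing(diameter_m, seeing_arcsec, wavelength_um):
    """Approximate number of fiber modes needed to accept a seeing-limited image.

    Uses V = pi * theta * D / (2 * lambda) and M ~ V^2 / 4 (large-V step-index
    approximation, per polarization); never returns fewer than one mode.
    """
    theta = seeing_arcsec * ARCSEC_TO_RAD
    v_number = math.pi * theta * diameter_m / (2.0 * wavelength_um * 1e-6)
    return max(1.0, v_number ** 2 / 4.0)


for diameter in (1.0, 4.0, 8.0):
    n_modes = modes_for_seeing(diameter, seeing_arcsec=1.0, wavelength_um=0.5)
    print(f"D = {diameter:.0f} m, 1 arcsec seeing, 0.5 um: ~{n_modes:.0f} modes")
```

For an 8 m aperture the estimate is a few thousand modes, the same order as the figure quoted above, and the count grows as D² at fixed seeing and as 1/λ² at fixed aperture.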


2.2. Filtering 


The atmospheric background in the near-infrared from 1 to 2 µm is dominated by a
forest of narrow emission lines arising from molecular hydroxyl (OH) in the
mesosphere. The number, strength, and variability of these lines result in a significant
limitation to deep near-infrared astronomical observations. Various approaches to
suppress this OH emission have been proposed and tested, such as high-dispersion
masking using Rugate filters or volume phase holographic filters. These approaches
suffer from some drawbacks, however." The magnitude of this issue has provided
motivation for the development of a photonic solution via FBGs and/or ring
resonators.

ᵃSee Chapter 13.

The requirements for an OH suppression filtering system are quite demanding. 
There are many thousands of individual OH lines of various strength within the J 
and H bands. Each filter must match the wavelength of the brightest ~200 lines 
(which are actually doublets), with a suppression depth matched to the line strength 
(of up to 40 dB), and a linewidth of only ~1 Å.
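
To put these requirements in more familiar terms, the sketch below converts a suppression depth in dB to a transmitted fraction and a notch width in ångströms to an equivalent resolving power. The wavelength of 1.6 µm is an assumed representative H-band value, not a number from the text.

```python
def db_to_transmission(depth_db):
    """Fraction of line flux transmitted by a notch of the given depth in dB."""
    return 10.0 ** (-depth_db / 10.0)


def resolving_power(wavelength_um, linewidth_angstrom):
    """Equivalent resolving power lambda / delta-lambda (1 um = 10^4 Angstrom)."""
    return wavelength_um * 1.0e4 / linewidth_angstrom


print(f"40 dB notch transmits {db_to_transmission(40.0):.0e} of the line flux")
print(f"1 Angstrom at 1.6 um corresponds to R ~ {resolving_power(1.6, 1.0):.0f}")
```

A 40 dB notch therefore passes only one part in ten thousand of the line, and a notch about 1 Å wide in the H band corresponds to a resolving power well above ten thousand.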


2.3. Scaling laws 


The importance of sample size in stellar and extra-galactic science has long been 
recognized. This provides motivation for high multiplex capability in, generally, 
moderate resolution spectrographs, with a view to conducting large survey programs 
that target millions of objects. 

The overall size of an astronomical spectrograph for a fixed resolution and wave- 
length range scales with the physical width of the slit, which is usually matched to 
the width of the delivered image quality at the telescope focal plane. For seeing- 
limited observations, the slit width and hence the spectrograph scales with the 
telescope diameter and is significantly larger than necessitated via diffraction alone. 
Recently, the realization of the significant reduction in cost and complexity for spec- 
trographs that operate at the diffraction limit has provided motivation for photonic 
solutions to feed such spectrographs, as well as fully integrated photonic spectro- 
graphs themselves. In this way, it has been proposed that the scaling law can be 
overcome for large telescopes by modularity and mass-replication.‘? > Alterna- 
tively, for small telescopes spectrographs can become very inexpensive. 


2.4. Precision 


There is a growing scientific motivation (e.g., for exoplanet and stellar abun- 
dance work) for increasing the precision and calibration accuracy of radial velocity 
measurements using high spectral resolution (R > 50,000) spectrographs. There are 
several photonic approaches of relevance. 

The SM fiber-fed spectrograph¹⁶ has a number of advantages over traditional
MM or wide slit spectrographs in terms of precision. First, as the spectrograph
operates at the diffraction limit, it can be made very compact; thus, it is much
easier to regulate the temperature and/or pressure than for much larger instruments.
Additionally, it can be made more cheaply. These advantages are even greater in
the case of a fully integrated spectrograph, such as an arrayed-waveguide grating.⁸
Second, because of the SM transmission, the field profile that propagates through
the spectrograph is decoupled from the input field profile, which may be affected by
the telescope guiding system or the atmospheric seeing.¹⁷ This leads to a reduction
in the intrinsic variability of the output spectra. Finally, modal noise¹⁸ for the SM
spectrograph is easier to control.

For accurate wavelength calibration of both SM and MM high-resolution
spectrographs, new photonic approaches using laser frequency combs¹⁹,²⁰ or
single-mode fiber-based Fabry–Perot etalons²¹,²² offer significant gains over traditional
approaches.

Finally, recent developments in fiber manufacturing techniques offer benefits for
mode scrambling using novel fiber geometries and constructions.²³,²⁴


2.5. Interferometry 


Interferometry offers the potential for high spatial resolution observations on mil- 
liarcsecond scales, which is particularly relevant for the characterization of exo- 
planets and for protoplanetary science. Various techniques rely on the combination 
of beams from multiple apertures, either in a multi-telescope interferometer or a 
multi-aperture (masking) interferometer. Significant technical efforts are needed to 
allow high efficiency interferometric observations at multiple wavelengths. 

Photonic components such as integrated beam combiners, integrated dispersers,
and pupil re-formatters offer a variety of advantages.²⁵ The reduced number of
optical components in such systems leads to improvements in system throughput, ease
of alignment and assembly, and reduction in system complexity. Modal filtering
properties of the integrated devices lead to improvements in fringe contrast and
closure phase stability. Miniaturization of the system elements leads to
improvements in system stability. Finally, these photonic components allow better control
of path length between baselines, improving system coherence. Additionally,
photonic components offer advantages for applying techniques such as nulling, where in
exoplanet systems the bright star light is destructively cancelled, leaving the planet
light alone.²⁶


3. Photonic Lanterns and Hexabundles 


A photonic lantern is a multi-mode photonic device that transforms incoherent, or 
partially coherent, light received at the input of a MM fiber/waveguide to a set 
of SM fibers/waveguides (and vice versa) with low loss. A photonic lantern
can thus be considered as a passive photonic adaptive optics system (with certain
constraints, of course). A hexabundle is an imaging bundle that maintains
multi-moded operation at both input and output ends but allows conversion from a 2D
to a 1D fiber array. 


3.1. Photonic lantern concept 


Light propagation through a photonic lantern (see Fig. 1) closely resembles that
through several fibers tapered together, even for those lanterns that are not in
that literal form. Light in a given mode of the MM waveguide excites light in the
separate SM waveguides with a given distribution of amplitudes and phases, as
determined by a supermode in that part of the transition where the SM waveguides
are far enough apart to be distinct, yet close enough to be weakly coupled. In the
other direction, light in a given SM core excites light in several modes of the MM
waveguide, as determined by the superposition of supermodes that sum to give light
in only that one SM waveguide. We can say that the array of SM waveguides is just
another MM waveguide, but its modes (i.e., the supermodes) are all degenerate and
so freely couple together. A photonic lantern is then simply a device to interface
this MM waveguide with a more conventional one.

Fig. 1. Mode conversion in a 7 × 1 photonic lantern. (a) shows the evolution of modes (the effective
index of each) throughout the tapered transition of the lantern. Picture insets give theoretical and
physical examples of the modes, with the mode field patterns shown in (b).

Figure 1 shows the mode conversion for a 7 x 1 photonic lantern. At large lantern 
diameters the modes of the SM cores are strongly confined, and thus do not couple 
and remain near-degenerate. In addition, there is a near-continuum of cladding 
modes. As the lantern diameter decreases, the confinement of the modes decreases, 
such that interaction between the cores increases. The resulting coupling leads to the
formation of non-degenerate supermodes that form from linear superposition of
these original 14 modes (7 SM cores × 2 polarizations). These supermodes continue
to separate and eventually become the modes of the MM core at the end of the
transition. The number of SM cores determines the number of modes of interest,
which must be conserved across the transition in a low-loss functioning lantern.
However, the above conditions are necessary rather than sufficient, and losses can
still occur. The total length of the transition must also be taken into account when
designing a lantern. In every device in which the cross-sectional geometry changes
along the propagation length, the change in the mode field of a propagating mode
must be gradual for low-loss transmission.


3.2. Photonic lantern types and fabrication


To make a photonic lantern we need a process for merging several SM cores into 
one MM core, or (equivalently) splitting one MM core into several SM cores. The 
only way that these mode converters can be made is by making a physical transition 
in which the SM waveguides either stop acting as such, and/or cease behaving as 
independent uncoupled waveguides. The final aim of this physical transition is to 
adiabatically form a MM waveguide in which the SM waveguides either vanish or 
form a composite waveguide formed by strong coupling between them. To date this 
has usually been achieved by two different techniques: by post-processing several SM 
fibers and/or multi-core fibers with SM cores to form a MM fiber core, with some 
kind of low-index jacket acting as the MM waveguide cladding; or by direct ultrafast 
laser inscription of a pattern of waveguides into an integrated-optic component 
(see Fig. 2). 


3.3. Photonic lantern performance and applications in astronomy 


Photonic lantern technology has proven to be versatile. For example, while the origi- 
nal photonic lantern design was always assumed to involve identical SM waveguides 
in the bridge region, this need not be true. Asymmetric lanterns, also known as 
mode-selective lanterns, have been developed, fueled by new optical transmission 
architectures in the telecom industry.²⁸ Contrary to bulk optics approaches for
dividing and transforming light, photonic lanterns use modal properties and
waveguide optics principles to work. Thus, many different optical transitions can be
achieved by the photonic lantern technology as long as the entropy of the system,
i.e., the number of modes, is conserved. They can be used as “light dividers” to
allow for broadband low-loss interfaces between MM, few-mode and SM systems,
thus allowing a waveguide transition from one to the other as required.

MM-to-MM photonic lanterns can feasibly serve as very low-loss all-fiber optical
slicers and pseudo-slit reformatters;²⁹,³⁰ however, further work must be undertaken
with astronomical spectrographs to fully exploit their potential and to understand 
important optical properties such as wavelength dependence. MM-to-MM photonic 
lanterns not only provide a practical way to use large-core fibers and hexabun- 
dles in large focal plane telescopes, but also provide a way forward for highly 
multi-moded photonic technologies in next-generation astronomical instrumenta- 
tion. Future work, and current areas of research, on photonic lanterns as light 
dividers in astronomy will include: low modal noise, focal ratio degradation free 
input/output optical fiber systems, and all-fiber numerical aperture converters that 
conserve étendue. 

Photonic lanterns to date have been demonstrated in many different sizes, types
and wavelength ranges (from 500 nm to 1800 nm). These vary from smaller numbers
of modes (3-15 modes) in telecommunications²⁸ to the largest for astronomical
applications with 511 modes,³¹ intended for fiber scrambling containing a mix of
SM and few-mode cores. Photonic lanterns have also proven to have broadband
performance of several hundred nanometers and transmissions of > 85% (with some
sources reporting better than 90%).²⁹,³¹

Fig. 2. Schematic diagrams of the three different photonic lantern fabrication approaches.
(a) Multicore fiber; (b) fiber bundle; and (c) ultrafast laser inscription (integrated photonic
lantern). (d) Optical photograph of a 7 × 1 fiber fused photonic lantern and (bottom left panel)
an integrated photonic lantern. Used with permission from Ref. 27.


3.4. Hexabundles 


A hexabundle is similar in construction to a photonic lantern, but has a different
function. The hexabundle consists of a series of MM fibers that are fused over a
short distance into a capillary tube at one end and left as separate fibers at the
other end. For astronomical applications, the fused end is located at the telescope
focal plane (or a re-imaged focal plane), and the output fibers are arranged along
the slit of a spectrograph. The hexabundle thus functions as an integral field unit
for spatially-resolved spectroscopy. Key technical trade-offs for the hexabundle are in
the degree of fusing of the MM cores and the size of the cladding at the fused ends;
these factors affect cross-talk, fill factor, and focal ratio degradation.


Fig. 3. A 61-core lightly fused hexabundle as used in the SAMI instrument. Used with permission
from Ref. 33.

Key advantages of hexabundles over alternative techniques that use microlens
arrays or image slicers are that the hexabundle is compact and easily replicable, and
in principle requires no additional optics. The hexabundle can thus be utilized in
multi-object positioning systems. Thus far, hexabundles have been successfully used
in the SAMI instrument at the AAT32 to collect spatially-resolved spectroscopic
data from many thousands of galaxies, leading to enhanced understanding in many
areas of galaxy formation and evolution. SAMI uses 13 hexabundles, each with
61 fibers (Fig. 3), to feed the AAOmega spectrograph.


4. Fiber Bragg Gratings 


4.1. Fiber Bragg grating concept 


Fig. 4. FBG schematic diagram, showing refractive index modulation and the resultant reflection
and transmission spectra.


The propagation of light in optical waveguides is described by Maxwell’s equations
with boundary conditions. The solutions provide the basic field distribution and
properties of the confined and guided modes. Coupling of specific propagating modes
can occur (destructively or constructively) if the waveguide has a periodic phase and/or
amplitude perturbation whose associated “phase/amplitude constant” is close to the
difference or sum of the propagation constants of the modes. Thus, an FBG with a
refractive index modulation ($\Delta n$) and period ($\Lambda$) that are constant (or, in
the most complex cases, varying) directly affects the coupling between forward- and
counter-propagating modes of the same type in a waveguide (typically the fundamental
mode in an SM waveguide) at a particular center wavelength, called the Bragg
wavelength ($\lambda_B$), as sketched in Fig. 4. The maximum change in the refractive index ($\Delta n$) that can
be induced in an optical fiber is around $10^{-3}$ of the refractive index. This value
may not seem very large, but it is sufficient to provide 100% reflection of light after
only a few millimeters of fiber grating. This is not surprising when one considers
that a few millimeters corresponds to many thousands of periods of the grating. In
a typical grating, the modulation is much less than $10^{-4}$ and gratings are a few
centimeters long, with more complex amplitude modulation.
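
The numbers in this paragraph can be checked with a short calculation. The sketch below is not taken from the source: it assumes a uniform (unapodized) grating and uses the standard coupled-mode results, namely the first-order Bragg condition lambda_B = 2 n_eff Lambda and the peak reflectivity R = tanh^2(kappa L) with kappa = pi eta Delta_n / lambda_B. The effective index, grating period and overlap factor eta are hypothetical illustration values.

```python
import math


def bragg_wavelength_nm(n_eff, period_nm):
    """First-order Bragg condition: lambda_B = 2 * n_eff * period."""
    return 2.0 * n_eff * period_nm


def peak_reflectivity(delta_n, length_mm, wavelength_nm, overlap=1.0):
    """Peak reflectivity of a uniform grating, R = tanh^2(kappa * L).

    Assumption: kappa = pi * overlap * delta_n / lambda_B, where overlap is the
    fraction of the mode power that sees the index modulation.
    """
    kappa_per_m = math.pi * overlap * delta_n / (wavelength_nm * 1e-9)
    return math.tanh(kappa_per_m * length_mm * 1e-3) ** 2


def number_of_periods(length_mm, period_nm):
    """How many grating periods fit in a grating of the given length."""
    return length_mm * 1e6 / period_nm


if __name__ == "__main__":
    n_eff, period_nm = 1.447, 535.6  # hypothetical silica fiber near 1550 nm
    lam = bragg_wavelength_nm(n_eff, period_nm)
    print(f"Bragg wavelength: {lam:.1f} nm")
    print(f"Periods in 3 mm: {number_of_periods(3.0, period_nm):.0f}")
    for delta_n in (1e-3, 1e-4):
        r = peak_reflectivity(delta_n, 3.0, lam)
        print(f"delta_n = {delta_n:.0e}, L = 3 mm -> R = {r:.4f}")
```

Under these assumptions a 3 mm grating with an index modulation of 1e-3 is essentially fully reflecting, while a modulation ten times weaker needs a correspondingly longer grating to reach the same depth.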


4.2. FBG fabrication 


FBGs are fabricated by “inscribing”, or writing, systematic periodic (or aperiodic)
variations in refractive index into the core of a photosensitive fiber (typically
Ge-doped or hydrogenated) using an intense ultraviolet (UV) laser source, typically
a 244 nm laser (a frequency-doubled 488 nm argon laser). The refractive index of
the core changes with exposure to UV light, with the amount of the refractive
index change being a function of the intensity and duration of the exposure. The
dominant processes used are the “phase mask (PM) interferometer” and the “direct
mask interferometer”. The preferred method depends on the type of grating to be
manufactured.

The direct mask interferometer method is the most common and simplest one.
It utilizes a diffractive optical element called a phase mask (PM) to spatially modulate
the UV light (see Fig. 5(a)). To the eye, this looks like ruled glass. The PM is a surface relief
grating used in transmission, analogous to the volume phase holographic grating.
The PM is placed between the UV laser source and the photosensitive fiber. The
spatial diffraction pattern created by the PM directly determines the grating structure
along the fiber. This method gives Bragg wavelengths that depend on the PM
properties and periodicity.


Fig. 5. Schematic diagram of FBG fabrication methods. (a) PM direct writing method.
(b) Mach-Zehnder interference method.

The second method, the PM interference method, uses two-beam interference 
in a Sagnac or Mach-Zehnder configuration (see Fig. 5(b)). This was first used 
to generate uniform gratings, and provides a range of possible Bragg wavelengths 
independently of the PM properties. Here, the UV laser is split into two beams 
that interfere at the core of the photosensitive fiber. The interfering beams create 
a periodic intensity distribution along the fiber. The index of refraction of the 
fiber core changes according to the intensity of UV light and exposure time. The 
interference period of the two beams can be modified by changing the incident angle 
of the beams with respect to each other. This method allows a quick and easy change 
of the Bragg wavelength. 
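
The link between beam angle and Bragg wavelength can be sketched numerically. The example below is an illustration rather than the authors' procedure: it assumes two UV beams symmetric about the fiber normal, so that the fringe period is Lambda = lambda_UV / (2 sin(theta/2)) for a full inter-beam angle theta, and a first-order grating with lambda_B = 2 n_eff Lambda. The effective index is a hypothetical value; the 244 nm writing wavelength is the one quoted in the text.

```python
import math

UV_WAVELENGTH_NM = 244.0  # writing laser wavelength quoted in the text


def fringe_period_nm(full_angle_deg, uv_wavelength_nm=UV_WAVELENGTH_NM):
    """Period of the two-beam interference pattern along the fiber axis."""
    half_angle = math.radians(full_angle_deg) / 2.0
    return uv_wavelength_nm / (2.0 * math.sin(half_angle))


def full_angle_for_bragg_deg(bragg_nm, n_eff, uv_wavelength_nm=UV_WAVELENGTH_NM):
    """Angle between the two beams that writes a grating reflecting at bragg_nm."""
    period_nm = bragg_nm / (2.0 * n_eff)
    return 2.0 * math.degrees(math.asin(uv_wavelength_nm / (2.0 * period_nm)))


if __name__ == "__main__":
    n_eff = 1.447  # assumed effective index of the fiber mode
    for bragg_nm in (1500.0, 1550.0, 1600.0):
        angle = full_angle_for_bragg_deg(bragg_nm, n_eff)
        period = fringe_period_nm(angle)
        print(f"{bragg_nm:.0f} nm: beam angle {angle:.2f} deg, period {period:.1f} nm")
```

In this idealized geometry, a change of well under a degree in the beam angle moves the Bragg wavelength by tens of nanometers, which illustrates why the interferometer method is easy to retune.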

As explained before, an FBG is a resonant optical device that relies on the
SM behavior of fibers and waveguides, coupling propagating and counter-propagating
modes in the fiber. Although the concept is simple, fabricating FBGs with
controlled and desired properties requires highly technical
and complex methods and apparatus. In most cases, a single static exposure of
an SM fiber core to an intense UV interference pattern generates a generic filter
response with unwanted features such as side lobes, a broad spectrum and weak
strength. Additional techniques such as UV interference amplitude modulation
(apodization), PM dithering and asynchronous speed writing stages are
normally used to create usable narrow and deep complex filters for astronomy.34


4.3. FBGs for OH suppression


4.3.1. Single-mode FBG


FBGs were employed for OH suppression in the GNOSIS demonstrator used on-sky
at the Anglo-Australian Telescope (AAT). GNOSIS coupled the telescope seeing-limited
point spread function into seven MM fibers, each of which fed a
19-mode photonic lantern. The FBG filters were installed in each single-mode arm
of each photonic lantern. Another bank of seven photonic lanterns was used to
couple the single modes back into a seven-fiber MM slit, which was relayed into the IRIS2
near-infrared spectrograph for dispersion.

GNOSIS utilized two complex, multi-notch aperiodic FBGs35 in series to suppress
OH lines over two-thirds of the H band (1.47-1.7 μm). The FBGs were designed
to suppress the 103 brightest OH doublets, using the line positions and strengths
from Ref. 36. The FBGs were printed in a custom photosensitive single-mode fiber,
and were packaged in athermal stainless steel tubes. The notch widths were designed
such that thermal variations in the notch centers would maintain overlap with the
corresponding emission line doublet.

The GNOSIS tests37,38 demonstrated high throughput and excellent suppression
of the OH skylines, but produced no significant reduction in the interline background.
It was unclear whether the lack of reduction in the interline background
was due to physical sources or systematic errors, as the observations were detector
noise dominated. For this reason, a dedicated spectrograph, called PRAXIS,39 is
now under construction for the next phase of tests.


4.3.2. Multi-core FBG 


The large number of modes contained within a seeing-limited point spread function
of a typical astronomical telescope leads to a number of issues for the use of SM
FBGs configured as in the GNOSIS experiment. Each SM fiber must be athermally
packaged, so the system does not scale well: packaging becomes problematic
for very large numbers of cores. It has thus been proposed to inscribe FBGs
simultaneously within multiple cores of a multi-core fiber.40,41 In a multi-core fiber,
a single cladding region contains many individual cores (up to several hundred).
Each multi-core region can be tapered at the ends (as in a lantern) for a multi-mode
transition.

The fabrication of multi-core FBGs (MCFBGs) presents significant challenges. The FBGs
written into each core must be extremely uniform. Any deviation in the FBG pattern will
result in a modification of the notch center wavelength and hence cause a broadening
of the notch for that fiber. Such deviations result from the non-uniform illumination
of the different fiber cores by the interference pattern during the writing process.
Figure 6 illustrates an example of the problems that have been encountered in early
devices. Several techniques to improve FBG uniformity in multi-core fibers are now
being investigated, such as encasing the fibers in capillary tubes with a single flat
side polish (to avoid self-focusing issues) and incorporating different interferometer
set-ups to improve the depth of field of the writing region.


Fig. 6. Example of FBG written in multi-core fiber showing depth (a) and center wavelength (b)
variation across individual cores. Used with permission from Ref. 40.


Fig. 7. Example of point-by-point waveguide Bragg grating. (a) shows waveguide cross-sections,
(b) and (c) show microscope images of the waveguides from the front and top. Used with permission
from Ref. 43.


4.3.3. Point-by-point waveguide Bragg grating 


In this approach, the grating is fabricated using the same femtosecond laser that writes
the waveguide itself.42 Waveguide Bragg gratings have been demonstrated using a
single axial point-by-point technique or a square-wave modulated pulse train.
They have also been demonstrated in a device integrated with a
direct laser written photonic lantern,43 as illustrated in Fig. 7. Current limitations
prevent long complex filters due to losses within the waveguides. However, this
technique has advantages over the FBG technique because the integrated devices are
inherently robust, can be flexible as to the arrangement of the waveguides, and can
be miniaturized.


5. Calibrators 


5.1. Photonic combs 


Photonic combs came to the attention of astronomers a decade ago in the context
of improving on the iodine cell method for precision radial velocities to assist with
the search for exoplanets. In an iodine cell, the imprint of the iodine absorption
spectrum, with its accurately calibrated molecular lines, overlays the stellar light.
The first realizations of laser combs were unwieldy, unstable and expensive: big
powerful lasers were fed into large etalon cavities in order to remove lines from the
densely bunched optical frequency comb.11 Compact ring resonators44 and in-fiber
etalons45,46 have also been discussed as possible calibrators for astronomy, although
results to date are still preliminary.

We refer to photonic combs collectively as all devices that seek to produce a
periodic output in frequency for the purposes of instrument calibration, both in
terms of wavelength calibration and PSF mapping. This includes laser combs, fiber
etalon combs, ring resonator combs and so on. There are many ways to achieve this
sort of output, as described in recent reviews.47 We reserve the term “frequency
comb” for locking and stability, e.g., radio/optical frequency mixing, which is a key
aspect of photonic combs being locked to a local oscillator or standard. This has
been brilliantly exploited by atomic physicists and led to the 2005 Nobel Prize being
awarded to Hänsch and Hall.

A demonstration of the power of an in-fiber etalon photonic comb in removing 
the systematic aberrations of a wide-field spectrograph is given in Ref. 48. By anal- 
ogy with the iodine cell, there is a case for an in-fiber etalon that produces narrow 
notches in absorption, particularly if the notches achieve >30 dB suppression — 
e.g., an ultra-stable multi-notch FBG. Such a response can be used to define the 
true spectrophotometric baseline of the instrument in the presence of scattered light. 
There is also a demand for an extremely stable spectrophotometric standard whose 
shape can be “tuned” to the celestial source under study. Both constitute unsolved 
problems at the present time which we anticipate can be resolved using photonic 
mechanisms. 
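
The periodic output of an etalon-based photonic comb can be illustrated with the textbook Fabry-Perot (Airy) transmission function. This sketch is not from the source and assumes an ideal lossless etalon at normal incidence, with free spectral range FSR = c / (2 n L) and coefficient of finesse F = 4R / (1 - R)^2; the cavity length, refractive index and mirror reflectivity are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def free_spectral_range_hz(cavity_length_m, refractive_index=1.0):
    """Frequency spacing between adjacent transmission peaks."""
    return C_M_PER_S / (2.0 * refractive_index * cavity_length_m)


def coefficient_of_finesse(mirror_reflectivity):
    """F = 4R / (1 - R)^2 for two identical lossless mirrors."""
    return 4.0 * mirror_reflectivity / (1.0 - mirror_reflectivity) ** 2


def etalon_transmission(frequency_hz, cavity_length_m, mirror_reflectivity,
                        refractive_index=1.0):
    """Airy function T = 1 / (1 + F sin^2(pi * nu / FSR))."""
    fsr = free_spectral_range_hz(cavity_length_m, refractive_index)
    half_phase = math.pi * frequency_hz / fsr
    finesse_coeff = coefficient_of_finesse(mirror_reflectivity)
    return 1.0 / (1.0 + finesse_coeff * math.sin(half_phase) ** 2)


if __name__ == "__main__":
    length_m, index, reflectivity = 5e-3, 1.45, 0.9  # hypothetical in-fiber etalon
    fsr = free_spectral_range_hz(length_m, index)
    print(f"FSR = {fsr / 1e9:.2f} GHz")
    for order in (9000.0, 9000.25, 9000.5):
        t = etalon_transmission(order * fsr, length_m, reflectivity, index)
        print(f"order {order}: T = {t:.4f}")
```

Higher mirror reflectivity narrows the peaks without changing their spacing, so the line density is set by the cavity length and the line sharpness by the mirrors.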


5.2. Mode scramblers 


One of the real challenges involved in precision spectroscopy (e.g., HARPS at the
ESO 3.6 m telescope) is ensuring a perfectly stable output Gaussian beam from a
fiber in the presence of image motion at the input face. Even without image motion,
fibers produce a wavelength-dependent speckle (granular) pattern resulting from
interference of the many propagating modes with different relative phases. This is
not a problem with SMFs, which allow only one fundamental mode of propagation,
a key reason why these are now seriously studied as the input to high-precision
radial velocity spectrographs.'” 

For a few-mode or MM fiber, however, this presents a real challenge, particularly 
for radial modes. MM fibers scramble in azimuth fairly efficiently, but this is not 
true for the radial modes. Spreading the power evenly across guided modes in radius 
is very challenging, particularly if the goal is to conserve the beam properties of the 
fiber and not to lose light through focal ratio degradation. Reference 24 reviews 
different fiber geometries that achieve mode scrambling, e.g., fibers with D-shaped 
cross-sections, inspired by chaos theory, which outperform polygonal fibers rather 
well (Fig. 8). 

Photonic fibers, and in particular photonic lanterns, have also been studied as 
possible next generation photonic scramblers.*°*° In principle, the input transition 
of photonic lanterns effectively samples the input MM fiber pattern spatially to 
deliver it into the SM cores of the lantern. Then the light propagates independently 
in the SM cores, inevitably experiencing different phase changes due to bends and 
dissimilarities between the SM cores. This couples light between the degenerate 
supermodes and hence the modes of the output MM fiber, effectively scrambling 
the modal amplitudes as well as their phases. A suitable perturbation should then 
yield an incoherent and uniform mode spectrum. 


[Fig. 8 panel labels: speckle contrast 0.9592, 0.9519, 0.9130]
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Fig. 8. Speckle images before (a) and after (b) agitation for circular, octagonal, and D-shaped 
fibers. When agitated, the octagonal and D-shaped fibers have the lowest speckle contrast, which
is good for modal noise suppression. Used with permission from Ref. 24.
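

The speckle contrast quoted in Fig. 8 can be computed from an image in a few lines. The source does not state which definition was used, so the sketch below assumes the common one, C = sigma_I / <I> (standard deviation of the intensity divided by its mean), and models agitation simply as averaging several frames; both the definition and the toy pixel values are assumptions for illustration.

```python
import statistics


def speckle_contrast(intensities):
    """Speckle contrast C = sigma_I / <I> for a flat list of pixel intensities."""
    mean = statistics.fmean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero; contrast is undefined")
    return statistics.pstdev(intensities) / mean


def average_frames(frames):
    """Pixel-by-pixel mean of several equally sized frames (a crude agitation model)."""
    return [statistics.fmean(pixels) for pixels in zip(*frames)]


if __name__ == "__main__":
    # Two toy speckle patterns with bright and dark grains in opposite places.
    frame_a = [0.0, 2.0, 0.0, 2.0]
    frame_b = [2.0, 0.0, 2.0, 0.0]
    print("static fiber:  ", speckle_contrast(frame_a))
    print("agitated fiber:", speckle_contrast(average_frames([frame_a, frame_b])))
```

A contrast near 1 corresponds to fully developed speckle, and a contrast near 0 to a uniformly illuminated fiber face.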
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6. Integrated Photonic Devices 


In recent years, we have begun to feed light efficiently from large telescopes into
photonic devices via photonic lanterns and/or AO systems. This opens up a largely
uncharted landscape rich with possibilities. Integrated photonic devices have the
additional advantages of mass replication, low power requirements, robustness, small
footprint and minimal weight. We can exploit all of these advantages in future
instrument concepts.


6.1. Integrated device concept 


Planar optical waveguides and ultrafast laser-written waveguides are key devices 
to construct integrated optical systems with complex functionalities. Integrated 
optical waveguides consist of a square, rectangular or round core surrounded by a 
cladding with lower refractive index. Although a rigorous understanding of their 
wave-guiding mechanism requires a more complex 3D boundary analysis compared 
to their symmetric optical fiber counterparts, the underlying modal behavior and 
guiding mechanisms are the same. Conventionally, integrated waveguides, and espe- 
cially planar waveguides, have been used as SM waveguides to exploit their photonic 
functionalities, such as robust confinement, light splitting and interference properties.
MM planar waveguides have also been studied as MM interference waveguides
for splitting and combining functions based on the concept of self-imaging.50
Furthermore, complex devices such as arrayed waveguide gratings are a mature technology
in the telecom industry that is now being studied for use as integrated spectrographs, as
will be discussed in Section 6.7.


6.2. Integrated device fabrication 


Planar devices are challenging and demanding to construct because very
high-precision dimensional control is required.
However, the telecom and semiconductor fabrication industries have driven many
technological advances. Nowadays, silicon-on-insulator, silica-on-silicon and silicon
nitride (Si3N4) are the most mature technologies; however, the need to push to more
exotic wavelengths such as the mid-infrared has led to recent developments in other types
of materials such as chalcogenides (e.g., gallium lanthanum sulfide, among others)
and soft glasses (e.g., tellurite), which are highly transparent in the near to
mid-infrared (2-10 μm).

Fabrication techniques for planar waveguides are very dependent on the materials
used, and the materials are dependent on the required operating wavelength. However,
the most widespread techniques are lithography, mechanical and chemical evaporation
and ion beam etching. There are many steps in the fabrication of planar
waveguides;51 some of these are: vacuum evaporation, ion sputtering, chemical vapor
deposition, flame hydrolysis deposition, ion implantation, and epitaxial layering.

The second type of integrated waveguides are laser-written waveguides.52,53
Researchers have previously fabricated highly reliable and low-cost integrated optical
devices such as couplers, waveguides and FBGs in various materials like glasses,
crystals and polymers, using the laser direct-write technique, for applications in various
fields of optics. Laser-material interaction can happen in several ways depending
on the absorption characteristics of the material and the intensity, pulse duration
and wavelength or frequency of the laser. In dielectric materials like glasses, there
are two different laser direct-writing regimes. At low repetition rates, every single
pulse creates a permanent material change, while at high repetition rates cumulative
heating occurs.


6.3. Integrated beam combiners 


Many of the technologies described previously have found their way onto telescopes 
as technological demonstrators; there is a strong history of examples of photonic 
waveguide applications in astronomical interferometry that have resulted in science 
results and even facility instrumentation. In the field of long-baseline interferometry 
the beams from multiple telescopes are combined interferometrically in the single- 
mode regime. 

The first example was FLUOR, which used fiber couplers to join two small
aperture telescopes at Kitt Peak.54 Later the technique was refined and expanded
in wavelength space with the IONIC experiment on the IOTA interferometer. Real
breakthroughs were then obtained by moving from fiber couplers to fully photonic
beam combiners, using planar technology in the K band (2.2 μm). In the PIONIER
and GRAVITY55 instruments, the beams from all four of the VLT telescopes are combined
on a single chip (see Fig. 9) via a quadrature method, allowing visibility amplitudes and
closure phases to be derived.
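
The value of closure phases can be seen in a small numerical experiment. The sketch below is not from the source: it uses the standard definition of the closure phase as the argument of the triple product of the visibilities around a triangle of telescopes, and assumes that each telescope contributes an unknown phase error that enters baseline i-j as the difference of the two errors. All visibility values and phase errors are hypothetical.

```python
import cmath
from itertools import combinations


def observed_visibility(true_visibility, phase_error_i, phase_error_j):
    """Visibility on baseline i-j corrupted by telescope-based phase errors."""
    return true_visibility * cmath.exp(1j * (phase_error_i - phase_error_j))


def closure_phase(v_12, v_23, v_31):
    """Argument of the triple product around the triangle 1-2-3."""
    return cmath.phase(v_12 * v_23 * v_31)


def count_baselines_and_triangles(n_telescopes):
    """Number of telescope pairs and triplets for an array of n telescopes."""
    stations = range(n_telescopes)
    return (len(list(combinations(stations, 2))),
            len(list(combinations(stations, 3))))


if __name__ == "__main__":
    # Hypothetical true visibilities (amplitude, phase in rad) on three baselines.
    true_12 = cmath.rect(0.8, 0.3)
    true_23 = cmath.rect(0.5, -0.5)
    true_31 = cmath.rect(0.6, 0.9)
    errors = [1.1, -2.0, 0.4]  # unknown phase error of each telescope, in rad
    obs_12 = observed_visibility(true_12, errors[0], errors[1])
    obs_23 = observed_visibility(true_23, errors[1], errors[2])
    obs_31 = observed_visibility(true_31, errors[2], errors[0])
    print("true closure phase:    ", closure_phase(true_12, true_23, true_31))
    print("observed closure phase:", closure_phase(obs_12, obs_23, obs_31))
    print("4 telescopes ->", count_baselines_and_triangles(4), "baselines, triangles")
```

The individual baseline phases are scrambled by the telescope errors, but the errors cancel around the closed triangle, which is why closure phase is a robust observable for a four-telescope combiner.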

Other efforts have focused on using 3D waveguide beam combiners fabricated
via direct laser writing. In this architecture, operation can be pushed into the visible
waveband, and a larger number of telescope combinations is possible than with the
linear approach.56 Further work has examined the potential for beam combination
at longer wavelengths in the mid-infrared (e.g., Ref. 57).


Fig. 9. Integrated photonic beam combiner for GRAVITY. Used with permission from Ref. 55. 
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6.4. Integrated aperture masking 


Aperture masking is a powerful interferometric technique that aims to provide
information on angular scales around $\lambda/D$ from a target star. This is achieved by
recording the interferograms generated by placing a non-redundant sparse-aperture mask
in a re-imaged telescope pupil plane.58 While the bulk-optic version of an aperture
mask interferometer has had many successes, there are several limitations that have
motivated the development of photonic systems.

The FIRST experiment employed optical fibers to sample the telescope pupil
and reformat it for interference.59 In the Dragonfly experiment, a direct laser-written
reformatter (Fig. 10) is used.10 In both cases, the output can be fed into
a (lithographically-fabricated) photonic chip beam combiner, or alternatively into
a non-redundant output array configuration, for recombination in a Fizeau image
plane.

The key advantage in both of these instruments is that the complete pupil plane 
can be sampled, as the requirement for non-redundancy in the baseline length can 
be shifted to the output rather than the input. This leads to a significant increase in 
the sensitivity of this approach over the sparse aperture-sampled bulk optic system. 
Additionally, system stability is enhanced because of the spatial filtering that occurs 
during transmission through the single-mode guides. 
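
The non-redundancy requirement can be expressed as a simple check: no two pairs of sub-apertures may share the same baseline vector. The following sketch is illustrative only; it assumes sub-aperture positions given on an integer grid (so that vectors can be compared exactly) and treats a baseline and its negative as the same baseline. The example layouts are hypothetical and are not the FIRST or Dragonfly configurations.

```python
from itertools import combinations


def baselines(positions):
    """All pairwise baseline vectors, sign-normalised so that b and -b coincide."""
    result = []
    for (x1, y1), (x2, y2) in combinations(positions, 2):
        dx, dy = x2 - x1, y2 - y1
        if (dx, dy) < (0, 0):  # flip vectors pointing "backwards"
            dx, dy = -dx, -dy
        result.append((dx, dy))
    return result


def is_non_redundant(positions):
    """True if every pair of sub-apertures forms a unique baseline vector."""
    vectors = baselines(positions)
    return len(vectors) == len(set(vectors))


if __name__ == "__main__":
    regular_row = [(0, 0), (1, 0), (2, 0), (3, 0)]  # evenly spaced: redundant
    sparse_row = [(0, 0), (1, 0), (4, 0), (6, 0)]   # all spacings differ
    print("regular row non-redundant:", is_non_redundant(regular_row))
    print("sparse row non-redundant: ", is_non_redundant(sparse_row))
```

In a photonic remapper the input layout can be densely packed like the first example, as long as the output layout passes a check of this kind.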

Thus far, photonic aperture mask systems have been demonstrated in the laboratory
and on-telescope at Subaru, the AAT, and the Shane 3-m telescope at
Lick Observatory. Further contributions to the science of high-contrast companions
through this technique await further development of the technology using
state-of-the-art detectors with fully optimized systems, and potentially pushing into the
mid-infrared where further gains are realizable from improved contrast ratios.


Fig. 10. 3D paths taken by laser written waveguides in the Dragonfly pupil re-mapper with two 
of the waveguides being illuminated. From Ref. 10. 
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6.5. Integrated nullers 


In a nulling interferometer the light from two telescopes is combined destructively 
such that one beam is half-wave phase-shifted relative to the other.” This effectively 
causes the central on-axis starlight to be nulled, leaving behind any off-axis planet 
signal. The technique can be applied equally to the pupils of two different telescopes 
or to sub-apertures of a single telescope. 
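
The demands that nulling places on phase and amplitude control can be estimated with a two-beam model. The sketch below is not from the source: it assumes two monochromatic, fully coherent on-axis beams and defines the null depth as the ratio of the destructive-output intensity to the constructive-output intensity. The amplitudes and phase errors are hypothetical.

```python
import math


def null_depth(amplitude_1, amplitude_2, phase_error_rad):
    """Ratio of dark-output to bright-output intensity for two on-axis beams.

    phase_error_rad is the departure from the ideal half-wave (pi) shift.
    """
    cross = 2.0 * amplitude_1 * amplitude_2 * math.cos(phase_error_rad)
    total = amplitude_1 ** 2 + amplitude_2 ** 2
    return (total - cross) / (total + cross)


if __name__ == "__main__":
    print("perfect null:          ", null_depth(1.0, 1.0, 0.0))
    print("20 mrad phase error:   ", null_depth(1.0, 1.0, 0.02))
    print("1% amplitude mismatch: ", null_depth(1.0, 0.99, 0.0))
```

For equal amplitudes the expression reduces to tan^2(phase_error / 2), so in this idealized model a null of 1e-4 already requires the phase to be held to about 20 mrad.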

Techniques have been proposed and tested for nulling using bulk optics, such as 
on the Keck nulling interferometer. Fibers and integrated nullers offer many advan- 
tages that are particularly related to the required depth, stability, and precision 
of the null.2° Early tests implemented mid-infrared fibers. Over the last decade 
a number of configurations and a variety of integrated platforms have been pro- 
posed and/or tested at wavelengths from the near to mid-infrared. These include 
reverse Y-junctions using silica on silicon, Ag+ ion exchange on silica tricouplers, 
active lithium niobate beam combiners, etched chalcogenide-on-chalcogenide cou- 
plers, planar chalcogenide multimode interference couplers, closure phase nullers, 
and laser-written 50/50 couplers in silica (e.g., Refs. 60-62). 


6.6. Ring resonators 


A ring resonator is a looped waveguide that is coupled to both an input and output 
waveguide. The loop creates a resonance comprising a series of wavelengths that 
can be either filtered or selected from a broadband input. Both of these properties 
are useful for astronomical applications. While ring resonators have been developed 
and extensively used in the telecommunications and photonics industries, they are 
yet to be proven in an astronomical context. However, there are several applications 
that have motivated some development. 

When used as a wavelength filter, the ring resonator can act to suppress 
unwanted lines, for example; this is an alternative approach to fiber or waveguide 
Bragg gratings for atmospheric OH suppression. When used as a wavelength selec- 
tor, the ring resonator can provide a frequency comb that can be used to provide 
accurate wavelength calibration for high resolution spectroscopy. 

Current devices show good control over the free spectral range and wavelength 
separation of multi-ring devices. However, significant effort is required to achieve 
efficient coupling into such devices from astronomical telescopes because of the 
necessity for using small waveguide diameters on chip.®* 
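
The comb-like response of a ring resonator follows from its resonance condition. The sketch below is an illustration under simplifying assumptions that are not in the source: a circular ring of radius R, resonances where an integer number of wavelengths fits in the optical path (m lambda = n_eff 2 pi R), and a wavelength-independent effective index, so that the group index equals n_eff and the free spectral range is approximately lambda^2 / (n_g 2 pi R). The radius and index are hypothetical.

```python
import math


def resonant_wavelengths_um(radius_um, n_eff, band_min_um, band_max_um):
    """Resonant wavelengths (ascending) of a ring inside the given band."""
    optical_path_um = n_eff * 2.0 * math.pi * radius_um
    m_low = math.ceil(optical_path_um / band_max_um)
    m_high = math.floor(optical_path_um / band_min_um)
    return [optical_path_um / m for m in range(m_high, m_low - 1, -1)]


def free_spectral_range_um(radius_um, n_group, wavelength_um):
    """Approximate spacing between adjacent resonances near wavelength_um."""
    return wavelength_um ** 2 / (n_group * 2.0 * math.pi * radius_um)


if __name__ == "__main__":
    radius_um, n_eff = 50.0, 1.8  # hypothetical ring
    lines = resonant_wavelengths_um(radius_um, n_eff, 1.54, 1.56)
    for wavelength in lines:
        print(f"resonance at {wavelength:.5f} um")
    fsr_nm = 1e3 * free_spectral_range_um(radius_um, n_eff, 1.55)
    print(f"FSR near 1.55 um: {fsr_nm:.2f} nm")
```

A smaller ring gives a larger free spectral range, which is one reason compact rings with small waveguide cross-sections are attractive despite the coupling difficulty noted above.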


6.7. Integrated spectrographs 


An integrated photonic spectrograph is a miniaturized, monolithic device incorporating
planar waveguide geometries to create dispersion on a chip. The typical
size of these devices (a few cm) and their integrated nature make them robust
against misalignments due to environmental factors such as temperature, pressure,
and flexure seen in bulk optic spectrographs. This is of benefit for high precision
observations at high spectral resolution. An additional benefit of these devices is
that, although the initial unit cost is high due to significant development costs,
once a production unit is made it can be readily mass fabricated in large quantities,
making this device ideal for highly multiplexed spectroscopy.'* 1°


See Chapter 5 of Volume 3 of this Handbook.

The use of the integrated spectrograph in astronomy was first proposed in
Ref. 65 and was first demonstrated on-telescope in Ref. 66. Observations so far have
used modified commercial arrayed-waveguide-grating devices developed for use in
the telecommunications industry. Apart from tailoring the specifications for
astronomical science applications, there are a number of issues that must be overcome
before this technology can be used routinely in astronomy. The arrayed-waveguide
grating device is inherently linear, producing a dispersion in 1D for a single SM fiber
input. In order to allow highly multiplexed SM inputs, a single-fiber-per-chip solution
requires a large stack of chips and a custom detector array geometry; alternatively,
a multi-fiber-per-chip solution requires a form of integrated or miniature cross
dispersion at the chip’s output. Several alternatives to arrayed-waveguide
gratings that also use diffractive elements have been proposed,67 such as planar
double grating spectrometers, super compact diffractive imaging spectrometers, and
photonic crystal spectrometers.

Another approach to integrated spectroscopy uses Fourier-transform spectrometry.
The SWIFTS approach68 is based on the direct detection of a standing
wave resulting from interference within the device (Fig. 11). A very compact
1D spectrometer was demonstrated using optical nanoprobes to detect the evanescent
field. In principle, very high spectral resolution is possible with this technique.


Fig. 11. Schematic showing concept for the Stationary Wave Integrated Fourier Transform
Spectrometer. Direct near-field detection of confined standing waves is achieved with two possible
configurations: from a guided mode reflection (a), or interference of two counter-propagating modes
(b). The inset (c) shows detection with nanodetectors. Used with permission from Ref. 68.


7. Conclusions 


The breadth of application of photonic fibers and waveguides in the field of
astronomical instrumentation is very large. It covers not only many techniques and
technologies, but also many science drivers: from exoplanet detection, imaging, and
characterization through nulling interferometry or enhanced precision spectroscopy,
through galaxy evolution studies using MM hexabundle integral-field units, to studies
of the distant universe enabled by deep near-infrared OH-suppressed observations.
In the coming decades, as emerging photonic fields reach further maturity, we
can expect to see a regular sequence of new techniques transitioning through
lab demonstration and on-telescope demonstration to new science results.
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After a summary of the general requirements needed for the use of tip/tilt mir- 
rors (TTMs) in astronomy, we present two main tip/tilt technologies, which are 
the piezo technology and the voice coil technology, in terms of architecture and 
principal specifications. A significant operational example is given to illustrate 
each case and to show the relevance of the chosen technology. 


1. Introduction 


“The simplest form of adaptive optics is tip/tilt correction, which corresponds to 
correction of the tilts of the wavefront in two dimensions (equivalent to correction of 
the position offsets for the image). This is performed using a rapidly moving tip/tilt 
mirror (TTM) that makes small rotations around two of its axes.”* 

Because of internal and external distortions, terrestrial telescopes are not able
to deliver the image quality that would be theoretically possible. The
deviation from the optimum beam path can result from atmospheric turbulence;
telescope tracking, which compensates for the motion of the object and the
Earth’s rotation relative to the sky background; distortion of the telescope structure
because of gravity, thermomechanical effects or wind pressure; and shake effects.
Positioning systems such as hexapods are used in active optics to compensate
the high-amplitude, low-frequency part of these effects, while TTMs are used
in adaptive optics (AO) to correct their lower-amplitude, higher-frequency
part.


* Wikipedia contributors, “Adaptive Optics”, Wikipedia, The Free Encyclopedia,
https://en.wikipedia.org/wiki/Adaptive_optics, accessed 24 April 2019.
In the following sections, we first summarize the main requirements for
TTMs in astronomical instrumentation (Section 2). Among the technologies
available for manufacturing a TTM, two main families can be taken as
references: piezo technology (Section 3) and voice coil technology (Section 4).
Each is presented in terms of architecture and general properties, with an
operational system given as an example.


2. Tip/Tilt Mirror Requirements 


This section summarizes the main specifications required of the TTM in an AO
system. The general need is to correct the atmospheric tip and tilt perturbation,
the residual telescope tracking error, and the telescope vibrations caused by wind
shake and other mechanical sources, all of which are rapid perturbations.


• The angular stroke $\theta$ is obtained from the stroke $\delta$ (i.e., the mechanical amplitude
of the optical surface deformation in tip and tilt) of these two modes divided by
the considered pupil diameter $P$:

$$\theta = \frac{\delta}{P}. \tag{1}$$

$\delta$ includes $\delta_{\mathrm{atm}}$, the stroke required for the atmospheric tip/tilt correction, and
$\delta_{\mathrm{mech}}$, the stroke required for mechanical tip/tilt corrections (residual tracking
error, wind shake and other vibrations). The atmospheric stroke is given by

$$\delta_{\mathrm{atm}} = \frac{\sqrt{l}\,\lambda}{2\pi}\left(\frac{D}{r_0}\right)^{5/6}, \tag{2}$$

where $l = 0.896$ is the constant related to both tip and tilt modes, $\lambda$ is the
wavelength, $D$ is the diameter of the telescope and $r_0$ is the Fried parameter of
the considered turbulence. $r_0$ is given by


$$r_0 = \left[\frac{0.423}{\cos\gamma}\left(\frac{2\pi}{\lambda}\right)^{2}\int C_n^2(h)\,\mathrm{d}h\right]^{-3/5}, \tag{3}$$

where $\gamma$ is the angle between the line of sight of the telescope and the zenith and
$C_n^2$ is the index of refraction structure constant, which characterizes the turbulent
spatial fluctuations at the altitude $h$. Note that, $r_0$ being proportional to $\lambda^{6/5}$,
the tip/tilt stroke $\delta_{\mathrm{atm}}$ is achromatic.

Generally, the atmospheric tip/tilt correction requires a value of $\delta_{\mathrm{atm}}$ of up to
tens of microns peak-to-valley. The mechanical tip/tilt corrections require values
that can be an order of magnitude higher; i.e., $\delta_{\mathrm{mech}}$ can reach in some cases up
to hundreds of microns peak-to-valley. In practice, these values divided by the
considered pupil diameters, which range from tens to hundreds of mm at the AO
system level, lead to angular strokes in the range of 0.5–5 mrad peak-to-valley.
• The −3 dB closed-loop bandwidth $f_c$ of the TTM (assumed to be a first-order filter)
determines the bandwidth angular error $\sigma_\theta^{\mathrm{bw}}$. This jitter error is due to the finite
bandwidth of the system control loop. At the AO system level, it is given by
Ref. 4:

$$\sigma_\theta^{\mathrm{bw}} = \frac{f_{t/t}}{f_c}\,\frac{\lambda}{P}, \tag{4}$$

where $f_{t/t}$ is a characteristic frequency of the atmospheric turbulence, defined by

$$f_{t/t} = 0.08\,\frac{v_{t/t}}{r_0^{5/6}\,D^{1/6}}, \tag{5}$$

where $v_{t/t}$ is the effective wind speed for tilt correction.

Considering a diffraction-limited correction, i.e., $\sigma_\theta^{\mathrm{bw}} \ll \lambda/P$, it is clear from
Eq. (4) that the closed-loop bandwidth, $f_c$, must be about 10 times higher than
the characteristic frequency, $f_{t/t}$. Note that because $r_0$ is proportional to $\lambda^{6/5}$
(Eq. (3)), $f_{t/t}$ is inversely proportional to the wavelength. In practice, $f_{t/t}$ can
reach tens of Hz, which requires $f_c$ to be up to hundreds of Hz (particularly for
applications in the visible band).

• Again, considering a diffraction-limited correction, the angular resolution of the
TTM must satisfy $\sigma_\theta^{\mathrm{res}} \ll \lambda/P$. Therefore, an angular resolution about 10 times
smaller than the diffraction limit is needed. Depending on the considered wavelengths
and AO pupil diameters, the required angular resolution ranges from
10 µrad down to a few tens of nrad.

• The pupil diameter $P$ of the TTM is usually that of the AO system (or close to
it). For post-focal systems, the pupil diameter usually ranges from several tens
to several hundreds of millimeters.

• Regarding the architecture, the possibility for a tip/tilt platform to hold a
deformable mirror in place of a flat mirror is a significant advantage, since in
this case (i) it saves one optical surface, and (ii) it allows the entire wavefront
correction to be made in the same pupil plane, which can be properly
conjugated to the telescope pupil.

• Still regarding the architecture, the mechanical mounting design must minimize
the piston excitation, especially for mirrors used in interferometry.

• The thermomechanical sensitivity of the TTM must be low enough to allow good
operation in various environmental conditions.

• Robustness and reliability of the TTM must be high enough to keep the instrument
in operation for the required time.
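
The stroke and bandwidth requirements above can be turned into rough numbers. The following Python sketch evaluates the angular stroke of Eq. (1), the characteristic tilt frequency of Eq. (5) and the rule of thumb that the closed-loop bandwidth should be about 10 times that frequency. The function names and all input values (8 m telescope, $r_0$ = 0.1 m at 0.5 µm, 10 m/s effective wind speed, 100 µm stroke on a 50 mm pupil) are illustrative assumptions and are not taken from the text.

```python
def angular_stroke(delta_m: float, pupil_m: float) -> float:
    """Eq. (1): angular stroke (rad) = mechanical stroke / pupil diameter."""
    return delta_m / pupil_m


def scale_r0(r0_ref: float, lam_ref: float, lam: float) -> float:
    """Rescale the Fried parameter with wavelength, r0 ~ lambda^(6/5)."""
    return r0_ref * (lam / lam_ref) ** 1.2


def tilt_frequency(v_tt: float, r0: float, telescope_d: float) -> float:
    """Eq. (5): characteristic tilt frequency f_t/t in Hz."""
    return 0.08 * v_tt / (r0 ** (5 / 6) * telescope_d ** (1 / 6))


def required_bandwidth(f_tt: float, margin: float = 10.0) -> float:
    """Closed-loop bandwidth taken as `margin` times f_t/t (rule of thumb)."""
    return margin * f_tt


# Assumed, illustrative inputs (not values from the chapter).
theta = angular_stroke(100e-6, 50e-3)  # 100 um over a 50 mm pupil
f_vis = tilt_frequency(10.0, 0.1, 8.0)
f_k = tilt_frequency(10.0, scale_r0(0.1, 0.5e-6, 2.2e-6), 8.0)

print(f"angular stroke: {theta * 1e3:.1f} mrad")
print(f"f_t/t at 0.5 um: {f_vis:.2f} Hz, f_c ~ {required_bandwidth(f_vis):.0f} Hz")
print(f"f_t/t at 2.2 um: {f_k:.2f} Hz, f_c ~ {required_bandwidth(f_k):.0f} Hz")
```

Because $r_0$ scales as $\lambda^{6/5}$, the ratio of the two frequencies equals the ratio of the wavelengths, which is why visible-band operation drives the bandwidth requirement.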


It can be noted that, in some cases, the TTM can provide a medium-bandwidth 
high-amplitude correction, while the deformable mirror compensates for the residual 
high-frequency tip/tilt errors and also for the possible dynamic deformations of the 
TTM. 
3. Piezo Technology


3.1. General architecture 


The piezo technology uses piezo actuators integrated into a kinematic flexure guid- 
ing system (Fig. 1). The platform can be driven by three or four piezo actuators. 
These actuators are used as “position” actuators. 

Direct-measuring, non-contact capacitive position sensors are used for the servo 
loop. 

In addition to tilting, the platform can also be operated linearly in the Z direc- 
tion, which can be used, for example, for correcting optical path lengths (i.e., the 
piston mode). 


3.2. General properties 
This technology has the following main features: 


• Angular stroke up to 5 mrad peak to valley.
• Accurate angular resolution, down to 20 nrad.



Fig. 1. General principle of piezo tip/tilt technology. PZT stands for lead zirconate titanate
(piezoelectric ceramic material). For a discussion of the basic principles of a piezo tip/tilt
system, see: Physik Instrumente (PI), General Technical Documentation on Z-Tip-Tilt Piezo
Platforms, https://www.pi-usa.us/en/tech-blog/design-performance-and-tuning-of-fast-steering-mirrors-based-on-piezo-drives-and-flexure-guides/.
Figure courtesy Lazar Buntic, PSU.
Fig. 2. Two commercial piezo tip/tilt platforms. Courtesy of PI (Physik Instrumente),
www.pi.ws.


• High bandwidth, up to several hundreds of Hz.
• Reduced size with relatively limited supported mass (generally <3 kg).
• Good reliability.
• Relatively low cost (these are commercial products) (see Fig. 2).


3.3. An example of tip/tilt piezo technology: The GPI “woofer” 


The Gemini Planet Imager (GPI) system of the Gemini Telescope is an 
extreme adaptive-optics imaging polarimeter/integral-field spectrometer.” It pro- 
vides diffraction-limited data between 0.9 and 2.4 microns and contrast ratios of 
$10^6$ on stellar companions at separations of 0.2–1 arcseconds. The science instrument
provides spectroscopy or dual-beam polarimetry of any object in the field of
view. 

This instrument includes an adaptive optics system with two deformable mirrors.
One is the “tweeter” for high-order and low-amplitude wavefront correction
and the other is the “woofer” for low-order and high-amplitude correction. The
latter is mounted on a tip/tilt platform, which has the advantage of allowing
the entire low spatial-frequency wavefront correction to be made in the same
plane (see Fig. 3).

The GPI Woofer-T/T Platform System is composed of a CILAS-specific 50 mm 
aperture, 97 actuator stacked array mirror mounted on a Physik Instrumente (PI) 
commercial tip/tilt platform (P-528.TCD model).® A custom mechanical mount is 
used to interface with the GPI optical bench (see Fig. 4). 

The system offers the following specifications: 


• Angular stroke: 2 mrad peak to valley.
• Angular resolution: 60 nrad.
• Closed-loop bandwidth: 190 Hz at −3 dB (see Fig. 5).
Fig. 3. (a) GPI instrument during test at University of California, Santa Cruz. Following the
beam path, we see, among the other components, the two DMs: the “tweeter” and the “woofer”,
the latter of which is mounted on its tip/tilt platform. (b) The “woofer”-tip/tilt platform system alone.


[Figure labels: tip/tilt platform; back side of the deformable mirror; mechanical mount.]


Fig. 4. (a) Drawing of the internal parts of the woofer-tip/tilt platform assembly. The actuator
array of the deformable mirror passes through the central square hole of the tip/tilt platform.
This minimizes the imbalance of the moving mass. (b) The Physik Instrumente
commercial tip/tilt platform. Courtesy of PI (Physik Instrumente), www.pi.ws.
Fig. 5. Tilt frequency response of the “woofer”-tip/tilt platform shown in Fig. 4, which shows a
190 Hz closed-loop bandwidth at −3 dB. The strong attenuation of the resonance is achieved by
a notch filter. The mass of the supported deformable mirror is 1.2 kg.
4. Voice Coil Technology


4.1. General architecture 


This technology comes from an Observatoire de Paris concept and consists of build- 
ing a mechanical gimbal mount around a flat mirror or a deformable mirror (Fig. 6). 
(Use of a deformable mirror saves an optical surface and allows a proper system 
pupil conjugation by locating deformable mirror and TTM in the same plane.) 

The mirror is mounted inside the central ring. The gimbal mount provides 
two perpendicular rotation axes. A pair of flexible pivots for each axis defines the 
rotation axis position. The tilt rotation axis is located at the center of gravity of the 
moving part, which is very close to the center of gravity of the mirror. This allows low 
power dissipation and good balancing of the system. Two voice coil linear actuators 
(generally arranged in push/pull pairs) and two position sensors are implemented 
on each axis for the position servo control.” The voice coil actuators are used as 
“force” actuators. 


4.2. General properties 


This technology shows the following main features: 


• The general structure allows the mounting of any mirror, including any
deformable mirror technology, and can be scaled up to significant masses and
dimensions.


[Figure labels: TT chassis; flexible pivot.]


Fig. 6. General principle of voice coil tip/tilt technology. Figure credit: Pierre Gigan, Observatoire 
de Paris, used with permission. 


ᵇ L’Observatoire de Paris - LESIA, Les Tip-Tilt et Image Stabilizer fabriqués au LESIA (2008),
http://lesia.obspm.fr/Les-Tip-Tilt-et-Image-Stabilizer.html


Fig. 7. Two customized voice coil tip/tilt mounts, both supporting a bimorph deformable 
mirror (photo credit: LESIA, Observatoire de Paris-PSL (footnote b)®). (a) Tip/tilt mount 
of the Spectrograph for INtegral Field Observations in the Near Infrared (SINFONI) of 
the European Southern Observatory Very Large Telescope (ESO VLT;
http://lesia.obspm.fr/Projet-MACAO-montures-pour-le-VLTI.html). (b) Tip/tilt mount of the
Adaptive Optics System with 188 Elements (AO188;
http://lesia.obspm.fr/Etude-et-Realisation-d-une-monture.html,
photo credit: LESIA, Observatoire de Paris-PSL) of the Subaru Telescope operated by the National
Astronomical Observatory of Japan (NAOJ). 


• Large angular stroke, up to tens of mrad peak to valley.
• Accurate angular resolution, down to less than 40 nrad.
• High bandwidth, up to 1 kHz for a light supported mirror.
• Excellent reliability.
• Negligible temperature dependence.
• Relatively high cost (these are customized products) (see Fig. 7).


4.3. An example of voice coil technology: The TMT Tip/Tilt Stage 


The Narrow Field Infra-Red AO System (NFIRAOS) is the first-light facility AO 
system for the Thirty Meter Telescope (TMT). This system will provide diffraction- 
limited performance in the J, H, and K bands over 10-30 arcsec diameter fields with 
50% sky coverage at the galactic pole. NFIRAOS is an order 63 × 63 system with two
deformable mirrors optically conjugated to 0 km and 11 km. Very low background
is an important design driver; therefore, one deformable mirror is mounted on a
tip/tilt platform to reduce the number of optical surfaces, and all the optics are
cooled to −30°C (see Fig. 8).¹⁰

The TMT tip/tilt stage (TTS) will hold a custom 300 mm aperture, 3125-actuator stacked
array mirror. The architecture of this mount uses the concept of gimbal rings
mounted on flexible pivots with voice coil actuators and capacitive sensors (see
Fig. 9).¹⁰
Fig. 8. (a) The design of the NFIRAOS optomechanical layout, where the two deformable mirrors
(DMs) are visible.¹¹ The one conjugated to 0 km is mounted on the tip/tilt stage (TTS). Note
that all of the items are contained in a cooled enclosure operated at −30°C to reduce thermal
background and reduce observing time. (b) The actual TTS manufactured by CILAS, which is
able to hold the 300 mm useful aperture high-order DM. Figures used with permission.


[Figure labels, panels (a), (b) and (c): X axis; DM electrical wires (partial view); flexible blade;
flexible pivot; connectors; gimbals; chassis; voice coil actuator + capacitive sensor.]


Fig. 9. Design of the TMT TTS. (a) The chassis with the two coplanar tip and tilt axes indicated. 
(b) The DM inside the gimbals. The interface between the mirror and the central ring uses flexible 
blades to compensate for the differential thermomechanical deformation between the mirror and 
the ring, while keeping stiffness in Z direction. (c) The system includes two voice coils and two 
capacitive sensors working in push/pull operation, per axis. The flexible pivots guarantee a perfect 
positioning of the axes with no mechanical play and a high stiffness. The electrical wires needed 
to drive the DM exit from its back side to the connectors. 


This system offers the following specifications at both ambient and −35°C
temperatures:


• Angular stroke: 0.5 mrad peak to valley.
• Angular resolution: <40 nrad.
• Closed-loop bandwidth: 90 Hz at −3 dB (see Fig. 10).
Fig. 10. Tilt frequency response of the TMT TTS, which shows a 90 Hz closed-loop bandwidth at
−3 dB. A numerical corrector has been used to control the natural mechanical modes of the mount:
a damping loop on speed and a predictive functional control (PFC) on position. The supported
mass for this measurement was 32 kg.


5. Conclusion 


The two technologies presented are both well suited to the high-speed tip/tilt
correction needed in astronomical instrumentation. The piezo technology can
be used for small to medium pupil diameters, while the voice coil technology is
needed for larger pupil diameters and heavy loads.
After a summary of the general requirements for deformable mirrors
in astronomy, we present the stacked array mirror (SAM) technology in terms
of architecture and principal specifications. This is done through an analytical
approach, which identifies the main technical advantages of the
technology: high order, large stroke, fast response and high optical
quality. Furthermore, the design of these deformable mirrors can be adapted
to a wide variety of needs in night and solar astronomy, including those of
very large telescopes and future extremely large telescopes (ELTs).


1. Introduction 


Stacked array mirrors (SAMs) surely represent the ultimate evolution of the
deformable mirrors (DMs) developed for adaptive optics (AO) systems, which were
first made in the 1970s for defense applications.

The concept uses ferroelectric (piezoelectric or electrostrictive) materials and 
is able to deliver high strength, high accuracy, fast response time and low power 
dissipation for high spatial order DMs. These features allow the technology to be 
widely used in AO for astronomy. 

In the following sections, we first recall the main requirements needed for a DM 
in astronomical instrumentation (Section 2). Then, after a general description of 
the technology (Section 3), the main specifications of SAMs are detailed according 
to the main parameters of their design (Section 4). We use an analytical approach 
in order to help the reader to understand the advantages of this technology. 
2. Deformable Mirror Requirements


This section recalls the main drivers relative to the use of deformable mirrors in
astronomy. These parameters are otherwise accurately detailed in the literature.¹⁻⁶
We only give hereafter a brief presentation of these characteristics in order to evaluate
their orders of magnitude for comparison to the capabilities of SAM technology
(see Section 4):


• The number of actuators, $N$, is derived from the wavefront fitting error (in radians)
needed for the AO system. This error is due to the finite number of DM
actuators, and is given by

$$\sigma_{\mathrm{fitting}}^2 = k\left(\frac{D}{r_0}\right)^{5/3} N^{-5/6}, \tag{1}$$

where $D$ is the diameter of the telescope, $k$ is a factor which depends on the shape
of the DM influence function, and $r_0$ is the Fried parameter of the considered
turbulence, given by

$$r_0 = \left[\frac{0.423}{\cos\gamma}\left(\frac{2\pi}{\lambda}\right)^{2}\int C_n^2(h)\,\mathrm{d}h\right]^{-3/5}, \tag{2}$$

where $\gamma$ is the angle between the line of sight of the telescope and the zenith,
$\lambda$ is the wavelength, and $C_n^2$ is the index of refraction structure constant, which
characterizes the turbulent spatial fluctuations at the altitude $h$. It can be noted
that for a given fitting error, the required number of actuators is proportional to
$(D/r_0)^2$.

Considering the large field of applications and instruments in astronomy and
the related range of requirements on fitting errors, the required number of actuators
ranges from tens for “small” telescopes to tens of thousands for future extremely
large telescopes (ELTs).

• Similarly, the delay $\tau$ of an AO closed-loop system causes a wavefront error
referred to as the temporal error (in radians) of the AO system. This error, also
called the bandwidth error, is due to the finite bandwidth of the system control
loop, and is given by

$$\sigma_{\mathrm{temporal}}^2 = \left(\frac{\tau}{\tau_0}\right)^{5/3}, \tag{3}$$

where $\tau_0$ is the atmospheric coherence time (also called the Greenwood delay),
given by

$$\tau_0 = 0.314\,\frac{r_0}{v}, \tag{4}$$

where $v$ is the mean wind speed weighted by the turbulence profile along the line
of sight of the telescope. It can be noted that for a given temporal error, the
maximum allowable delay is proportional to $r_0/v$.
The delay of the system includes the phase lag induced by the DM. Therefore,
the DM temporal transfer function must not show any resonance and must exhibit
a phase lag smaller than 5° within the AO system bandwidth.

Again, considering the large field of applications and related temporal errors,
the required AO system bandwidth ranges, depending on the case, from 10 Hz up
to more than 200 Hz (for instance in solar astronomy).

• The mechanical stroke $\delta$ (i.e., the amplitude of the optical surface deformation)
required for the correction of atmospheric aberrations at low spatial frequencies
is given by

$$\delta = \frac{\sqrt{l}\,\lambda}{2\pi}\left(\frac{D}{r_0}\right)^{5/6}, \tag{5}$$

where $l = 1.03$ if the DM compensates for the total amount of aberrations and
$l = 0.134$ if it does not have to compensate for the tip-tilt, which is often the case.
Note that, since $r_0$ is proportional to $\lambda^{6/5}$ (Eq. (2)), the DM stroke is achromatic.

Usually (i.e., not considering deformable secondary mirrors for ELTs), the
required peak-to-valley stroke ranges from 2 µm to 15 µm, depending on the case.
• The mechanical interactuator stroke $\delta_{\mathrm{int}}$ (i.e., the local differential amplitude
of the optical surface deformation) required for the correction of atmospheric
aberrations at high spatial frequencies is given by

$$\delta_{\mathrm{int}} = \frac{\sqrt{6.88}\,\lambda}{2\pi}\left(\frac{P}{r_0}\right)^{5/6}, \tag{6}$$

where $P$ is the equivalent of the actuator pitch on the telescope primary.

Generally, the required interactuator stroke ranges from less than 1 µm up to
3 µm.

• In order to optimize the correction, the residual surface error of the DM must
be smaller than the fitting and temporal errors. By its nature, the DM is able to
correct its own low spatial frequency defects; however, this part of the spectrum
must be sufficiently low so as to preserve the stroke dynamics. The residual surface
error therefore consists of the high spatial frequency defects, which cannot
be corrected by the DM.

The required mechanical residual surface error, i.e., the best flat surface error,
ranges from several tens of nm RMS down to less than 5 nm RMS (i.e., 10 nm
RMS wavefront error). These values must be obtained once the mirror is driven
to reach a flat reference surface, which should not require more than 20% of
the mirror stroke, as usually requested for robustness and operational safety
reasons.

• The pupil diameter of the DM depends on the number of actuators and/or the
actuator spacing. On the one hand, this diameter should be as small as possible
since it determines the overall system size. On the other hand, for wide-field systems,
the magnification must be kept below a given threshold to avoid additional
aberrations, which requires a minimum DM diameter.

Usually, for post-focal systems, the possible DM aperture ranges from several tens
to several hundreds of millimeters.


• Since an AO system is a closed-loop system, nonlinearity and hysteresis are generally
managed by the closed loop itself. However, it is useful to limit these effects
to less than 5% in order to avoid spurious effects.

• The power dissipation of the DM must be low. This is needed to limit additional
atmospheric distortion due to thermal effects.

• The thermomechanical sensitivity of the DM must be low enough to allow good
operation in various environmental conditions.

• Robustness and reliability of the DM must be high enough to keep the instrument
in operation for the required time.
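
The order-of-magnitude reasoning behind Eqs. (1) to (4) can be sketched numerically. The Python example below computes the Fried parameter from an integrated turbulence profile, inverts the fitting-error relation to obtain a number of actuators, and inverts the temporal-error relation to obtain an allowable loop delay. The function names, the integrated $C_n^2$ value, the wind speed and the target error variances are assumptions chosen for illustration. The default k = 0.3 is the value quoted for SAMs in Section 4.1, and the error terms are treated as variances in rad².

```python
import math


def fried_parameter(lam: float, cn2_integral: float, zenith_rad: float = 0.0) -> float:
    """Eq. (2): r0 (m) from the integrated Cn^2 profile (m^(1/3))."""
    k_wave = 2 * math.pi / lam
    return (0.423 * k_wave ** 2 * cn2_integral / math.cos(zenith_rad)) ** (-3 / 5)


def actuators_needed(d_tel: float, r0: float, sigma2_fit: float, k: float = 0.3) -> int:
    """Invert Eq. (1): N = [k (D/r0)^(5/3) / sigma^2]^(6/5), rounded up."""
    return math.ceil((k * (d_tel / r0) ** (5 / 3) / sigma2_fit) ** (6 / 5))


def coherence_time(r0: float, wind: float) -> float:
    """Eq. (4): tau0 = 0.314 r0 / v, in seconds."""
    return 0.314 * r0 / wind


def max_delay(tau0: float, sigma2_temp: float) -> float:
    """Invert Eq. (3): tau = tau0 * (sigma^2)^(3/5), in seconds."""
    return tau0 * sigma2_temp ** (3 / 5)


# Assumed inputs: integrated Cn^2 of 5e-13 m^(1/3), 8 m telescope at zenith,
# 10 m/s wind, and 0.1 rad^2 allowed for each error term at 1.65 um.
r0_v = fried_parameter(0.5e-6, 5e-13)
r0_h = fried_parameter(1.65e-6, 5e-13)
n_act = actuators_needed(8.0, r0_h, sigma2_fit=0.1)
tau0 = coherence_time(r0_h, wind=10.0)

print(f"r0 = {r0_v * 100:.1f} cm at 0.5 um, {r0_h * 100:.1f} cm at 1.65 um")
print(f"actuators needed: {n_act}")
print(f"tau0 = {tau0 * 1e3:.1f} ms, allowed delay = {max_delay(tau0, 0.1) * 1e3:.1f} ms")
```

Doubling the telescope diameter at fixed $r_0$ multiplies the required number of actuators by about four, in line with the $(D/r_0)^2$ scaling noted above.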


3. Description of the Technology and General Properties 


3.1. Overview 


SAMs use ferroelectric actuators made of stacks of individual plates or disks. These 
actuators are mounted on a rigid base plate and apply deformations to the optical 
surface. The base plate gives its stiffness to the structure (see Fig. 1). 

SAMs generally show the following main features: 


• The actuator pitch is linked to the actuator stroke: the larger the stroke, the longer
the actuator and the larger its cross-section (to maintain the shear stiffness), and
hence the larger the pitch, which ranges from 2 mm to tens of mm.

• The use of ceramic materials allows a high actuator resonance frequency, above
10 kHz.

• The design (actuator layout, number of actuators, pupil dimension, etc.) can
be adapted to many needs, including astronomy (night and solar) and defense
applications. The number of actuators ranges from tens to several thousands,
while the pupil diameter ranges from tens to hundreds of mm (see Fig. 2).


[Figure labels: incident beam; optical plate; stack of plates; actuator array; base plate.]


Fig. 1. General architecture of a SAM. Under voltage, the actuators push or pull the optical plate 
in order to change the optical path of the incident beam. 


Fig. 2. Three examples of actuator arrays. (a) Photograph of the 15 × 15 actuator array of a DM
manufactured by AOA Xinetics (used with permission). (b) Photograph of the 21 × 21 actuator
array manufactured by CILAS for the DM of the Gran Telescopio Canarias (GTC).⁷ (c) Design
of the 41 × 41 actuator array of the CILAS DM (with 1377 useful actuators) used on the Spectro-
Polarimetric High-contrast Exoplanet Research (SPHERE) instrument of the European Southern
Observatory’s Very Large Telescope (ESO VLT). This DM has been in operation since 2014 on
the VLT.⁸,⁹


3.2. Basic principle 


When an electric field is applied to a ferroelectric plate, its dimensions
change. This is a reversible process. Elongation in the direction of the electric
field is known as the longitudinal effect, while the transverse effect refers to an
elongation perpendicular to the electric field. SAMs mainly rely on the
longitudinal effect.

An element such as that shown in Fig. 3 acts as a capacitive element defined by 
two conductive electrodes enclosing the ferroelectric material as dielectric. Since a 
deformation is created when this capacitor is charged by applying a voltage, it can 
be considered as a “moving capacitor”. 

A stacked actuator is made of a number of ferroelectric elements, as shown 
in Fig. 4. Stacking of these elements into a multi-element structure increases the 
total elongation, i.e., the stroke of the actuator. Note that this stacking can be done 
[Figure labels: longitudinal effect, δh; transverse effect, δL.]


Fig. 3. Schematic drawing of the shape modification of a ferroelectric element under an electric
field E: δh is the thickness change under the longitudinal effect; δL is the width change under the
transverse effect.




Fig. 4. Schematic diagram of a stacked actuator (courtesy Piezomechanik and APC International,
Ltd., which was formerly American Piezo Ceramics, Inc., www.americanpiezo.com). Electrodes are
located between the stacked elements in order to apply the electric field for the longitudinal effect.
The total elongation is the sum of the elongations (i.e., the thickness variations) of all the elements.


either by gluing or by co-firing, depending on the different possible technologies and 
materials. 

The ferroelectric material is mostly either piezoelectric or electrostrictive. The
main characteristics of these properties are presented in the next two sections and
in Fig. 5.


3.3. Piezoelectric effect 


The piezoelectric effect is understood as the linear electromechanical interaction 
between the mechanical and the electrical state in some materials. The general 
[Figure axes: expansion versus electric field; curves: electrostrictive, piezoelectric.]


Fig. 5. Typical displacement versus voltage of piezoelectric and electrostrictive actuators 
(courtesy Physik Instrumente, www.physikinstrumente.com). 


formulation of the relative variation of material thickness, $h$, versus electric field
$E$ or applied voltage $V$ is therefore

$$\frac{\delta h}{h} = aE = \frac{aV}{h}. \tag{7}$$


For instance, lead zirconate titanate, also called PZT (PbZr$_x$Ti$_{1-x}$O$_3$), is a
ceramic material that shows a marked piezoelectric effect.¹⁰ It is therefore used to
manufacture actuators for deformable mirrors.*

It follows from Eq. (7) that a PZT actuator shows an elongation $\delta$ proportional
to the applied voltage $V$ and to the number of plates $N$ constituting the stack:

$$\delta = N d_{33} V, \tag{8}$$

where $d_{33}$ is the longitudinal piezoelectric coefficient. The actual elongation is in
fact only nearly linear and shows hysteresis at a level of up to 20%.

For hard PZT materials, which show small hysteresis (down to less than 5%)
and good linearity (less than 1% of nonlinearity), the longitudinal piezoelectric
coefficient, $d_{33}$, can be about 0.3 µm/kV. This allows a peak-to-valley mechanical
stroke in the range of 5–20 µm for applied voltages of ±400 V, depending on the
mirror design.

Note that the piezoelectric properties of PZT are almost insensitive to temper- 
ature within the range [-30°C; +50°C]. 
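
As a rough check of these orders of magnitude, the sketch below applies Eq. (8) with the quoted coefficient of 0.3 µm/kV to estimate how many plates a stack would need to reach a given stroke. It is an idealized calculation: it ignores hysteresis and nonlinearity, and it assumes that a ±400 V drive can be treated as an 800 V peak-to-valley span, which is an interpretation rather than something stated in the text. The function names are hypothetical.

```python
import math

D33_M_PER_V = 0.3e-6 / 1e3  # 0.3 um/kV from the text, converted to m/V


def pzt_stroke(n_plates: int, voltage_span_v: float, d33: float = D33_M_PER_V) -> float:
    """Eq. (8): ideal stack elongation delta = N * d33 * V, in metres."""
    return n_plates * d33 * voltage_span_v


def plates_for_stroke(stroke_m: float, voltage_span_v: float, d33: float = D33_M_PER_V) -> int:
    """Smallest number of plates for which Eq. (8) reaches the requested stroke."""
    return math.ceil(stroke_m / (d33 * voltage_span_v))


SPAN_V = 800.0  # assumed peak-to-valley span for a +/-400 V drive

for target_um in (5, 20):
    n = plates_for_stroke(target_um * 1e-6, SPAN_V)
    print(f"{target_um} um needs {n} plates ({pzt_stroke(n, SPAN_V) * 1e6:.2f} um available)")
```

Under these assumptions each plate contributes 0.24 µm, so strokes of 5–20 µm correspond to stacks of a few tens of plates.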


* For example, see Chapter 4 of Volume 4. 
3.4. Electrostrictive effect


Electrostriction is understood as the quadratic electromechanical interaction
between the mechanical and the electrical state in materials. The general formulation
is then

$$\frac{\delta h}{h} = aE^2 = \frac{aV^2}{h^2}. \tag{9}$$


For instance, lead magnesium niobate, also called PMN (PbMg$_{1/3}$Nb$_{2/3}$O$_3$),
is highly suitable for actuator applications.¹⁰

It follows from Eq. (9) that a PMN actuator shows an elongation proportional
to the square of the applied voltage and inversely proportional to the thickness $h$
of the plates:

$$\delta = \frac{N\gamma V^2}{h}. \tag{10}$$


The thickness of PMN plates can be as small as 100 µm, leading for instance to
a mechanical peak-to-valley stroke of up to 10 µm for a voltage range of 0–150 V.

Note that the electrostrictive properties of PMN are temperature sensitive. For
example, PMN is almost hysteresis-free at 20°C, while the hysteresis can be
>10% at 0°C, with a lower sensitivity.¹¹
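
The quadratic voltage dependence of Eq. (10) is easy to see numerically. In the sketch below, the lumped product $N\gamma$ is not a published material value: it is calibrated, as a hypothetical step, on the example quoted above (10 µm of stroke at 150 V with 100 µm plates), and the function name is likewise an assumption.

```python
def pmn_stroke(n_gamma: float, voltage_v: float, thickness_m: float) -> float:
    """Eq. (10): delta = N * gamma * V^2 / h, with n_gamma = N * gamma in m^2/V^2."""
    return n_gamma * voltage_v ** 2 / thickness_m


# Hypothetical calibration on the quoted example: 10 um at 150 V, 100 um plates.
PLATE_THICKNESS_M = 100e-6
N_GAMMA = 10e-6 * PLATE_THICKNESS_M / 150.0 ** 2

for volts in (0.0, 75.0, 150.0):
    stroke_um = pmn_stroke(N_GAMMA, volts, PLATE_THICKNESS_M) * 1e6
    print(f"{volts:5.0f} V gives {stroke_um:.2f} um")
```

Halving the voltage leaves only a quarter of the stroke, whereas Eq. (8) for a PZT stack would leave half; thinner plates increase the stroke at a given voltage.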


4. Evaluation of SAM Specifications 


4.1. Strokes, mechanical coupling and actuator pitch 


The actuator elongation given in Sections 3.3 and 3.4 can be achieved for low
spatial frequencies, i.e., when several actuators are driven together (for instance,
for low-order Zernike modes such as defocus, astigmatism, etc.); this is called the
maximum stroke ($\delta$).

This stroke cannot be obtained when an actuator is driven alone. The stroke
delivered when driving a single actuator, which is called the single stroke ($\delta_0$), can
be calculated analytically by considering the mechanical balance between the actuator
strength and the optical plate reaction, taking into account the eight closest
neighbors of the considered actuator.¹²

Considering Fig. 6, and assuming that the other, more distant neighbors do not modify
the calculation (which is usually true), we have the following relations between
strengths $F_i$ and displacements $\delta_i$ for each type of actuator of the group:

$$F_0 = 4F_1 + 4F_2, \tag{11}$$
$$F_0 = \frac{ES}{l}(\delta - \delta_0), \tag{12}$$
$$F_1 = \frac{ES}{l}\,\delta_1, \tag{13}$$
$$F_2 = \frac{ES}{l}\,\delta_2, \tag{14}$$
Fig. 6. Schematic diagram of the location of nine actuators (square layout configuration): in
white the driven actuator (type 0), in gray its four first neighbors (type 1) and in black its four
second neighbors (type 2).


where $E$ is the Young modulus (N/m²) of the actuator; $S$ is the surface (m²) of the
actuator section; $l$ is the length (m) of the actuator; $\delta$ is the maximum stroke given
in Sections 3.3 and 3.4.

If we now suppose the optical plate itself to be clamped and subjected to a load
at its center, at the level of the circle represented in Fig. 6, we have:¹³

$$F_0 = \frac{a(\delta_0 - \delta_1)D_{op}}{p^2}, \tag{15}$$

where $a$ is a specific constant linked to the architecture; $p$ is the actuator pitch (m);
$D_{op}$ is the optical plate flexure modulus (N m), given by

$$D_{op} = \frac{E_{op}h^3}{12(1-\nu^2)}, \tag{16}$$

where $E_{op}$ is the optical plate Young modulus (N/m²); $h$ is the optical plate thickness
(m); $\nu$ is the optical plate Poisson coefficient.
We can now introduce the mechanical coupling factor $C$, which is defined as

$$C = \frac{\delta_1}{\delta_0}. \tag{17}$$


Since the considered system of actuators is symmetric, we can make the following
assumption:

$$\delta_2 = C\delta_1 = C^2\delta_0. \tag{18}$$


Finally, substituting Eqs. (17) and (18) into Eqs. (13)–(15), Eq. (11) allows us
to obtain an analytical equation involving the coupling factor:

$$\frac{4ES}{l}\,C^2 + \left(\frac{4ES}{l} + \frac{aD_{op}}{p^2}\right)C - \frac{aD_{op}}{p^2} = 0. \tag{19}$$



Therefore, the coupling factor can easily be obtained from only the actuator
pitch and the mechanical parameters of the actuator and optical plate:

$$C = \frac{-(4c+b) + \sqrt{(4c+b)^2 + 16cb}}{8c}, \tag{20}$$

with

$$c = \frac{ES}{l} \quad\text{and}\quad b = \frac{aD_{op}}{p^2}.$$

Once this mechanical coupling factor is known, the single stroke $\delta_0$ is deduced
from Eqs. (12) and (15). It can also be deduced from the maximum stroke (Eq. (8) or
(10)), considering that the maximum stroke is reached when the actuator is driven
with its eight neighbors, which means

$$\delta = \delta_0 + 4\delta_1 + 4\delta_2 = \delta_0 + 4C\delta_0 + 4C^2\delta_0 \;\Rightarrow\; \delta_0 = \frac{\delta}{1 + 4C + 4C^2}. \tag{21}$$

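Finally, the differential stroke obtained when driving two neighboring actuators
in opposite directions, which is called the interactuator stroke ($\delta_{\mathrm{int}}$), can easily be
obtained in the following way:

$$\delta_{\mathrm{int}} = 2(\delta_0 - \delta_1) = 2(1 - C)\,\delta_0 = \frac{2(1 - C)}{1 + 4C + 4C^2}\,\delta. \tag{22}$$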


Figure 7 shows examples of simulated and measured single stroke and interactuator 
stroke values. 

Current SAM technologies allow mechanical coupling in the 10% to 40% range, 
which is well within the needs for correction of atmospheric perturbations. In other 
words, the factor k of Eq. (1) is minimized down to a particularly low value, 
around 0.3. 

It has to be noted that, in some cases,!? these mechanical coupling values are 
obtained for optical plate thicknesses ranging, for instance, from 1 mm to
several mm for actuator pitches from 5 mm to tens of mm. These large thicknesses
offer significant advantages in terms of mechanical compatibility, optical quality (see
Section 4.4), environmental compatibility and robustness. They are made possible
by the high forces that can be generated by the stacked actuators, which are
in the range of tens to hundreds of N, compared with mN or µN for other technologies.

These technologies have demonstrated good performance for deformable mirrors
with pitches from ~3 mm to >10 mm. In practice, as a rule of thumb, it can
be considered that the evolution of the maximum stroke versus actuator pitch is
technologically limited as indicated in Fig. 8.

Consequently, SAM technology offers strokes and interactuator strokes well
within the requirements discussed in Section 2.

[Figure 7 panels: (a) simulation of single stroke obtained by FEM; (b) measurement of single stroke obtained by interferometry; (c) simulation of interactuator stroke obtained by FEM; (d) measurement of interactuator stroke obtained by interferometry.]

Fig. 7. Example of simulated (with a finite element model, FEM) and measured results for
single stroke ((a) and (b)) and interactuator stroke ((c) and (d)). Simulated and measured values of
coupling factor, single stroke and interactuator stroke are in accordance with those obtained using
the presented analytical formulation. These results have been obtained in the framework of the
Thirty Meter Telescope (TMT) project.¹⁴,¹⁵ For this 5 mm pitch DM prototype, the main values
are: maximum stroke = 15 µm peak to valley, mechanical coupling = 24%, single stroke = 6.6 µm
peak to valley and interactuator stroke = 5.2 µm.

[Figure 8 axes: maximum stroke PV (µm) versus actuator pitch.]

Fig. 8. Schematic representation of achievable values for the maximum stroke peak-to-valley
versus actuator pitch due to technological issues.

Fig. 9. Schematic representation of the achievable number of actuators versus actuator pitch
(considering a 500-mm pupil diameter).


4.2. Number of actuators, layout and pupil dimension 


The number of actuators can easily be adapted to the need. The design can offer 
square, rectangular or triangular (i.e., hexagonal) layout. In order to obtain the 
array, the assembly can use individual actuators, lines of actuators or even sub- 
arrays of actuators. The limitation for pupil dimension is only given by the dimen- 
sion of blank materials (needed for the base plate and optical plate) and by process 
issues (for tooling and assembling). 

SAM technologies have demonstrated good performance for deformable mirrors
with pupil dimensions in the range of 50–200 mm. Furthermore, some developments
for larger diameters have been made for ESO in the scope of the E-ELT
adaptive secondary development¹⁶ or are currently underway for TMT.¹⁷ Even if
SAM technologies could be pushed to even larger diameters, we can consider that the
presently achievable pupil diameter is in the range of 500 mm. Following
this assumption, the maximum number of useful actuators versus actuator pitch
can be determined as indicated in Fig. 9.

Looking at these numbers we see that SAM technology is well adapted to the spatial
orders of correction required for instrumentation in astronomy, including in
particular those for ELTs (see Section 2).
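
A rough idea of the trend in Fig. 9 can be obtained by counting the nodes of a square grid that fall inside a circular pupil. The Python sketch below does this; the square layout, the function name `count_actuators` and the sample pitches are assumptions made for illustration, and the curve in Fig. 9 may rest on a different counting convention.

```python
import numpy as np


def count_actuators(pupil_diameter_mm, pitch_mm):
    """Count square-grid actuators whose centers lie inside a circular pupil."""
    n = int(np.floor(pupil_diameter_mm / pitch_mm)) + 1  # grid nodes across the diameter
    coords = (np.arange(n) - (n - 1) / 2.0) * pitch_mm   # grid centered on the pupil
    xx, yy = np.meshgrid(coords, coords)
    inside = xx ** 2 + yy ** 2 <= (pupil_diameter_mm / 2.0) ** 2
    return int(np.count_nonzero(inside))


# Assumed 500 mm pupil, a few illustrative pitches (mm).
for pitch in (5.0, 10.0, 25.0):
    print(pitch, count_actuators(500.0, pitch))
```

The count scales approximately as $(\pi/4)(D/p)^2$, so halving the pitch roughly quadruples the number of actuators.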


4.3. Resonance frequencies and temporal behavior 


The temporal behavior of a SAM is mainly determined, on the one hand, by the 
actuator behavior and, on the other hand, by the mechanical modes of the overall 
structure of the DM. 

An excellent model of the actuator behavior is a second-order filter (see Fig. 10).
Considering that an actuator can be represented as a beam clamped at one end,
the following formula is a good approximation of its first natural frequency $f_r$:¹⁸

$$f_r = 0.25\,l^{-1}(E/\rho)^{1/2}, \qquad (23)$$


where $l$ is the length (m) of the actuator; $E$ is the Young modulus (N/m²) of the
actuator; $\rho$ is its density (kg/m³).

[Figure 10 panels: (a) simulation of an actuator mechanical transfer function; (b) measurement of an actuator mechanical transfer function. Vertical axis: gain.]

Fig. 10. Examples of simulated and measured transfer functions of a SAM actuator. The simulation
(a) is a simple second-order filter with resonance frequency given by Eq. (23). The measurement
(b) has been obtained on an actual actuator. We see a resonance frequency at ~14 kHz in
both cases. These results were obtained in the framework of the TMT project.¹⁵

The use of ceramic materials allows high actuator resonance frequency, usually 
beyond 10 kHz. In general, the transfer function is flat and the phase lag stays below 
5° within the range 0-1 kHz. 
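
To give a feel for the orders of magnitude in Eq. (23), the following Python sketch evaluates the first natural frequency of a clamped-free actuator. The material values (E = 60 GPa, ρ = 7800 kg/m³) and the 50 mm length are assumptions chosen for illustration, roughly representative of a piezoelectric ceramic; they are not taken from the text.

```python
import math


def actuator_resonance_hz(length_m, young_pa, density_kg_m3):
    """First natural frequency from Eq. (23): f_r = 0.25 * (1/l) * sqrt(E/rho)."""
    return 0.25 / length_m * math.sqrt(young_pa / density_kg_m3)


# Assumed, illustrative values for a ceramic stack actuator.
f_r = actuator_resonance_hz(length_m=0.05, young_pa=6.0e10, density_kg_m3=7800.0)
print(f"{f_r / 1e3:.1f} kHz")  # 13.9 kHz
```

Because the frequency scales as 1/l, shorter stacks resonate proportionally higher, which is consistent with resonances beyond 10 kHz for ceramic actuators.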

Since most of the stiffness of a SAM comes from its base plate, the resonance 
frequencies and shapes of the fundamental mirror modes can be approached by the 
classical formulations used for square or circular plates, according to the shape of 
the mirror base plate. 

Considering, for instance, a mirror with a square base plate in free condition,
the first resonance frequencies are given by the following formulation:¹⁸

$$f_i = \frac{\lambda_i}{2\pi a^2}\sqrt{\frac{D_{bp}}{\rho h}}, \qquad (24)$$


where $\lambda_i$ is the specific constant of the considered mode $i$ ($\lambda_1 = 14.1$, $\lambda_2 = 20.6$,
$\lambda_3 = 23.9$, ...), see the example of Fig. 11(a); $a$ is the side of the square base plate
(m); $h$ is the base plate thickness (m); $\rho$ is the equivalent density (kg/m³) of the
base plate, i.e., the ratio between the mass of the overall DM and the volume of the
base plate; $D_{bp}$ is the base plate flexure modulus (N m), given by

$$D_{bp} = \frac{E_{bp}h^3}{12(1-\nu^2)}, \qquad (25)$$

where $E_{bp}$ is the base plate Young modulus (N/m²); $\nu$ is the base plate Poisson
coefficient.
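
The plate formulation of Eqs. (24) and (25) is simple to evaluate numerically. The Python sketch below does so for a hypothetical base plate; all dimensions and material values are assumptions for illustration only (they are not the SPHERE mirror parameters), and only the mode constants 14.1, 20.6 and 23.9 come from the text.

```python
import math

LAMBDAS = (14.1, 20.6, 23.9)  # mode constants quoted in the text


def flexure_modulus(young_pa, thickness_m, poisson):
    """Plate flexure modulus, Eq. (25): D = E h^3 / (12 (1 - nu^2))."""
    return young_pa * thickness_m ** 3 / (12.0 * (1.0 - poisson ** 2))


def free_square_plate_modes(side_m, thickness_m, young_pa, poisson, density_kg_m3):
    """First resonance frequencies (Hz) of a free square plate, Eq. (24)."""
    d_bp = flexure_modulus(young_pa, thickness_m, poisson)
    root = math.sqrt(d_bp / (density_kg_m3 * thickness_m))
    return [lam / (2.0 * math.pi * side_m ** 2) * root for lam in LAMBDAS]


# Hypothetical base plate: every value below is an assumption.
freqs = free_square_plate_modes(side_m=0.25, thickness_m=0.04,
                                young_pa=9.0e10, poisson=0.24,
                                density_kg_m3=5000.0)
for i, f in enumerate(freqs, start=1):
    print(f"Mode {i}: {f / 1e3:.2f} kHz")
```

Whatever the plate parameters, the ratios between the modes are fixed by the constants (about 1 : 1.46 : 1.70), which agrees to rounding with the analytical values of 1.51, 2.20 and 2.56 kHz reported in Fig. 11(a).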

This approach is given as an example. In practice, the calculation has to take 
into account the boundary conditions, i.e., the way the DM is interfaced with its 
mechanical mount. In general, the design choices for the mount, including the base 
plate material, the base plate thickness, and the mechanical interface, allow high 
resonance frequency, usually beyond 1 kHz, even for large diameter DMs. This range 
is much higher than most of the frequency ranges that have to be corrected by 
the DM for applications in astronomy or other atmospheric correction cases (see 
Section 2). However, in order to confirm that no spurious effect is induced in the 
operational range, an end-to-end simulation can be useful to account for all the 
parameters of the DM (mode frequencies and damping factors) and of the AO closed 
loop (including in particular the spectrum to correct and the sampling frequency). 
In practice, no dedicated dynamic control of the SAM is required, even for large 
diameter pupils. 


4.4. Properties of optical surface 


[Figure 11 panels: (a) shape and analytical calculation of the three first modes of a square plate in free condition (Mode 1: 1.51 kHz; Mode 2: 2.20 kHz; Mode 3: 2.56 kHz); (b) simulation of the first base plate modes in free condition obtained by FEM of the overall SAM (Mode 1: 1.46 kHz; Mode 2: 2.17 kHz; Mode 3: 2.62 kHz).]

Fig. 11. Example of analytical results (a) compared to those obtained with a finite element model
(FEM) of SAM modes in free condition (b) for a square base plate. Analytical results have been
obtained with Eq. (24). Both results are close to each other in shape and in frequency, which
confirms that the base plate is the element that gives its dynamical properties to a SAM. This
also means that the analytical approach allows a good estimate of the DM dynamical behavior.
These results have been obtained for the ESO SPHERE 41 × 41 deformable mirror (see Fig. 2).

As mentioned earlier, compared to other technologies, the optical plate is relatively
thick (Section 4.1) and the actuators are made of ceramic material, allowing a very
stiff mirror structure. Therefore, the polishing of the optical surface can be done
following classical optical workshop processes. This allows a very good and stable
optical quality, at the same level as that of a monolithic mirror. For instance, the
following figures can be obtained:


• Roughness: <1 nm RMS mechanical.

• Best flat wavefront error: <10 nm RMS optical. This value is obtained once the
mirror is driven to reach a flat reference surface, which may require 10% to 20%
of the mirror stroke, according to the design (see Fig. 12).

Fig. 12. Example of interferometric measurement of the best flat residual error obtained on a
SAM. Mechanical deformations of 4 nm RMS (i.e., 8 nm RMS wavefront error) are reached on a
200 mm diameter useful aperture. This result has been obtained on the deformable mirror of the
Daniel K. Inouye Solar Telescope operated by the National Solar Observatory (NSO DKIST).¹⁹
Figure courtesy of AOA Xinetics (used with permission).


Regarding the coating, almost any kind of metal coating (e.g., aluminum, gold,
or protected silver) can be deposited, for use in the 250 nm to 10 µm wavelength range. In
some cases, dielectric coatings can be used to optimize reflection in the wavelength
range of interest.²⁰


4.5. Thermomechanical behavior 


In some cases!*:?! the design choices for: 


• actuator material (mainly piezoelectric),
• material of the base plate,
• material of the optical plate, and
• mechanical interface


allow a quasi-athermalization of the DM, which means that all its specifications (e.g.,
the actuator stroke and the DM shape at rest, i.e., its shape when no control voltage
is applied to the DM) will be quasi-constant over a large range of temperatures (see
Fig. 13). In some other cases,²²,²³ the DM can be designed to be operational at a
temperature far from ambient (for instance at cryogenic temperature).

Fig. 13. Example of local measurements of the best flat residual error (a) and stroke (b) obtained
on a SAM at relatively low temperature. The plus signs indicate the position of the actuators.
(a) 8 nm RMS mechanical (i.e., 16 nm RMS wavefront error) performance is obtained at −31 °C.
(b) 16 µm peak to valley mechanical (i.e., 32 µm peak to valley wavefront) performance following a
“tent” pattern is obtained at the same temperature. Both results are very close to those obtained
at ambient. These results were obtained in the framework of the TMT project.¹⁵ Figure courtesy
of NRC-HAA and the TMT project, used with permission.


4.6. Thermal control 


Thanks to the ferroelectric technology, the actuators are mainly capacitors; there- 
fore, the internal thermal dissipation of a SAM is negligible. Thus, the operation of 
the DM does not generate any thermal gradients and spurious deformation of the 
optical surface. 

However, for some specific applications, such as high-power laser correction or 
solar astronomy, it is important to keep the optical surface at a given temperature. 
The design of the SAMs can be adapted in order to allow cooling possibilities. For 
instance, this can be done by the circulation of a specific heat transfer fluid beneath 
the optical surface with a thermal dissipation done by an external thermoelectric 
recirculating liquid chiller (containing heat exchanger, pump and tank) (see Fig. 14). 

Thanks to custom designs, the mirrors are able to deliver the required thermal control
performance while keeping all of their initial features. For instance, possible
thermally induced distortion of the optical plate, coolant flow induced jitter, and
damping effect of coolant on actuators are minimized.²⁴

Fig. 14. Two examples of thermally controlled SAMs. (a) A 52-actuator DM for laser correction
manufactured by CILAS. (b) A 1600-actuator DM manufactured by AOA Xinetics for the NSO
DKIST (figure courtesy of AOA Xinetics, used with permission).¹⁹ The active cooling plumbing
is visible for both DMs.


4.7. Reliability 


Some SAM manufacturers have demonstrated the reliability of their production,²¹
which is required given the ever larger number of actuators involved.
For instance, some of the main tests are listed below.


• The electrical breakdown test is used for the validation of the electrical reliability.
This test consists of increasing the voltage applied to the actuators until a
failure is observed. The minimum voltage that must be reached before failure
is set by a standard used for high voltage components, which includes a
significant safety margin.

• The accelerated ageing test is used for the evaluation of the actuator’s lifetime.
This kind of test is widely used in the electronics industry to measure the reliability
and lifetime of electrical components. The ageing of the components is accelerated
under worst-case environmental conditions: higher voltage, higher humidity rate,
higher temperature. This kind of test is carried out on breadboards during the
development phase to compare the technologies, and also on complete actuators to
evaluate their lifetime.

• The mechanical fatigue test is used for the evaluation of the actuator’s mechanical
reliability. The actuators are driven at high frequency (for instance their resonance
frequency) under a significant stroke to accelerate their operational cycles and
evaluate their lifetime.


5. Conclusion


Comparing the capabilities of the stacked array mirror technologies to the presented
requirement estimates, it is clear that this technology is very well adapted to
most past, present and future needs of AO in night and solar astronomical
instrumentation. Its main features of interest are a high spatial order combined with
high interactuator stroke and high-frequency specifications. These features make this
technology the most attractive for AO applications, including ELTs.

The Shack–Hartmann wavefront sensor has been the mainstay in astronomical
adaptive optics systems. This chapter provides an overview of these sensors, gives
an introduction to modeling techniques, and details algorithms used in the
processing of Shack–Hartmann images to obtain local wavefront gradients. Finally,
some modifications to the basic Shack–Hartmann design are described.


1. Measurement of Wavefront Gradients 


Wavefront sensors are used to measure, or estimate, wavefronts. Generally, three
types of measurement are possible: direct measurement, gradient measurement, and
curvature measurement. Wavefront sensors may operate in a conjugate focal plane,
pupil plane, or some intermediate plane. A Shack–Hartmann wavefront sensor is a
pupil plane sensor which measures the gradient of an incident wavefront. The
wavefront under investigation can be static (for example, due to an optical arrangement),
or time-varying (for example, due to atmospheric turbulence).

The history of the Shack–Hartmann wavefront sensor is well documented elsewhere,
for example in Ref. 1. Astronomical adaptive optics (AO) was first demonstrated
by the COME-ON system in 1989 using a 4 × 4 Shack–Hartmann wavefront
sensor, and almost all astronomical systems since have used Shack–Hartmann sensors,
with up to 10 wavefront sensors² and up to 62 × 62 sub-apertures.³

A Shack–Hartmann wavefront sensor divides a wavefront into a number of
sub-apertures (or sub-pupils) in a conjugate pupil plane, and then focuses each
sub-aperture onto a separate area of a detector, as shown in Fig. 1. The key
components for a Shack–Hartmann wavefront sensor include an array of small lenses,
commonly termed a microlens array or a lenslet array, and a suitable detector,
most commonly a charge-coupled device (CCD) or complementary metal oxide
semiconductor (CMOS) detector. The size of these lenslets typically varies from
a few microns to a few millimeters, depending on application and optical design.
The deviation of each sub-pupil point-spread function (PSF) from a known reference
position (nominally the center of the sub-aperture if there are no optical aberrations
present) is then measured to obtain the gradient of the incident wavefront
across each sub-aperture. A wavefront reconstruction process can then be applied
to retrieve an approximation for the incident wavefront (see Section 1.2).

Fig. 1. An overview of the Shack–Hartmann wavefront sensor concept. Collimated light at the
microlens array is focused onto the detector. The microlens array is in a conjugate pupil plane. The focal
length of the telescope is f1 and that of the collimating lens is f2.

Fig. 2. Variations of Shack–Hartmann geometries, from left to right: square, hexagonal, circular
mask on a square grid, circular mask on a hexagonal grid, tilted square.

Shack–Hartmann wavefront sensor sub-apertures are most commonly square,
to provide a good fit to typical square pixel geometries in conventional CCD and
CMOS detectors. However, variations are possible, as shown in Fig. 2. A close-packed
hexagonal arrangement of sub-apertures is sometimes used. A metal mask
can be applied using a photolithography process, to yield circular sub-apertures
within a square or hexagonal geometry, thus improving the diffraction pattern
within the sub-aperture for applications where high accuracy is necessary, e.g.,
metrology applications (though with reduced throughput). A tilted square geometry
is also sometimes used to reduce the effects of diffraction on neighboring
sub-apertures (which introduce bias).

The number of sub-apertures within a Shack–Hartmann wavefront sensor determines
the precision with which the incident wavefront can be reconstructed. Metrology
applications generally require a large number of sub-apertures, with low frame
rates and high light levels. For active optics correction, low sub-aperture counts typically
suffice at intermediate frame rates, while for AO, intermediate sub-aperture
counts and high frame rates are necessary. Since the incident light is divided between
sub-apertures, the available incident flux will set an upper limit on the number of
sub-apertures that can be used: too many sub-apertures will result in low signal,
and hence high noise in the gradient measurements. The statistics of the wavefront
perturbations should also be taken into account, as there is little purpose in having
a wavefront sensor that is able to measure the wavefront to a far higher spatial
resolution than that of the features within the wavefront. When flux is not a limiting
factor, the lenslet diameter is typically chosen such that the major aberration across
the microlens is a local tilt.

The key requirements for an astronomical AO system wavefront sensor are the
ability to operate using faint targets with high time resolution, with a broad
wavelength bandpass, and on extended sources, including binary systems and resolved
sources. A Shack–Hartmann wavefront sensor meets these requirements.

A field stop is usually required for Shack–Hartmann wavefront sensors operating
on extended sources, and to reduce sky background signal. However, this is not
essential for point sources, and is not included in many astronomical AO systems.

A major disadvantage of Shack–Hartmann wavefront sensors is their lack of flexibility:
the dynamic range and sensitivity are fixed, and cannot be changed during operation.


1.1. Detectors for Shack–Hartmann wavefront sensors


Early Shack–Hartmann systems were based on quad-cell designs with 2 × 2 pixels
per sub-aperture, and used photo-multiplier tubes or photo-diodes for detection.
However, advances in detector technology, combined with the limitations of
quad-cell Shack–Hartmann systems (namely linearity, sensitivity, alignment issues
and range), quickly rendered these designs obsolete. Almost all astronomical AO systems
now use CCDs, including electron-multiplying CCDs (EMCCDs), though scientific
CMOS (sCMOS) detectors are of increasing interest for these applications
due to their large pixel count, low noise and high readout rates. Infrared wavefront
sensing, using low noise electron-avalanche photodiode (eAPD) arrays, is also now
technically feasible and investigations of these systems are commencing.


1.2. Wavefront reconstruction 


A Shack–Hartmann wavefront sensor measures the gradient of a wavefront. Therefore,
a wavefront reconstruction process is necessary to estimate the actual wavefront.
For applications such as AO, where a deformable mirror (DM) is present, the
influence of DM actuators on the wavefront gradients can be recorded (an influence
matrix), and the pseudo-inverse computed to provide a mapping matrix from
wavefront gradients back to DM actuators, i.e., the DM shape required
to flatten the incident wavefront. A number of other reconstruction techniques have
also been developed, including CuReD⁴ and FFT-based techniques.⁵ When multiple
wavefront sensors are present, tomography can be performed, i.e., the full turbulent
volume can be reconstructed. Further information is given in Chapter 11.
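
The pseudo-inverse approach described above can be sketched in a few lines of Python. In this minimal example the influence matrix is a random stand-in (a real one would be recorded by poking each actuator in turn), and the matrix sizes, noise level and the `rcond` threshold are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actuators, n_slopes = 12, 40  # hypothetical system size

# Influence matrix: column j holds the slope response to a unit poke of
# actuator j. Here it is random, purely as a placeholder.
influence = rng.normal(size=(n_slopes, n_actuators))

# Mapping matrix from slopes to actuator commands; rcond discards modes
# whose singular values are too small to be sensed reliably.
reconstructor = np.linalg.pinv(influence, rcond=1e-3)

# Simulate a measurement produced by a known DM shape plus a little noise.
true_commands = rng.normal(size=n_actuators)
slopes = influence @ true_commands + rng.normal(scale=1e-3, size=n_slopes)

estimated = reconstructor @ slopes  # DM-space estimate of the wavefront
correction = -estimated             # command that would flatten it
```

In a closed loop the correction would normally be applied through a gain (integrator) rather than in a single step; that part is omitted here.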


1.3. Manufacture of lenslet arrays 


There are a number of modern techniques that can be used for fabrication, including 
micro-machined molds and a cured epoxy, additive manufacturing and standard 
semiconductor and integrated circuit techniques including photolithography, resist 
processing, reflow and ion etching. 

It is possible to fabricate a wide range of microlens arrays, including aspheric 
and diffractive profiles, with sub-micron tolerances. Materials used include fused 
silica, many types of glass (including BK7), plastics, and cured epoxy. Anti-reflection 
coatings are also commonly added. 


1.4. Optical alignment of Shack–Hartmann systems


Shack–Hartmann system designs are usually telecentric, with a collimated input to
help reduce field curvature. Lenslet array pitch is typically small (less than 1 mm)
and so tight alignment tolerances are required, often necessitating alignment with
micron-level accuracy. Figure 1 shows a typical optical design for a Shack–Hartmann
wavefront sensor, as could be used in an astronomical AO system. The light incident
on the lenslet array is collimated, and the lenslet array is in a conjugate pupil plane.
To match detector geometry, re-imaging optics can be used (not shown), though this
reduces throughput due to the presence of additional optical surfaces.

The typical alignment procedure for a Shack–Hartmann system commences
with collimation of the input beam. The conjugate pupil plane is then identified by
obtaining a sharp image of an object placed at the negative focus of the pupil plane,
and the lenslet array is installed at this location. Finally, the Shack–Hartmann spots
are focused on the detector by adjusting the detector position. While performing this
operation there are several things to note. If a re-imaging system is used between the
detector and lenslet array (typically a pair of lenses), care must be taken to ensure
that the spots, rather than the lenslet array itself (which, being a periodic pattern,
can look surprisingly like Shack–Hartmann spots when imaged by a detector), are
focused onto the detector. Additionally, Fresnel diffraction effects must be taken
into account to ensure that the periodicity of the lenslet array is not mistakenly
re-imaged (the Talbot effect⁶).

Matching of spots to sub-apertures can be difficult in the presence of unknown
wavefront tilt. To overcome this problem, a reference can be provided, e.g., a pupil
stop, central obscuration or optical support structures (spiders). For metrology
applications, a sub-aperture mask can be applied, either blocking a central
sub-aperture or using a pattern of reduced flux sub-apertures (which can also
overcome rotational symmetry uncertainties). This approach also allows highly distorted
wavefronts to be measured by enabling identification of individual sub-aperture
spots.


2. Modeling and Simulation of Shack–Hartmann Wavefront Sensors


Simulation of Shack–Hartmann wavefront sensors is necessary when evaluating the
performance of AO systems and to aid design decisions. There are several techniques
that can be used, depending on required fidelity. Some models aim to replicate
the output from the wavefront sensor detector, while others take a further step
and provide the incident wavefront gradient measurements. Monte Carlo, statistical
modeling and ray tracing techniques can be used.

When the input wavefront and noise can be represented statistically, a statistical
analysis of the wavefront sensor output can be performed. Higher fidelity modeling
requires Monte Carlo simulation, modeling the wavefront sensor output as a function
of time (including time-varying noise).


2.1. Simple Monte Carlo modeling 


The simplest Monte Carlo model of a Shack–Hartmann wavefront sensor involves
taking the phase across each sub-aperture and computing the mean tilt of this phase
in two perpendicular directions. These measurements are then used as the wavefront
gradient estimates. Gaussian noise can be added to simulate photon shot noise and
detector readout noise.

These models are computationally efficient but do not consider any second-order
effects, including detector pixelization, the impact of extended sources and
changes in signal-to-noise ratio due to seeing variations.
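
A minimal Python sketch of such a simple model is given below. It assumes that the mean tilt is estimated as the average finite difference of the phase across each sub-aperture (a least-squares plane fit would be an equally valid choice), and it lumps photon and readout noise into a single Gaussian term; the function name and the `noise_rms` parameter are illustrative.

```python
import numpy as np


def mean_tilt_slopes(phase, n_subaps, noise_rms=0.0, rng=None):
    """Mean x and y phase gradients (radians per sample) in each sub-aperture.

    Returns an array of shape (2, n_subaps, n_subaps): index 0 holds the
    x gradients and index 1 the y gradients.
    """
    rng = np.random.default_rng() if rng is None else rng
    pitch = phase.shape[0] // n_subaps  # phase samples per sub-aperture
    slopes = np.zeros((2, n_subaps, n_subaps))
    for y in range(n_subaps):
        for x in range(n_subaps):
            sub = phase[y * pitch:(y + 1) * pitch, x * pitch:(x + 1) * pitch]
            slopes[0, y, x] = np.diff(sub, axis=1).mean()  # mean tilt along x
            slopes[1, y, x] = np.diff(sub, axis=0).mean()  # mean tilt along y
    # Single Gaussian term standing in for shot noise and readout noise.
    slopes += rng.normal(0.0, noise_rms, slopes.shape)
    return slopes


# Usage: a pure tilt should be recovered exactly when no noise is added.
yy, xx = np.mgrid[:16, :16]
ramp = 0.1 * xx + 0.2 * yy
noiseless = mean_tilt_slopes(ramp, n_subaps=4)
```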


2.2. Physical optics models 


Physical optics models are most commonly used when modeling Shack–Hartmann
wavefront sensors in time-series simulations. A Monte Carlo approach is used, with
different random inputs at each time step (e.g., perturbed wavefront phase, detector
readout noise). These models include effects such as diffraction, detector pixelization,
noise and image plate scales, and usually assume a far-field approximation so
that Fourier propagation can be used (i.e., the pupil plane is transported directly to
the focal plane). Such models can be mono- or poly-chromatic, and can be narrow
or wide field of view (e.g., laser guide stars or the solar surface). Physical optics
models delivering high fidelity images are computationally demanding. Functional
pseudo-Python code for the generation of these images is given below, requiring an
input turbulent wavefront phase map (in radians), a pupil function, flux (photons
per sub-aperture per frame), readout noise and the number of sub-apertures in one
dimension.


import numpy

def MakeShackHartmannImage(phase, pupilFunction, flux, readnoise, n_subaps):
    npup = phase.shape[0]  # Assume phase is a square 2D array.
    pitch = npup // n_subaps  # Phase samples (and detector pixels) per sub-aperture
    shs = numpy.zeros(phase.shape, "f")  # Create the output array
    # Phase ramp that moves the spot from the FFT origin to the array center
    tilt = -(numpy.mgrid[:pitch, :pitch].sum(0) + 1 - pitch) * numpy.pi * (2 * pitch + 1) / (2 * pitch)
    for y in range(n_subaps):  # Iterate over the sub-apertures
        for x in range(n_subaps):
            rows = slice(y * pitch, (y + 1) * pitch)
            cols = slice(x * pitch, (x + 1) * pitch)
            # Select phase for this sub-aperture and add a tilt to
            # center the spot on the 4 central pixels
            subapPhase = phase[rows, cols] + tilt
            # Select the pupil for this sub-aperture (to allow for partial vignetting)
            subapPupil = pupilFunction[rows, cols]
            # Create the complex amplitude
            complexAmp = subapPupil * numpy.exp(1j * subapPhase)
            # Perform a zero-padded 2D FFT
            focus = numpy.fft.fft2(complexAmp, (pitch * 2, pitch * 2))
            # Create the spot image
            image = (focus * focus.conjugate()).real
            # Bin 2x2 to create a high light level (noiseless) image
            image = numpy.reshape(image, (pitch, 2, pitch, 2)).sum(3).sum(1)
            # Scale for flux (skipped for fully vignetted sub-apertures,
            # which would otherwise divide by zero)
            total = image.sum()
            if total > 0:
                image = image * flux * subapPupil.sum() / (total * pitch * pitch)
            # Add shot noise and readout noise
            image = numpy.random.poisson(image) + numpy.random.normal(0, readnoise, image.shape)
            shs[rows, cols] = image
    return shs


When using this model, the size of the FFT should be at least twice as large 
as the size of the sub-aperture input phase array to avoid aliasing. Binning of the 
FFT output is necessary to avoid unphysical spatial resolution. 

A drawback of this model is that if the wavefront tilts are large, spots will wrap 
around back into their own sub-apertures. Provided that the sub-aperture field of 
view is large enough to cope with expected wavefront phase, this will have little 
effect on modeling results. However, two methods can be used to avoid this effect: 
the FFT can be oversampled and then the individual sub-aperture images allowed 
to overlap; alternatively, the bulk of wavefront tilt can first be removed (equal to a 
known shift of a whole number of detector pixels), the noiseless image computed, 
and then replaced within the detector array at an offset equal to the tilt removed. 
The former method can require significant extra computation if input wavefronts 
are highly aberrated (and image wrapping can still occur), while the latter method 
requires additional calibration for partially vignetted sub-apertures. 

The pixel scale (plate scale) of a Shack–Hartmann wavefront sensor can have
a significant effect on performance. Therefore, during modeling it is necessary to
know and adjust the pixel scale to match input requirements. In the case of a
Fourier propagation model, the pixel scale is dependent on wavelength $\lambda$ (in m),
sub-aperture diameter $D$ (in m, as projected onto the system input aperture), the
number of phase evaluation points across the sub-aperture, $n$, the size of the FFT
used, $F$, and the degree of pixel binning into the final image, $b$. The modeled pixel
scale, $P$, in arcseconds per pixel is given by

$$P = \frac{\lambda n b}{D F} \times 3600 \times \frac{360}{2\pi}.$$
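
The pixel scale relation can be wrapped in a small helper, sketched below in Python. The function and argument names are illustrative, and the numerical inputs (500 nm wavelength, 0.5 m sub-apertures, 8 phase points, FFT size 16, 2 × 2 binning) are assumed values chosen so that the FFT is twice the sub-aperture size and the binning factor is 2, as in the image generation code of this section.

```python
import math


def pixel_scale_arcsec(wavelength_m, subap_diameter_m, n_phase, fft_size, binning):
    """Modeled pixel scale in arcseconds per pixel: (lambda n b) / (D F) in radians."""
    radians_per_pixel = wavelength_m * n_phase * binning / (subap_diameter_m * fft_size)
    return math.degrees(radians_per_pixel) * 3600.0


# Assumed example: with F = 2n and b = 2 the scale reduces to lambda / D.
scale = pixel_scale_arcsec(500e-9, 0.5, n_phase=8, fft_size=16, binning=2)
print(f"{scale:.4f} arcsec/pixel")  # 0.2063 arcsec/pixel
```

To obtain a finer pixel scale for the same sub-aperture, the FFT size can be increased or the binning factor reduced.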


2.2.1. Temporal and chromatic modeling 


Smearing of Shack–Hartmann wavefront sensor images can occur if the detector
integration time is longer than the relevant characteristic phase perturbation timescales.
This effect can be modeled if necessary by averaging multiple noiseless images over
several time-steps. Noise sources (photon shot and readout noise) can then be added
at the end.

In a similar way, chromatic effects can be modeled by weighted averaging of
noiseless Shack–Hartmann images created at a range of wavelengths, to simulate an
effective bandpass.
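
Both cases reduce to the same operation: average a stack of noiseless images, optionally with weights, and add the noise once at the end. A minimal Python sketch follows; the function name, the argument names and the use of `numpy.average` are choices made here for illustration, and the noiseless frames are assumed to have been generated beforehand (one per time-step or per wavelength).

```python
import numpy as np


def integrate_frames(noiseless_frames, weights=None, readnoise=0.0, rng=None):
    """Average noiseless frames (optionally weighted), then add noise once."""
    rng = np.random.default_rng() if rng is None else rng
    frames = np.asarray(noiseless_frames, dtype=float)
    # Temporal smearing: equal weights. Chromatic bandpass: spectral weights.
    mean_image = np.average(frames, axis=0, weights=weights)
    # Photon shot noise and detector readout noise are added at the end.
    return rng.poisson(mean_image) + rng.normal(0.0, readnoise, mean_image.shape)


# Usage with two placeholder frames of 10 and 30 photons per pixel.
frames = [np.full((64, 64), 10.0), np.full((64, 64), 30.0)]
smeared = integrate_frames(frames, readnoise=0.0, rng=np.random.default_rng(2))
banded = integrate_frames(frames, weights=[3, 1], rng=np.random.default_rng(3))
```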


2.2.2. Fresnel propagation 


Fresnel propagation models are appropriate when highest image fidelity is necessary,
or when modeling of optical components between pupil and focal planes is required.
Shack–Hartmann modeling using Fresnel propagation is computationally expensive,
and is most common in optical design packages (e.g., Zemax) and software libraries.


2.3. Ray tracing and geometrical optics 


Geometrical optics approximations with ray tracing techniques are commonly used 
by optical design software. Here, many rays are traced through the optical surfaces, 
including the Shack—Hartmann lenslet array, to a defined plane (typically the detec- 
tor plane) where the sampled density of rays is used to infer the incident intensity 
pattern. This approach is typically used for the optical design of an AO system, but 
not for modeling of AO performance. 


3. Processing of Shack—Hartmann Images 


A Shack—Hartmann system requires a two-fold calibration procedure. First, image 
calibration is required. The calibrated image is then used to compute the wave- 
front gradient measurements, which must themselves be calibrated relative to a pre- 
defined reference position. This reference position depends on the optical alignment 
of the system and non-common-path errors. The effective gain of the system also 
requires calibration (mapping a spot motion measured in pixels to a wavefront tilt 
measured in radians), since this can depend on atmospheric turbulence conditions, 
particularly for quad-cell systems. 

A Shack—Hartmann sensor measurement will be affected by random errors (for 
example, from photon shot noise and detector readout noise) and by bias errors (for 
example, from optical misalignment). 

When a point source is imaged by the wavefront sensor, the Shack-Hartmann
images will consist of many individual point spread functions, as shown in Fig. 3(a),
commonly called Shack-Hartmann spots, or a spot pattern. If extended sources
are imaged (for example, the solar surface or a laser guide star), then the
Shack-Hartmann images will also be extended, as shown in Fig. 3(b).

Fig. 3. (a) A typical Shack-Hartmann pattern for an unresolved point source. (b) A
Shack-Hartmann pattern resulting from an extended laser guide star launched off-axis.


3.1. Computational hardware 


Processing of Shack-Hartmann images requires computational hardware. Histori- 
cally, digital signal processors (DSPs) and field-programmable gate arrays (FPGAs) 
have been used” ® within an AO system. However, within the last decade, the 
computational power within conventional CPUs has become sufficient to perform 
real-time processing of Shack-Hartmann images, as demonstrated, for example, by
CANARY⁹ using the Durham AO Real-time Controller (DARC) system.¹⁰,¹¹ As
telescope sizes increase, computational demands rise rapidly, and so future ELT 
systems will require many-core processors. 


3.2. Center of gravity estimation 


A center-of-gravity algorithm is the most commonly used algorithm for astronomical 
AO systems, obtaining the center of gravity of each Shack—Hartmann spot image, 
as demonstrated by the pseudo-Python code below, where the input image is a 2D 
array. 


    import numpy

    def CoG(subapImage):
        """Return the (x, y) center of gravity, in pixels, of a 2D image."""
        s = subapImage.sum()
        if s == 0:
            # No flux: the centroid is undefined, so return the origin.
            cx = cy = 0.0
        else:
            # Collapse along each axis, then take the flux-weighted mean index.
            cx = (subapImage.sum(0) * numpy.arange(subapImage.shape[1])).sum() / s
            cy = (subapImage.sum(1) * numpy.arange(subapImage.shape[0])).sum() / s
        return cx, cy

A center-of-gravity algorithm can deliver poor performance when the signal-to-
noise ratio is low, and is not ideal for extended sources; it is usable for laser guide
star images, while for solar surface images the center of gravity is meaningless.


3.3. Weighted center-of-gravity 


The weighted center-of-gravity algorithm applies a per-pixel weighting to the sub- 
aperture images. This allows more importance to be placed on pixels at the center 
of the sub-aperture, where the spot is likely to be, and reduces the influence of 
pixels containing only noise. The weighting function can either be data-generated 
(e.g., a long exposure image), or model-based (for example a 2D Gaussian pattern). 

This nonlinear algorithm is only appropriate if the Shack—Hartmann images do 
not move far from their central position, i.e., for closed-loop AO systems and rea- 
sonable seeing conditions. An iterative technique can also be used, first determining 
the approximate spot location, and then applying the weighting function around 
this. 
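A model-based weighting of this kind can be sketched as below. This is only an illustration, assuming a 2D Gaussian weighting function centered on the sub-aperture by default; the function name, the `sigma` value and the `centre` argument are hypothetical choices.

```python
import numpy as np


def weighted_cog(subap_image, sigma=1.5, centre=None):
    """Center of gravity of a sub-aperture image after Gaussian weighting.

    centre is the (x, y) position of the weighting function in pixels; it
    defaults to the middle of the sub-aperture.
    """
    ny, nx = subap_image.shape
    if centre is None:
        centre = ((nx - 1) / 2.0, (ny - 1) / 2.0)
    y, x = np.mgrid[0:ny, 0:nx]
    weights = np.exp(-((x - centre[0]) ** 2 + (y - centre[1]) ** 2)
                     / (2.0 * sigma ** 2))
    weighted = subap_image * weights
    total = weighted.sum()
    if total == 0:
        return 0.0, 0.0
    return (weighted * x).sum() / total, (weighted * y).sum() / total


# A single bright pixel at x = 4, y = 2 in an 8 x 8 sub-aperture.
image = np.zeros((8, 8))
image[2, 4] = 5.0
cx, cy = weighted_cog(image)
```

For a spot of finite width the estimate is pulled toward the center of the weighting function, which is one reason the response needs calibrating. The iterative variant mentioned above would call the function again with `centre` set to the previous estimate.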


3.4. Correlation-based wavefront sensing 


When the Shack—Hartmann images have an extended structure, it can be appropri- 
ate to use a correlation-based algorithm. Here, the Shack—Hartmann images are con- 
volved with a reference image that has a similar shape and structure. The correlation 
peak is then used to estimate the wavefront gradient. Figure 4 shows a calibrated 
sub-aperture image, a reference image, and the correlation of these images. In this 
case, the signal within the correlation is clear above the noise and well defined, 
yielding a more accurate gradient measurement. 
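A minimal sketch of the correlation step is given below, assuming an FFT-based circular cross-correlation and an integer-pixel peak search. The function name is hypothetical. A real system would typically follow this with a sub-pixel estimate, for instance a thresholded center of gravity of the correlation map around the peak.

```python
import numpy as np


def correlation_shift(subap_image, reference):
    """Integer-pixel (x, y) shift of subap_image relative to reference."""
    f_img = np.fft.fft2(subap_image)
    f_ref = np.fft.fft2(reference)
    # Circular cross-correlation computed in the Fourier domain.
    corr = np.fft.fftshift(np.fft.ifft2(f_img * np.conj(f_ref)).real)
    # After fftshift, zero shift corresponds to index n // 2 on each axis.
    py, px = np.unravel_index(np.argmax(corr), corr.shape)
    ny, nx = corr.shape
    return int(px) - nx // 2, int(py) - ny // 2, corr


# Reference spot and a copy shifted by +2 pixels in y and -3 pixels in x.
y, x = np.mgrid[0:16, 0:16]
reference = np.exp(-((x - 8) ** 2 + (y - 8) ** 2) / (2.0 * 1.5 ** 2))
shifted = np.roll(reference, shift=(2, -3), axis=(0, 1))
dx, dy, corr_map = correlation_shift(shifted, reference)
```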

The choice of the correlation reference images depends on the application in
question. For solar AO systems, a single sub-aperture image (from a single image
frame) is typically used for all sub-apertures, and is updated frequently, e.g., every
5-10 s. When this update is made, a global wavefront tilt error will be introduced.
However, this can be mitigated by either simultaneously modifying the reference
slope measurements or by updating the reference images in-between science image
exposures (so that the shift in location on the science imager occurs in-between
consecutive image frames).*

Fig. 4. (a) A calibrated laser guide star sub-aperture image. (b) A reference image obtained by
shifting and co-adding the sub-apertures from 100 frames of data. (c) The correlation map between
the reference image and the sub-aperture image after thresholding to remove noise.

For astronomical AO systems, the most common extended structure source is 
a laser guide star, as viewed by sub-apertures away from the laser launch aperture 
(Fig. 4(a)). The structure within the Shack—Hartmann images is therefore different 
for each sub-aperture. The correlation reference images are therefore independent 
for each sub-aperture, and will typically be generated by averaging many consecutive 
frames of wavefront sensor images (about one second of data seems to be a good 
compromise!”). Alternatively, theoretically generated reference images (for example, 
Gaussian shapes) can also be used. 


3.4.1. Correlation reference update 


For astronomical systems, when the reference images are updated, a modification to 
reference slopes is essential, since each sub-aperture reference image will introduce a 
different wavefront gradient bias (unless centered and symmetric). A simultaneous 
reference slope modification must therefore be computed.!? 

Modification of reference images (and hence reference slopes) is advisable at a 
rate greater than the expected rate of change of the structure within the Shack— 
Hartmann images. For Rayleigh laser guide stars, this can be infrequent and is 
typically only necessary when guide star altitude or depth is altered (and this is 
under operator control). For Sodium laser guide stars, reference updates are required 
on time-scales similar to those of the evolution of the sodium layer structure, which 
can be as frequent as every few seconds.!® The real-time control system used by the 
CANARY instrument, DARC,¹⁰,¹¹ has demonstrated the ability to update reference
images and corresponding reference slopes every frame using a rolling average of 
image frames, thereby maintaining an optimum reference image set. Additionally, 
this system can also apply a partial update to reduce computational demands, 
updating only a subset of sub-apertures every frame. 


3.5. Matched filtering of Shack—Hartmann images 


A matched filter algorithm can also be used when Shack—Hartmann images con- 
tain extended structure, and has been proposed for use with the Thirty Meter 
Telescope (TMT). The first on-sky demonstration with the CANARY AO system 
in 2016'° showed improved closed-loop AO performance compared with center-of- 
gravity estimation. A matched filter represents an optimal way of minimizing the 
impact of noise within a signal, providing a noise-weighted least squares estimate 
of the Shack—Hartmann spot locations. 


*See Chapter 19 for further details.

The matched filter algorithm has the ability to take into account the readout
noise of each detector pixel, meaning that it is well suited for use with CMOS detec- 
tors (including sCMOS), where rms readout noise is different for each pixel, which 
can otherwise have an adverse effect on spot position estimation.!® Full derivation 
of the matched filter matrix is given by Ref. 14, with a technique to extend the 
linearity of this method developed by Ref. 17. 
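To illustrate the idea of a noise-weighted least squares estimate, the sketch below linearizes the spot image about a reference image and solves for the shift using per-pixel noise variances. It is a simplified stand-in, not the derivation of Ref. 14: the function names are hypothetical, the noise is assumed uncorrelated between pixels, and no flux or linearity constraints are included.

```python
import numpy as np


def build_matched_filter(ref_image, pixel_variance):
    """Return a (2, n_pixels) matrix mapping image differences to (x, y) shifts."""
    gy, gx = np.gradient(ref_image)            # derivatives along y (axis 0) and x (axis 1)
    # To first order, moving the spot by +s changes the image by -s * dI/dx.
    G = -np.stack([gx.ravel(), gy.ravel()], axis=1)      # (n_pixels, 2)
    inv_var = 1.0 / pixel_variance.ravel()
    Gt_w = (G * inv_var[:, None]).T                       # G^T C^-1, (2, n_pixels)
    return np.linalg.solve(Gt_w @ G, Gt_w)


def matched_filter_shift(image, ref_image, R):
    """Estimate the (x, y) spot shift from one sub-aperture image."""
    return R @ (image - ref_image).ravel()


def gaussian_spot(dx, dy, size=16, sigma=2.0, flux=1000.0):
    y, x = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    return flux * np.exp(-((x - c - dx) ** 2 + (y - c - dy) ** 2) / (2.0 * sigma ** 2))


ref = gaussian_spot(0.0, 0.0)
variance = np.full(ref.shape, 4.0)             # e.g. 2 e- rms readout noise per pixel
R = build_matched_filter(ref, variance)
sx, sy = matched_filter_shift(gaussian_spot(0.1, 0.0), ref, R)
```

Once `R` has been built, each measurement is a single matrix-vector product per sub-aperture, consistent with a real-time cost comparable to a center-of-gravity estimate. Pixels with high readout noise are down-weighted through `pixel_variance`.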

Computationally, within a real-time control system, the matched filter algo- 
rithm is on par with a center-of-gravity estimate and is therefore highly attractive 
when computation resources are limited. However, online generation of the matched 
filter requires additional resources (depending on whether continuous or batch gener- 
ation is used). It should be noted that a correlation algorithm demands significantly 
more computational resources than the matched filter technique. 

A matched filter algorithm is not restricted to application on sub-aperture 
images, and can also be applied to correlation images (i.e., to the result of convolving 
a sub-aperture image with a correlation reference). 


3.6. Image calibration algorithms 


In addition to standard image calibration algorithms, such as bias removal, applica- 
tion of flat field images, background subtraction and thresholding, Shack—Hartmann 
wavefront sensor images can also be calibrated on a per-sub-aperture basis to 
improve wavefront gradient estimation accuracy. 

Since the expected sizes of Shack—Hartmann spots are usually known (at least 
when using natural and laser guide stars), it can be advantageous to calibrate the 
spot image by selecting only a number of pixels equal to the expected spot size, i.e., 
the n pixels which contain the most flux. This can be achieved by first sorting the 
pixels within a sub-aperture by signal, and then selecting the (n + 1)th brightest pixel
to use as the threshold level. This value is subtracted from all pixels within the sub- 
aperture, and those with a negative value are then set to zero. This “brightest pixel 
selection algorithm” !® has been demonstrated to improve on-sky AO performance 
by the CANARY instrument. However, performance is somewhat sensitive to the 
number of brightest pixels chosen, so care must be taken. Further improvement can 
be obtained by using the mean of the signal in the n + 1 to n +m brightest pixels 
as the threshold level (m being an integer) to better average random readout noise. 
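The brightest pixel selection described above can be sketched in a few lines. The function name is hypothetical; `n` and `m` follow the notation in the text, with `m = 1` giving the plain (n + 1)th-brightest threshold.

```python
import numpy as np


def brightest_pixel_threshold(subap_image, n, m=1):
    """Keep roughly the n brightest pixels of a sub-aperture image.

    The threshold is the mean of the (n+1)th to (n+m)th brightest pixel
    values. It is subtracted from every pixel, and negative results are
    set to zero.
    """
    flat = np.sort(subap_image, axis=None)[::-1]     # pixel values, brightest first
    threshold = flat[n:n + m].mean()
    return np.clip(subap_image - threshold, 0.0, None)


# Hypothetical 3 x 3 sub-aperture with pixel values 1 to 9.
image = np.arange(1.0, 10.0).reshape(3, 3)
kept = brightest_pixel_threshold(image, n=3)
kept_avg = brightest_pixel_threshold(image, n=3, m=2)
```

With `n = 3` the threshold is the fourth-brightest value (6), so only the pixels with values 9, 8 and 7 survive. With `m = 2` the threshold becomes the mean of 6 and 5.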

An alternative approach for laser guide stars is to use “radial thresholding,”
where the threshold is lowered as elongation increases.!° 

When using faint guide stars, the signal-to-noise ratio within a sub-aperture is
often low due to the presence of photon shot noise and detector readout noise.
Processing sub-aperture images to reduce the total variance within them has been shown
to lead to an improvement in Shack-Hartmann spot position estimation at low light
levels,²⁰ using a technique called “total variation minimization.” This technique has
the potential to enable a gain of up to one magnitude in guide star brightness and has
been demonstrated on-sky with the CANARY AO system. This image processing
technique is most appropriate when sub-aperture sizes are large, for example for
open-loop AO systems.


3.7. Adaptive windowing and spot tracking 


Adaptive sub-aperture windowing, or spot tracking, is another processing technique 
that can improve Shack—Hartmann spot position estimation. By measuring the posi- 
tion of a spot within its sub-aperture in the previous image frame, the location of the 
sub-aperture can be adjusted (typically with a gain factor of 0.5 to avoid overshoot) 
so that the spot is centrally located. In situations where spot motion can be large 
(i.e., can move by more than one pixel), the sub-aperture size can be reduced to 
slightly larger than the size of the spot itself (once the AO loop is engaged), rather 
than having to be at least as large as the range of expected motion. This technique is 
used with the CANARY instrument. However, there are a number of considerations
that should be taken into account to ensure robust operation.

The distance over which a sub-aperture is allowed to move should always be 
restricted to prevent neighboring sub-apertures from latching onto the same Shack— 
Hartmann spot (due to the presence of noise, or temporary loss of signal). Sub- 
aperture positions should also not be allowed to move off the detector sensitive 
area. 

When the sky background is changing, e.g., close to dawn or dusk, an uncal- 
ibrated flux gradient will be present across each sub-aperture, which will bias the 
spot position. If the adaptive window algorithm loses the spot position (for exam- 
ple due to an instantaneous high wavefront curvature, or temporary reduction of 
incident flux), then the adaptive window will follow the background gradient until 
it reaches the limit of its allowed range, where it will then remain. Therefore, the 
AO system must be capable of automatically resetting stuck sub-apertures if they 
have not moved from their extremity for a set number of image frames. 

If sub-aperture images are weighted away from their reference positions (e.g., 
laser guide star spots with a bottom-heavy sodium layer profile), then a center- 
of-gravity estimate used for adaptive window location updates (which should be 
applied before reference slope subtraction) will result in the spot not being located 
centrally within the sub-aperture. Therefore, it must be possible in these situations 
to specify an offset with which the adaptive window position is modified. 
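The considerations above can be combined into a single per-frame update, sketched below for one axis of one sub-aperture. The function name, the limits and the frame count are hypothetical values chosen for illustration. Real systems usually move windows by whole pixels, which this sketch ignores.

```python
def track_window(offset, stuck_frames, spot_error, gain=0.5,
                 max_offset=3.0, max_stuck=50):
    """Return the updated (offset, stuck_frames) of an adaptive window.

    offset: current window shift from its nominal position (pixels).
    stuck_frames: number of consecutive frames spent at the range limit.
    spot_error: measured spot position minus the desired position within
        the window (pixels); the desired position can include a
        user-specified offset for asymmetric spots.
    """
    offset = offset + gain * spot_error
    # Limit the excursion so neighbouring windows cannot share a spot.
    offset = max(-max_offset, min(max_offset, offset))
    if abs(offset) >= max_offset:
        stuck_frames += 1
        if stuck_frames >= max_stuck:
            return 0.0, 0          # reset a window stuck at its limit
    else:
        stuck_frames = 0
    return offset, stuck_frames


step = track_window(0.0, 0, 2.0)         # normal tracking step
clamped = track_window(2.5, 0, 4.0)      # request beyond the allowed range
reset = track_window(3.0, 49, 1.0)       # window stuck for too long
```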


3.8. Arbitrary sub-aperture shape 


In astronomical AO systems Shack—Hartmann geometries are most commonly 
square or hexagonal, and therefore the Shack—Hartmann images follow this geom- 
etry. However, the Shack—Hartmann spots themselves are typically not this shape, 
being either circular (for natural guide stars) or elongated in one direction (for laser 
guide stars). When a spot is centered within a sub-aperture (for example if adap- 
tive windowing is used), it can therefore be advantageous to define a sub-aperture 
mask that has a shape equal to that of the spot extended equally in all directions.
This allows pixels containing only noise to be masked out during processing,
thus improving spot position estimation and reducing the impact of overlapping
sub-apertures.


3.9. Treatment of telescope support structures 


Partial vignetting or full obscuration of sub-apertures due to telescope support 
structures (spiders) can lead to an uncertainty in wavefront reconstruction: a reduc- 
tion in incident flux will lead to a reduction in signal-to-noise ratio. 

If the telescope support structure leads to a line of fully obscured sub-apertures, 
then islands of reconstructed wavefront phase will result, unconnected to other areas 
of the telescope pupil, since there is no knowledge of the incident wavefront phase 
behind these support structures.²¹ A statistical analysis of the resulting low-order
mode uncertainties can largely mitigate this effect. 

If the telescope pupil as seen by the wavefront sensors is rotating with respect 
to support structures, then a time-dependent vignetting of sub-apertures will occur. 
This will result in the requirement for an ongoing calibration of wavefront sensor 
reference slope measurements to ensure that good AO performance is maintained. 


4. Modified Shack—Hartmann Sensor Concepts 


Several modifications to the conventional Shack—Hartmann designs have been pro- 
posed to overcome different shortcomings of these wavefront sensors, and some of 
these concepts are scheduled for inclusion on forthcoming AO systems. 


The LIFTed Shack—Hartmann sensor: Introducing an astigmatism in each 
lenslet of the microlens array enables estimation of local low-order modes in each 
sub-aperture, increasing effective wavefront sensor order.²²


The optically binned Shack-Hartmann sensor: This method²³ involves splitting
the incoming light with a beam splitter, rotating one arm, and focusing in
one dimension using a lenticular array, and in the other using a cylindrical lens. 
The signal for each sub-aperture is then represented by a single row of pixels for 
each Cartesian direction, rather than a two-dimensional image, greatly reducing 
the number of pixels required (and hence readout noise). For example, a Shack— 
Hartmann spot requiring 8 x 8 pixels is reduced to a requirement of 2 x 8 pixels. 
This technique is more difficult to align optically, and is also best applied to natural 
guide star (rather than laser guide star) images. 


Polar coordinate CCD Shack-Hartmann sensor: The Thirty Meter Telescope
project²⁴ is proposing to use a custom, novel CCD detector for laser guide star
wavefront sensing (Fig. 5(a)). The laser guide stars will be launched from an on-axis
location, and conventional Shack-Hartmann optical components will be used. The
laser guide star elongation pattern will be radial, and therefore can be matched with
a unique CCD pixel geometry with pixels for each sub-aperture aligned parallel and
perpendicular to the elongation direction, with sub-aperture size increasing with
distance from the launch aperture, reducing total readout noise.


Fig. 5. Some Shack-Hartmann concepts to help overcome laser guide star elongation effects.
(a) A radial CCD concept, as proposed for use with the TMT NFIRAOS AO system. Here, 8 x 8
sub-apertures are shown with a centrally launched LGS. Rather than using the same number of
pixels for each sub-aperture on a Cartesian grid, detector pixels are aligned to match the
elongation direction, with sub-apertures increasing in length with distance from the launch aperture.
(b) A variable pupil sampling lenslet array with sub-apertures growing away from the laser
launch location (the circle in bottom left corner). Within the sub-apertures, ellipses represent
the elongated LGSs.


Variable pupil sampling Shack—Hartmann sensor: To overcome some issues 
arising from laser guide star spot elongation on ELTs, E. Gendron of Paris Obser- 
vatory has proposed a variable pupil sampling Shack—Hartmann sensor (Fig. 5(b)) 
for off-axis laser launch. Telescope pupil sampling decreases with distance from the 
laser launch axis, while per-sub-aperture flux increases, thus countering the effect 
of elongation (which spreads light over more pixels). This technique is particularly 
well suited to multiple laser guide star AO systems, where poor pupil sampling in 
one wavefront sensor can be compensated by good pupil sampling in another. 


5. Conclusions 


I have discussed Shack—Hartmann wavefront sensors in some detail, including dis- 
cussions of calibration issues, analysis algorithms, and novel lenslet and detector 
designs optimized for this specific application. For now, Shack—Hartmann wavefront 
sensors remain the mainstay for astronomical adaptive optics, though the popularity 
of Pyramid sensors has increased in recent years. Detailed discussions of the AO 
systems that utilize these sensors can be found in Volume 3 of this Handbook.


Curvature sensing is a very simple and elegant method to measure the Laplacian
of the wavefront, based on the conservation of flux irradiance. Its implementation 
does not require any special optical components and as it is a direct measure of 
intensity in defocused images, photodiodes or other read-noise-less detectors can 
be used. Because it is a differential measurement, it is insensitive to biases and 
internal errors. Coupled with bimorph mirrors, it provides the most efficient type 
of adaptive optics and is perfectly suited to the spatial spectrum of atmospheric 
turbulence. Nonetheless there are some drawbacks to curvature wavefront sensing, 
most notably due to spatial aliasing and dynamic range. 


1. Out-of-Focus Images 


Amateur astronomers defocus their telescopes to collimate them,¹ and traditionally,
physicists and optical scientists have used out-of-focus images to assess the
aberrations of an optical system. As shown in Fig. 1, the out-of-focus images for a circular
pupil should be round and evenly illuminated. Deviations in either the shape or
the illumination are due to phase aberrations. In fact, it is possible to identify and
estimate aberrations using defocused images because each Zernike aberration
produces a well-identified pattern when viewed out of focus. For example, astigmatism
elongates the defocused image symmetrically,² while coma produces an asymmetric
elongation,³ as shown in Fig. 2. François Roddier developed and formalized this
primarily qualitative idea using the irradiance transport equation in 1988⁴ and laid
the foundations for curvature wavefront sensing.

Fig. 1. A geometrical optics diagram showing the principle of curvature adaptive optics. A region
of the wavefront with some curvature will focus ahead of the nominal focus. As it propagates, the
phase aberration transforms into an intensity variation. In the out-of-focus planes, the intensity is
proportional to the local curvature of the wavefront. The aberrations will focus closer to the pupil
for a larger blur angle θ₀ (see Sections 3.1 and 4.1 for details).


Fig. 2. The intensity distribution for various simple aberrations from pupil plane to focus.

Because it is neither feasible nor practical to record the phase of electromagnetic
waves at optical wavelengths, all the wavefront sensing methods need to transform 
a phase object into an intensity object, which can then be measured. This is either 
done by interferometric means or by using an optical element (such as a lenslet array 
or a pyramid) that transforms a function of the phase (usually some derivative) into 
an intensity function. Curvature sensing is based on the fact that where phase is 
introduced it has no effect on the intensity, but as the beam propagates, the phase 
deforms the beam and naturally reveals itself as intensity fluctuations: where the 
beam converges, the intensity locally increases and conversely, beam divergence 
introduces a decrease in the local illumination. This behavior is described by the
irradiance transport equation.³,⁴


2. Conservation of Flux 


Curvature wavefront sensing is based on the law of conservation of flux, which
can be described in most general terms by the time-invariant irradiance transport
equation. It simply states that the light energy is conserved as the beam propagates
along its optical axis. Consider the wave w(x, y, z) = √I(x, y, z) exp(−iφ(x, y, z)),
where I(x, y, z) is the intensity distribution along the beam, and φ(x, y, z) is the
phase. We can write the wavefront W(x, y, z) in terms of the phase φ(x, y, z) =
(2π/λ) W(x, y, z) = k W(x, y, z), with k the wavenumber. The irradiance transport
equation can be written as

    \frac{\partial I}{\partial z} = -\nabla \cdot (I \nabla W)
                                  = -(\nabla I \cdot \nabla W + I \nabla^2 W),    (1)

where ∇ is the 3D derivative operator î ∂/∂x + ĵ ∂/∂y + k̂ ∂/∂z, but let us restrict
the beam propagation to the z-axis such that k_x, k_y ≪ k_z, so that we can simplify
∇ as î ∂/∂x + ĵ ∂/∂y. Thus, ∇² is equal to ∂²/∂x² + ∂²/∂y² and is the Laplacian
operator, tracing the curvature of the wavefront. Let us define the transmission
function P(x, y) of the pupil of the optical system; P(x, y) equals one inside the
pupil and zero outside. The illumination is uniform, equal to I₀, inside the pupil.
Therefore the gradient of the intensity ∇I is equal to zero everywhere except at the
pupil’s edge, where it is given by

    \nabla I = -I_0 \hat{n} \delta_c,    (2)

where δ_c is a linear Dirac function on the edge of the pupil, and n̂ is a unit vector
perpendicular to the edge of the pupil (and pointing outwards). We can then rewrite
Eq. (1) as


    \frac{\partial I}{\partial z} = I_0 \left( \frac{\partial W}{\partial n} \delta_c - P \nabla^2 W \right),    (3)

where ∂W/∂n = n̂ · ∇W is the derivative of the wavefront along n̂; only this term
remains because the dot product of n̂ and the azimuthal derivative of the wavefront is null.
Finally, we can make the following simplifying assumption:

    \frac{1}{I} \frac{\partial I}{\partial z} \approx \frac{1}{I_0} \frac{\delta I}{\delta z}
        = \frac{2}{z_1 - z_2} \, \frac{I_{z_1} - I_{z_2}}{I_{z_1} + I_{z_2}},    (4)

where I₀ = (I_{z₁} + I_{z₂})/2. To further simplify the notation, let us assume that the two
out-of-focus analysis planes are symmetrical around the pupil plane (z₁ = −z₂); we
call the distance from focus to these planes the defocus length l, such that z = f − l.
Lastly, we define a position vector ρ on the out-of-focus image, similar to r but scaled
to the out-of-focus image, ρ = r f/l. Everything is in place to finally write down
the curvature equation.


3. Curvature Equation 


The curvature equation is

    \frac{I_1(\mathbf{r}) - I_2(-\mathbf{r})}{I_1(\mathbf{r}) + I_2(-\mathbf{r})}
        = \frac{\lambda}{2\pi} \frac{f(f - l)}{l}
          \left[ \frac{\partial \phi(\boldsymbol{\rho})}{\partial n} \delta_c
                 - P(\boldsymbol{\rho}) \nabla^2 \phi(\boldsymbol{\rho}) \right].    (5)

Let us interpret this equation. The left-hand side is the out-of-focus contrast,
a normalized difference of intensity distributions in planes symmetrical around the
focus; this is our measurement. On the right-hand side, it is equal to the Laplacian
of the phase, i.e., its curvature (this is sometimes called the lens term). A
differential equation of the form U = ∇²φ is called a Poisson equation. On the
right-hand side, there is also the radial derivative at the edge of the pupil
(sometimes called the prism term), which provides the boundary conditions necessary to
solve this differential equation. This is called a Neumann boundary condition,
in that the constants of integration are specified as a derivative around the
integration domain, as opposed to Dirichlet boundary conditions (which specify
the value of the function at the boundary of the integration domain). Numerical
techniques exist to solve Poisson equations with Neumann boundary conditions,
but in the case of curvature adaptive optics, this is not necessary as we will see in
Section 5. The prism term allows the curvature sensor to be sensitive to Zernike
polynomials that may have a zero curvature (such as tip, tilt or even astigmatism,
where ∂²φ/∂x² = −∂²φ/∂y² everywhere on the pupil), but that displace or distort
the out-of-focus beam, which is a consequence of the non-zero gradient of the
intensity, ∇I, at the pupil edge, as shown on the third (and fifth) row of
Fig. 2.
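The measurement side of the curvature equation, the normalized out-of-focus contrast, can be sketched as follows. This is a minimal illustration, assuming the intra- and extra-focal images have been recorded as two arrays of identical shape centered on the optical axis; the function name and the `eps` guard are hypothetical.

```python
import numpy as np


def curvature_signal(intra, extra, eps=1e-12):
    """Normalized contrast (I1(r) - I2(-r)) / (I1(r) + I2(-r)).

    intra, extra: 2D intensity arrays of the intra- and extra-focal images.
    """
    # I2(-r): rotate the extra-focal image by 180 degrees about its centre.
    extra_flipped = extra[::-1, ::-1]
    total = intra + extra_flipped
    signal = np.zeros_like(total, dtype=float)
    lit = total > eps                      # avoid dividing by zero outside the beam
    signal[lit] = (intra[lit] - extra_flipped[lit]) / total[lit]
    return signal


# Uniform excess of light in the intra-focal image, as for a pure defocus term.
intra = np.full((5, 5), 1.1)
extra = np.full((5, 5), 0.9)
contrast = curvature_signal(intra, extra)
```

Because the signal is a ratio of a difference to a sum, a common multiplicative factor on both images, such as a change in source brightness, cancels out.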


Curvature Wavefront Sensing 191 


3.1. Optical gain 


We note that the defocus length l appears in the denominator of the constants on
the right-hand side. This acts as an optical gain: the closer the analysis planes are
to the focus, the stronger the measured signal for a given curvature. This can be
understood physically by considering what happens to a phase defect of a given
amplitude depending on whether it is very localized in the pupil (a high spatial
frequency) or whether it extends over the entire pupil (low spatial frequencies),
as shown in Fig. 3. Local defects have a large blur angle, θ₀ (Fig. 1), and will
converge very rapidly (close to the pupil). By the time the light reaches focus, this
light will mostly be scattered across the focal plane. A broad aberration (of the
same amplitude) will have a small blur angle and there will be barely any intensity
variation close to the pupil (large l). However, the shape of the PSF will be grossly
modified, so sensitivity will be highest at small defocus lengths. The curvature
sensor can thus be tuned to be sensitive to different spatial scales as well as for
gain: putting the measurement planes far from focus increases the sensitivity to
higher spatial frequencies at the expense of the low ones, while a high gain to
low spatial frequencies can be achieved close to focus, at the expense of sensitivity
to high spatial frequencies. Working close to focus means that the dynamic range
is limited and measurements become nonlinear when the amplitude of the phase
defects becomes too large. This nonlinearity is due to diffraction effects that are not
included in the curvature equation. The curvature sensor works well around its zero
and is an ideal closed-loop sensor, but in practice, an optimal defocus length needs to
be determined, to maximize the sensitivity to the residual phase aberrations while
remaining in the linear regime. In its original inception, it was conceived that the
optical gain could gradually be increased as the adaptive optics loop closed and the
amplitude of the phase excursions decreased. This has never been needed or implemented
thanks to the stability of closed-loop systems.

Fig. 3. The sensitivity of a curvature sensor as a function of its optical gain. Bottom left panel:
The phase (left image) is fed through the optical system. The central and right-hand images of
the bottom left panel show, respectively, the Laplacian at full scale and with a stretched scale
to enhance the contrast. Note that the phase is composed of low spatial frequencies as well as
a square of high spatial frequencies close to the center. Top panel: Schematic diagram showing
seven positions at which the intensity distribution is computed and displayed in the middle panel,
as indicated by the arrows. These positions represent the situation close to the pupil (left and
right extreme images), in the linear range, close to the focus, and at the focus (central image).
Middle panel: Intensity distributions at the indicated positions. Bottom right panel: the contrast is
displayed for three of the images in the central panel. Proximity to the pupil improves sensitivity
to high spatial frequencies at the expense of low spatial frequencies. Too close to the focus,
nonlinearities due to the pupil’s edge diffraction are introduced. Thus, the linear range (and the
optimal optical gain, shown in the middle of the bottom right panel) depends on the defocus length.


4. Development and Implementations 


The curvature wavefront sensor was first implemented on the UH-AO system by
François Roddier and collaborators in 1991.⁵ It was a 13-element system that
implemented a modulation of the defocus length using a vibrating membrane mirror at
a focal plane. When the membrane is flat, the pupil is re-imaged on the lenslet
array, and as it vibrates and changes its radius of curvature, it re-images
out-of-focus planes conjugated to different distances from the pupil (= f − l) at constant
magnification on the lenslet array. Thus the stronger the amplitude, the smaller
l is and the stronger the optical gain. Each lenslet was coupled to an avalanche
photodiode in photon-counting mode, and synchronous detection with the membrane
mirror was used to integrate counts in the intra- and extra-focal images. An elegant
design feature of this implementation is that the measurement becomes differential
and is thus insensitive to defects and biases within the sensor (e.g., variable
transmission or fiber-APD coupling). At the same time, this design allows the defocus
length to be modulated and optimized without altering any optical alignments or
components. Most (if not all) operational curvature adaptive optics systems have
used this focal plane vibrating membrane mirror design. Because such mirrors are
driven acoustically by a loudspeaker, curvature sensors emit a
characteristic high-pitched pure sine wave sound.


4.1. Defocus length 


The defocus distance has to be chosen so that the geometrical approximation of 
Eq. (5) remains valid. Thus, $l$ has to be chosen such that the diffraction or speckle
blur produced at the position of the defocused image remains small with respect
to the wavefront fluctuations we want to measure, in other words, the size of the
sub-apertures, $d$. At the defocus plane, the size of the sub-aperture is $ld/f$, and the
blur size will be $(f - l)\theta_b$, so $l$ must satisfy the following condition:


$$(f - l)\,\theta_b < \frac{l\,d}{f}, \qquad (6)$$


that is,


$$l > \frac{f}{1 + d/(f\theta_b)}. \qquad (7)$$


For large blur angles, $l$ tends to $f$ (i.e., close to the pupil), but when $\theta_b$ is small
compared to the f-ratio of the subapertures beams, $d/f$, the expression for the
defocus length can be simplified to


$$l > \frac{\theta_b f^2}{d}. \qquad (8)$$


If the sub-aperture size $d$ is larger than $r_0$, then the blur angle is given by $\lambda/r_0$ such
that $l > \lambda f^2/(r_0 d)$. This is the behavior expected for a low-order system. However,
as the loop closes the effective $r_0$ increases and the diffraction of the lenslets (of size
$\lambda/d$) becomes dominant. The sensor can bootstrap on its own correction and the
defocus length can be reduced to $l > \lambda f^2/d^2$. This condition is very close to the
near field approximation,ᵃ so we conclude that the optimal defocus length occurs
at the shortest distance from focus where diffraction effects do not degrade the 
measurements. 
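
The bounds of Eqs. (7) and (8) are easy to evaluate numerically. The following is only a minimal sketch: the function names and all numerical values (focal length, sub-aperture size, wavelength, $r_0$) are hypothetical and chosen purely to illustrate the orders of magnitude involved before and after the loop closes.

```python
def min_defocus_length(f, d, theta_b):
    """Eq. (7): smallest l for which the blur stays below the sub-aperture size."""
    return f / (1.0 + d / (f * theta_b))


def min_defocus_length_small_blur(f, d, theta_b):
    """Eq. (8): small-blur limit of Eq. (7), valid when theta_b << d / f."""
    return theta_b * f**2 / d


# Hypothetical values, chosen only to illustrate the orders of magnitude.
f = 100.0            # focal length of the beam [m]
d = 1.0              # sub-aperture size projected on the pupil [m]
wavelength = 0.7e-6  # sensing wavelength [m]
r0 = 0.15            # Fried parameter [m]

theta_open = wavelength / r0    # seeing-limited blur (d > r0, loop open)
theta_closed = wavelength / d   # lenslet diffraction (loop closed)

l_exact = min_defocus_length(f, d, theta_open)
l_open = min_defocus_length_small_blur(f, d, theta_open)      # lambda f^2 / (r0 d)
l_closed = min_defocus_length_small_blur(f, d, theta_closed)  # lambda f^2 / d^2

print(f"open loop:   l > {l_open * 1e3:.1f} mm (exact {l_exact * 1e3:.1f} mm)")
print(f"closed loop: l > {l_closed * 1e3:.1f} mm")
```

With these assumed numbers the minimum defocus length drops from about 47 mm to 7 mm once the sensor bootstraps on its own correction, and the small-blur approximation differs from Eq. (7) by less than 0.1%.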

Another possible implementation of the curvature sensor, if used at fixed defocus 
length (so possibly more apt for a laboratory setting) is to simply use a beamsplitter 
to re-image the intra- and extra-focal images on the same detector. Although the 
size of the sub-apertures can be optimized by binning, the sensor will remain most 
sensitive to a given blur angle since $l$ cannot (easily) be varied. What is gained here
in ease of implementation is paid for in post-processing: the images have to be nor- 
malized, scaled and co-aligned accurately. Furthermore, any aberrations introduced 
after the beam is separated will be interpreted as a spurious bias signal. 


5. Advantages of Curvature AO 


5.1. Bimorph mirrors 


So far, we have only discussed the principles and implementations of the curvature 
wavefront sensor, but the real advantage of this method comes from the coupling of 
this sensor with a bimorph mirror. Such a mirror is composed of an electrode pattern 
sandwiched between two wafers of oppositely polarized piezoelectric ceramic, and 
a grounding electrode around each (Fig. 4(a)). When a voltage is applied to one
electrode, the piezo-ceramic on one side will expand while the one on the other side 
will contract, bending the surface at the location of the electrode (Fig. 4(b)). 


ᵃFor a parallel beam, the Fresnel distance before which diffraction effects appear is given by
$L < d^2/\lambda$.
[Figure 4, panels (a) and (b). Labels: reflecting surface; piezo; null curvature; constant curvature (voltage V).]


Fig. 4. (a) An electrode pattern is sandwiched between two oppositely polarized piezoelectric 
wafers (hatched) with a grounding electrode on the outside. (b) When an electric field is applied 
across the piezo ceramic, it expands or contracts, generating constant curvature on the electrode 
pattern. 


In fact, the surface will bend with a radius of curvature⁹:


$$R_{x,y} = \frac{V_i\, d_{31}}{t^2}, \qquad (9)$$


where $V_i$ is the voltage across the electrode of thickness $t$ and $d_{31}$ is the transverse
element of the piezoelectric tensor $d_{ij}$. Thus, if the electrode pattern matches the
wavefront sensor lenslet pattern, the sensor measures exactly what the mirror is 
capable of producing (see Fig. 5). The theoretical interaction matrix for such a 
system is purely diagonal (Fig. 7(a)) and is trivial to invert. As such it provides the 
least amount of reconstruction noise, as there are no poorly conditioned modes. In 
practice, there is a piston term produced by the edge electrodes which is invisible to 
the sensor, but it is easy to filter out and has no effect on the final image (nonetheless 
care needs to be taken if the adaptive optics system is to be used in interferometry,
e.g., the MACAO systems¹⁰,¹¹ on VLTI at ESO¹²). This also means that
in principle, there is no reconstruction required and each electrode-lenslet channel 
can be controlled independently. Thus, in theory the adaptive optics control loop 
could be implemented analogically;⁸ although this has never been tried in practice,
it remains an interesting implementation in applications where bandwidth is of 
utmost importance. However, due to imperfect materials of the mirror, connecting 
wires, and diffraction in the sensor, there is usually some level of crosstalk between 
electrodes and neighboring lenslets and the interaction matrix does contain some 
non-zero non-diagonal terms (see Figs. 5 and 7(c)). A standard inversion scheme 
allows reconstruction of the mirror voltage vector but with minimal noise propaga- 
tion due to the good conditioning of the interaction matrix. 


5.2. Avalanche photodiodes 


Fig. 5. (a) The phase produced by actuating one electrode (top) and four of them (bottom).
(b) The associated Laplacian of the phase maps reveals the electrode pattern. The junction wires
are visible and are in part responsible for the cross-talk and the non-diagonal elements of the
interaction matrix.


The use of avalanche photodiodes in the wavefront sensor is also advantageous, as
there is no read noise. Also, because the sensor only measures the intensity
within a lenslet, there is no need to form an image and read multiple pixels for each
measurement. The wavefront sensor can thus be set to work at the highest frame
rate available and the photon noise can be averaged numerically in the closed loop
integrator by choosing the appropriate loop gain. This simplifies the control of the
loop, as there is no need to change the frame rate to optimize the SNR at each read:
the sensor is run at its maximum available frequency but a low gain value ensures
that the overshoot of the closed loop transfer function is kept to a minimum.
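
The noise-averaging role of the loop gain can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of any real system: a single channel, a static aberration, Gaussian measurement noise standing in for photon noise, a pure integrator with no loop delay, and a hypothetical function name `run_integrator`.

```python
import numpy as np


def run_integrator(gain, noise_sigma, n_steps=20000, seed=0):
    """Single-channel integrator: command += gain * (noisy residual)."""
    rng = np.random.default_rng(seed)
    target = 1.0   # static aberration to be corrected (arbitrary units)
    command = 0.0
    history = np.empty(n_steps)
    for k in range(n_steps):
        # The sensor sees the residual plus measurement noise at every frame.
        measurement = (target - command) + rng.normal(0.0, noise_sigma)
        command += gain * measurement
        history[k] = command
    return history


low_gain = run_integrator(gain=0.1, noise_sigma=1.0)
high_gain = run_integrator(gain=0.8, noise_sigma=1.0)

# Discard the initial convergence before measuring the noise on the command.
noise_low = low_gain[1000:].std()
noise_high = high_gain[1000:].std()
print(f"command noise: gain 0.1 -> {noise_low:.2f}, gain 0.8 -> {noise_high:.2f}")
```

In this simplified model both loops converge to the same mean correction, but the low-gain loop propagates much less measurement noise onto the command, at the cost of a slower response. No change of frame rate is involved.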


5.3. Strehl efficiency 


Finally, the curvature sensor is ideally suited to measure atmospheric turbulence.
In the case of Kolmogorov turbulence, the amplitude of the spatial spectrum of
the phase varies as $k^{-11/6}$ ($k$ being the spatial frequency). Curvature is the second
derivative of the phase, so the amplitude of the spectrum of curvatures varies as
$k^{+1/6}$ (differentiating a function multiplies the amplitude of its Fourier transform by
the spatial frequency $k$, but curvature is the second-order derivative, so multiplies
the amplitude spectrum by $k^2$). This means that the curvature spatial spectrum
is almost flat, thus measurements are nearly uncorrelated. This in turn implies that each
electrode-lenslet channel is statistically (almost) independent from the others and
corrects the phase deformation optimally on a per-actuator basis. This is of course
important in astronomical applications where each photon matters. References 13
and 14 studied the efficiency of adaptive optics systems; it is defined as the ratio of
the effective number of ideal modes corrected to the actual number of actuators in
the system, $g = N_{eff}/N_{act}$. Using fewer, larger sub-apertures at equivalent correction ($N_{eff}$
constant) improves the limiting magnitude (or reduces laser power requirements), a
desirable feature in astronomical applications where photons are precious. But AO
systems with fewer sub-apertures and pixels are not only less expensive to build 
and to maintain, they also benefit from reduced calibration and reconstruction 
errors; the good match between what the sensor measures and what the mirror can 
produce means that interaction matrices are diagonal, leading to reduced invisible 
(or low sensitivity) modes and improved noise propagation. Maybe for all these 
reasons, curvature AO systems are found to have substantially higher efficiencies 
than Shack-Hartmann ones (see Fig. 6). It is unfortunate that these studies were 
carried out before pyramid sensors started delivering results, as it would be most 
interesting to see where the pyramid sensor lies on such a family of curves. 


[Figure 6: log (Strehl efficiency) plotted against the number of sub-apertures N_s, from 10 to 1000.]


Fig. 6. The Strehl efficiency for curvature AO systems (open circles) and Shack-Hartmann ones
(filled circles), as a function of number of sub-apertures from Ref. 14. The dotted lines represent
isovariances labelled with their effective number of sub-apertures $N_{eff}$ = 10, 20, 40. © The Astronomical
Society of the Pacific. Reproduced by permission of IOP Publishing. All rights reserved.
6. Disadvantages of Curvature Sensing


6.1. Null sensor 


Due to its limited linear range, curvature sensing is effective as a null sensor. How- 
ever, as such, it is difficult to use in applications working off-null, for example 
to include so-called “centroid offsets” to compensate for static aberrations, or in 
GLAO,ᵇ MOAO,ᶜ or open loop AO.ᵈ To increase the dynamic range, the optical
gain can be reduced by increasing the defocus length, but there is nonetheless an 
intrinsic blur and an increase in aliasing in working far off null. This is best under- 
stood if we think of the effect of a tilt on the wavefront: if left uncorrected, higher 
spatial frequency defects will not line up on the contrast image (as they will be 
offset from one another due to the original tilt), and will produce a signal with even 
more apparent high spatial frequencies. In the example shown in Fig. 1, the contrast 
image would show an elliptical zone rather than a circular one if the two defocused 
images were not aligned. 


6.2. Tip-tilt sensing noise 


The edge sub-apertures provide the radial derivative around the pupil. If these sub- 
apertures were to line up precisely with the edge of the pupil, and there was no 
tip-tilt, no flux would reach them, which in practice can cause some difficulties. 
For practical reasons, the edge sub-apertures intersect some of the pupil so that 
each sub-aperture covers the same illuminated area and correspondingly has the 
same signal-to-noise ratio when there is no input perturbation. This reduces the 
sensitivity of the curvature sensor to modes with non-zero phase divergence, of 
which tip and tilt are the first and carry the largest variance. Unevenly illuminated 
sub-apertures may actually be preferable, but an appropriate weighting scheme has 
to be used in the reconstructor. 


6.3. Aliasing 


The curvature sensor’s best known disadvantage is its sensitivity to aliasing. Aliasing 
is the spurious interpretation of high spatial frequencies (above the cut-off spatial
frequency $k_c = 1/(2 l_{act})$) folded back onto the low-order modes that are being measured;
these high spatial frequencies are propagated through the reconstructor and
appear as low order modes in the correction. The amplitude of the spectrum of
phase curvatures varies as $k^{+1/6}$, which means that there is a slight increase in curvature
signal at smaller and smaller spatial scales (until the inner scale). Although
on average this signal is of zero mean, its variance increases with decreasing spatial 
scale. These high frequency curvature signals are averaged inside the lenslets making
the measurements, but still produce an instantaneous (spurious) signal across those
(low-order) sub-apertures. The aliasing error can be decreased by decreasing the
defocus length, but when the blur angle associated with these high spatial frequencies
starts to introduce diffraction, these can lead to a nonlinear response of the
sensor, which compounds the aliasing problem (Fig. 7(c)). There is no simple analytical
way to compute the aliasing error, but Monte Carlo simulations carried out
by Ref. 15 show that it is approximately double that of Shack-Hartmann systems.


ᵇSee Chapter 16.
ᶜSee Chapter 17.
ᵈFor example, see Chapter 13.


Fig. 7. Interaction matrices for a 36-electrode system arranged in three rings of (6, 12, 18), with
WFS vectors shown as columns and mirror electrodes as rows. All WFS responses are shown at
the same scale for a normalized input electrode voltage vector. (a) At large defocus distances,
the sensor is well-behaved and the matrix is diagonal, but the response for a normalized signal is
weak. (b) Optimal tuning of the wavefront sensor, with a strong signal but limited cross-talk (non-diagonal
terms). (c) The optical gain is too strong, and diffraction is causing cross-talk between
sub-apertures.

Increasing the number of sub-apertures on a curvature AO system reduces
the aliasing error but forces the sensor to work at greater defocus lengths. In this
case, it is the sensitivity to low order modes which is decreased because they suffer
from low optical gain. Unless clever techniques such as multiple defocus lengths (see 
Section 8.4) are used, this sets an upper limit to the order of curvature AO systems. 


7. Astronomical Curvature AO Systems 


There are a handful of highly productive curvature AO systems around the world, 
presented non-exhaustively in Table 1. Modern developments in AO have focused
on increasing the field of view (GLAO, MOAO) and also on increasing the on-axis 
Strehl of SCAO systems for high dynamic range imaging. A requirement for wide 
field AO is a wavefront sensor with large dynamic range, at odds with the curvature 
null-sensing regime. The low sensitivity to low spatial frequencies for very high-order 
systems is a drawback of curvature sensing for high dynamic range applications, 
but can be compensated for by a “woofer-tweeter” architecture. Although pyramid 
sensors appear extremely well suited for the high order sensing, long defocus length 
curvature remains an attractive possibility. 
Table 1. Curvature AO systems.

System                 Telescope           Degrees of freedom   Year of operation
UHAO13⁸,¹⁶             CFHT, UKIRT, UH88   13                   1994, 1995, 1996–1998
PUEO¹⁷                 CFHT                19                   1996–2012
UHAO36/Hokupa’a¹⁸,¹⁹   CFHT, Gemini        36                   1997, 1999–2001
AO36²⁰,²¹              Subaru              36                   2002–2008
MACAO VLTI¹⁰,¹¹        ESO VLT             60                   2003–present
Sinfoni²²/CRIRES²³     ESO VLT             60                   2003–present
Hokupa’a85²⁴           Gemini N, AEOS      85                   2001–2004, 2008–present
NICI²⁵,²⁶              Gemini S            85                   2007–2013
AO188²⁷,²⁸             Subaru              188                  2008–present
T3PWFS²⁹               WHT                 153                  2017–present


8. Recent Advances in Curvature AO 


8.1. Use of CCDs 


Avalanche photodiodes are ideal detectors due to the lack of read noise, low dark
currents and high quantum efficiencies. Nonetheless they are expensive, and actively
quenched APDs have been known to fail. In an effort to lower the cost of curvature
AO systems, MIT-Lincoln Labs in collaboration with ESO developed a CCD detector
specifically for curvature sensing, the CCID-35. These detectors are composed
of light-sensitive “super-pixels”, each surrounded by a storage area on either side. As the
membrane mirror scans the defocused pattern on each side of focus, the charges are
sent back and forth from the super-pixel to each storage area, so that the counts for
the intra- and extra-focal images are co-added, but kept separate. The detector is
read at a chosen frame rate whose period spans a whole number of membrane mirror cycles,
so that the read noise is only incurred once per frame. The optical fibers from the lenslet
array can be terminated in a V-groove arrangement that matches the detector’s
super-pixel pattern, making it easy to match the round geometry of the pupil and
sub-apertures to the square geometry of CCD pixels.

Such a camera was tested with success on the PUEO adaptive optics system
at CFHT.³⁰ The presence of read noise implies that the frame rate has to be
adapted to optimize the SNR at each read, making operations more cumbersome,
but the performance was otherwise comparable to the APDs, making such a system
perfectly usable for high-order curvature adaptive optics.

EMCCDs can also potentially be used in photon counting mode to replace 
APDs. Although the excess noise (a factor of √2) is equivalent to a drop in throughput
of 50%, its implementation is simpler than a typical APD-based curvature wave- 
front sensor, where the APDs are fed by fibers behind a lenslet array. The cou- 
pling between all these elements lowers the throughput to a point where these two 
approaches may be equivalent in noise. 
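
The equivalence between an excess noise factor of √2 and a 50% loss of throughput follows directly from the photon-noise-limited SNR. The sketch below assumes a pure shot-noise model with no read noise or dark current; the function name and the photon count are hypothetical.

```python
import math


def photon_limited_snr(n_photons, throughput=1.0, excess_noise_factor=1.0):
    """Shot-noise-limited SNR, degraded by a multiplicative excess noise factor."""
    detected = throughput * n_photons
    return math.sqrt(detected) / excess_noise_factor


n_photons = 1000.0  # hypothetical number of incident photons per measurement

# Detector with full throughput but an excess noise factor of sqrt(2).
snr_excess = photon_limited_snr(n_photons, excess_noise_factor=math.sqrt(2.0))

# Noise-free amplification, but only half of the photons are detected.
snr_half = photon_limited_snr(n_photons, throughput=0.5)

print(snr_excess, snr_half)
```

Both cases give the same SNR, $\sqrt{N/2}$, which is why the excess noise can be traded against the coupling losses of a fiber-fed APD arrangement.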
8.2. Possibility of single out-of-focus image


The idea of using a single out-of-focus image is not new,³¹ but there is a degeneracy
on the sign of the phase when the complex amplitude is sought from an intensity 
measurement, because the intensity is the modulus squared of the Fourier transform 
of the input complex amplitude. This is the reason why two images (one inside of 
focus, the other outside of it) are needed; namely, to identify the sign of aberrations 
that can be described by even functions, which produce centro-symmetric intensity 
distributions (e.g., focus, spherical aberration, etc.). 

However, there are simple tricks that break this degeneracy and lift
the ambiguity. For example, if the pupil function is assumed to be
uniform (neglecting scintillation) and made asymmetric, then it is possible to infer
the sign of the phase with a single defocused image. In essence, this is equivalent
to setting $z_2 = 0$ (and $I_{z_2} = I_p$) in Eq. (4), with an added constraint to make the
transmission function $P(x, y)$ an odd function.


8.3. Spatial filtering 


In the same way that a spatial filter at the entrance of a Shack-Hartmann wavefront
sensor can reduce the aliasing,³² a similar device could be used to reduce the aliasing
of a curvature sensor. A spatial filter/field stop could easily be implemented in
the intermediate focal plane of the membrane mirror. Removing the high spatial
frequencies by an optical filter essentially removes the light with too high a blur
angle from the sensor and introduces an uneven illumination over the sensor. The
size of the spatial filter should be $\lambda/l_{act}$ where $l_{act}$ is approximately the size of
the electrodes. For systems optimized for the near-infrared with large electrodes,
where the potential for improvement due to reduced aliasing is the largest, the 
throughput through a spatial filter at the shorter wavefront-sensing wavelength 
would be very small. Conversely, high-order systems, where the light losses through 
a spatial filter would be less, are less affected by aliasing and are dominated by low 
spatial frequency sensitivity. However, if the wavefront sensing were to be carried
out at longer wavelengths, or for a high-order system producing reasonable correction
at visible wavelengths, this improvement should prove beneficial to curvature, as it
has been demonstrated to be in Shack-Hartmann sensors for high dynamic range
AO systems. But in practice, the gain of anti-aliasing optical filtering in curvature 
sensing remains an open question. 


8.4. Multiple defocus length 


To address the issue raised in Section 6.3 regarding the loss of sensitivity to low 
spatial frequencies of a high-order curvature AO system, an ingenious solution was 
found by O. Guyon³³,³⁴ and implemented in Subaru Telescope’s AO188 system.
Instead of driving the membrane mirror with a sine (or square) wave of a given
amplitude, the membrane mirror is modulated to produce a two-step function (or
approximation thereof). In doing so, two different defocus lengths are produced.
Synchronous detection allows integration of counts for large intra-focus, small intra- 
focus, small extra-focus and large extra-focus intensities. For the short defocus dis- 
tances, measurements are effectively convolved with a broad filter to decrease the 
effective number of subapertures and reduce the sensitivity to diffraction. 

Although it is not trivial to drive a membrane mirror with a complex output 
off its resonant frequency, Subaru’s AO188 is currently the curvature system with 
the highest number of degrees of freedom used routinely in astronomy, and its 
performance demonstrates that this method is effective. 


8.5. Nonlinear curvature sensing 


Reference 35 extended the idea of using four defocus planes (Fig. 8), combining it
with a phase diversity scheme, using the Gerchberg-Saxton algorithm, selected for
its simplicity and flexibility. The nonlinear curvature wavefront sensor (nlCWFS) is
computationally intensive for real time applications, as the GS algorithm is iterative
and the phase also needs to be unwrapped, but it can be linearized when phase
excursions are small, allowing one to bypass the nonlinear reconstruction. Curiously,
the nlCWFS sensitivity is improved when a known aberration is added to the beam,
as the diffraction pattern around the pupil from a very clean wavefront otherwise
dominates the signal and reduces its sensitivity. Although showing great promise
from a theoretical standpoint, to our knowledge the nlCWFS has not yet been used
in operational astronomical AO systems.
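
To give a feel for why the Gerchberg-Saxton iteration is simple but iterative, the sketch below implements the classic two-plane version of the algorithm. It is not the nlCWFS reconstruction itself: it assumes a single pupil plane and a single focal plane related by an FFT (rather than Fresnel propagation between four defocus planes), a hypothetical 32 x 32 grid, and an arbitrary smooth test aberration.

```python
import numpy as np


def gerchberg_saxton(pupil_amp, focal_amp, n_iter=50, seed=0):
    """Two-plane Gerchberg-Saxton: alternate between amplitude constraints."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(-0.1, 0.1, pupil_amp.shape)  # arbitrary starting guess
    errors = []
    for _ in range(n_iter):
        # Pupil plane: impose the known pupil amplitude, keep the current phase.
        field = pupil_amp * np.exp(1j * phase)
        focal = np.fft.fft2(field, norm="ortho")
        # Mismatch between modelled and measured focal-plane amplitudes.
        errors.append(np.sum((np.abs(focal) - focal_amp) ** 2))
        # Focal plane: impose the measured amplitude, keep the computed phase.
        focal = focal_amp * np.exp(1j * np.angle(focal))
        phase = np.angle(np.fft.ifft2(focal, norm="ortho"))
    return phase, np.array(errors)


# Hypothetical test case: circular pupil with a smooth aberration.
n = 32
y, x = np.mgrid[-1:1:n * 1j, -1:1:n * 1j]
pupil = (x**2 + y**2 <= 0.8**2).astype(float)
true_phase = pupil * (0.8 * x + 0.5 * (x**2 - y**2) + 0.6 * x * y**2)
focal_amp = np.abs(np.fft.fft2(pupil * np.exp(1j * true_phase), norm="ortho"))

phase_est, errors = gerchberg_saxton(pupil, focal_amp)
print(f"amplitude mismatch: first {errors[0]:.3e}, last {errors[-1]:.3e}")
```

The amplitude mismatch decreases from one iteration to the next, but convergence can stall, and the recovered phase is wrapped to the interval (-π, π], which is why an unwrapping step is needed for large aberrations. With a symmetric pupil and a single intensity plane, the sign of even aberrations remains ambiguous, as discussed in Section 8.2.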
Fig. 8. (a) A transverse image of the illumination as a function of defocus distance, from simulations
by Olivier Guyon (used with author’s permission). (b) Top, simulated frames obtained by the
nlCWFS in monochromatic light; middle, in polychromatic light with no chromatic compensation;
and bottom, in polychromatic light with chromatic compensation. The wavefront error across the
8 m diameter pupil used for this simulation is shown in (c), from Ref. 35. © The Astronomical
Society of the Pacific. Reproduced by permission of IOP Publishing. All rights reserved.
Curvature adaptive optics is currently not as widespread as other techniques,
most likely because it is difficult to extend beyond single conjugate AO, and due to
the limited linear range of curvature sensing.³⁶,³⁷ However, an extension of curvature
sensing, the “Two Pupil Plane Position Wavefront Sensor” (T3PWFS),° has
recently been deployed at the William Herschel Telescope.²⁹
Wavefront sensors typically measure a derivative of the wavefront. In this chapter,
we address two important problems: how to convert these wavefront derivatives 
into wavefront estimates, and how to use these wavefront estimates to drive the 
deformable mirrors. 


1. Introduction 


Wavefront sensors (WFSs) in astronomical adaptive optics (AO) typically measure 
a derivative of the wavefront. The Shack-Hartmann (SH) WFSᵃ and the modulated
pyramid WFS both measure the x- and y-slopes, while the curvature WFSᵇ
measures the Laplacian of the wavefront. In this chapter, we address two important
problems: wavefront reconstruction and wavefront control. Wavefront reconstruction 
deals with the problem of converting wavefront derivatives into wavefront estimates, 
while wavefront control deals with how to use these wavefront estimates to drive 
the deformable mirrors (DMs). 

Without loss of generality, in this chapter we often assume that the WFS is of 
a SH type, with square subapertures and with a linear response to the incoming 
wavefront. The DM has a square array of actuators with the same geometry as the 
WFS. The actuators are optically conjugate to the corners of the lenslets in
what is commonly known as the Fried configuration. The DMs also respond linearly 
to the applied command. 

Most AO systems use closed-loop control, where the wavefront sensor is located 
after the DM. Thus, the WFS measures the corrected wavefront. In open-loop control,
the wavefront sensor is located before the DM, and does not see the correction
applied by the DM. In so-called pseudo open-loop control, we convert the closed-loop
measurements into open-loop measurements by adding the contribution from the
applied DM commands. The wavefront reconstruction strategy depends on whether
we use closed-loop or open-loop measurements.


ᵃSee Chapter 9 of this Volume.
ᵇSee Chapter 10 of this Volume.
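
The conversion to pseudo open-loop measurements can be sketched in a few lines. The example assumes a linear WFS and DM described by an interaction matrix (introduced in Section 2), a sign convention in which the DM correction is subtracted from the incoming wavefront, and hypothetical random values for all quantities.

```python
import numpy as np

rng = np.random.default_rng(1)
n_meas, n_act = 8, 4

G = rng.normal(size=(n_meas, n_act))  # hypothetical interaction matrix
a_atm = rng.normal(size=n_act)        # incoming wavefront, in actuator space
a_dm = 0.9 * a_atm                    # correction currently applied by the DM

s_open = G @ a_atm             # what a WFS placed before the DM would measure
s_closed = G @ (a_atm - a_dm)  # residual measured by a WFS placed after the DM

# Pseudo open-loop: add back the contribution of the applied DM commands.
s_pol = s_closed + G @ a_dm

print(np.allclose(s_pol, s_open))
```

In a real system the reconstruction is only as good as the knowledge of the interaction matrix and of the commands actually realized by the mirror, which this noise-free sketch ignores.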


2. Interaction Matrix 


The interaction matrix, $G_a$, describes how the vector representing the DM command,
$\mathbf{a}$, affects the WFS measurement vector, $\mathbf{s}$:


$$\mathbf{s} = G_a \mathbf{a}. \qquad (1)$$


The interaction matrix incorporates the details about the behavior of the WFS and
of the DM and is needed to compute the reconstructor. The simplest way to measure
it is to poke the actuators one by one by increment $d$. The poke increment is a free
parameter but should be chosen to be as large as possible to maximize the signal-to-noise
ratio (SNR) while keeping both the WFS and the DM actuators within
their linear range. We create a matrix of poked actuators, $A = [\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3, \ldots]$,
where $\mathbf{a}_1^T = [d, 0, 0, \ldots]$, $\mathbf{a}_2^T = [0, d, 0, 0, \ldots]$, etc. In this case, $A = dI$, where $I$
is the identity matrix. The corresponding matrix of measurements, $S$, is similarly
comprised of $S = [\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3, \ldots]$. Using the relationship


$$S = G_a A, \qquad (2)$$


we see that


$$G_a = S A^{-1} = S (dI)^{-1} = d^{-1} S. \qquad (3)$$


This simple method works well when there is an accurate way to measure the change 
in centroids to the applied commands (e.g., using a calibration source). However, 
the method is noisy since sub-apertures far away from the applied actuator are only 
measuring noise. The measurements from those sub-apertures should be masked. 
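
The poke procedure of Eqs. (2) and (3) can be simulated as follows. This is a minimal sketch with a hypothetical random "true" system, Gaussian measurement noise, and arbitrary dimensions. It does not include the masking of distant sub-apertures mentioned above.

```python
import numpy as np

rng = np.random.default_rng(2)
n_meas, n_act = 12, 6

G_true = rng.normal(size=(n_meas, n_act))  # hypothetical "true" system response
d = 0.5                                    # poke increment
noise_sigma = 0.01                         # WFS measurement noise per poke

A = d * np.eye(n_act)  # one column per poke: A = d I
# Each column of S is the noisy WFS measurement for the corresponding poke.
S = G_true @ A + rng.normal(0.0, noise_sigma, size=(n_meas, n_act))

G_est = S @ np.linalg.inv(A)  # Eq. (3); identical to S / d for A = d I

print(f"max calibration error: {np.max(np.abs(G_est - G_true)):.4f}")
```

The calibration error on each element scales as the measurement noise divided by the poke increment, which is why $d$ should be as large as the linear range allows.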

The actuators do not need to be poked one by one: instead, we can use any 
orthogonal set of pokes. By careful selection of the set of commands, we can greatly 
improve the SNR. An ideal choice is the Walsh basis set.¹,²
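
The SNR benefit of an orthogonal set of pokes can be checked numerically. The sketch below uses a Hadamard matrix built by the Sylvester construction, whose rows are Walsh functions up to ordering. The dimensions, noise level and the helper names `hadamard` and `calibrate` are hypothetical, and the number of actuators is assumed to be a power of two.

```python
import numpy as np


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H


rng = np.random.default_rng(3)
n_meas, n_act = 32, 16
G_true = rng.normal(size=(n_meas, n_act))  # hypothetical "true" system
d, noise_sigma = 0.5, 0.05                 # poke amplitude and WFS noise


def calibrate(A):
    """Apply the command set A (one command per column) and invert S = G A."""
    S = G_true @ A + rng.normal(0.0, noise_sigma, size=(n_meas, A.shape[1]))
    return S @ np.linalg.inv(A)


# Same peak actuator stroke d in both cases.
err_zonal = np.std(calibrate(d * np.eye(n_act)) - G_true)
err_hadamard = np.std(calibrate(d * hadamard(n_act)) - G_true)

print(f"zonal: {err_zonal:.4f}, Hadamard: {err_hadamard:.4f}")
```

Because every actuator is pushed to $\pm d$ in every command, each measurement carries signal from all actuators, and the calibration error is expected to drop by roughly the square root of the number of actuators for the same number of pokes and the same stroke.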

There are some AO systems, notably those that employ adaptive secondary mir- 
rors, where calibration of the interaction matrix in controlled conditions is difficult 
or impossible. In this case, we are forced to calibrate the interaction matrix using 
starlight. Poking actuators one by one is not a good solution since the measurements 
are affected by atmospheric turbulence. Instead, we create mirror modes (some pos- 
sible basis sets are described in Section 3.3). Each mirror mode is associated with 
a time-varying sinusoid at a high temporal frequency (typically 50 Hz or higher), 
at which we do not expect to find much power in the atmospheric turbulence.
The effect of each mode is demodulated from the recorded WFS measurements.³,⁴
The reader is referred to Esposito et al. for more details.? 
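
The demodulation step can be sketched as a lock-in detection: the recorded measurements are projected onto the sinusoid applied to the mode. The example below is an illustration under assumed conditions only (a single mode, a hypothetical 1 kHz frame rate, a random walk standing in for slowly varying turbulence) and is not the procedure of the cited references.

```python
import numpy as np

rng = np.random.default_rng(4)
frame_rate = 1000.0  # WFS frame rate [Hz] (hypothetical)
f_mod = 50.0         # modulation frequency of this mirror mode [Hz]
n_frames = 2000      # 2 s of data, a whole number of modulation periods
amplitude = 0.1      # modulation amplitude applied to the mode
n_meas = 10

t = np.arange(n_frames) / frame_rate
carrier = np.sin(2.0 * np.pi * f_mod * t)

g_true = rng.normal(size=n_meas)  # WFS response to this mode (to be recovered)
# Slowly varying disturbance (random walk) plus white measurement noise.
atmosphere = np.cumsum(rng.normal(0.0, 0.01, size=(n_frames, n_meas)), axis=0)
noise = rng.normal(0.0, 0.05, size=(n_frames, n_meas))
s = amplitude * np.outer(carrier, g_true) + atmosphere + noise

# Lock-in demodulation: project every WFS channel onto the carrier.
g_est = carrier @ s / (amplitude * (carrier @ carrier))

print(f"max error: {np.max(np.abs(g_est - g_true)):.3f}")
```

The disturbance is much larger than the modulated signal in each frame, yet it contributes little at the modulation frequency, so the projection recovers the response of the mode. Several modes can be calibrated at once by assigning each one a different frequency.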

All of these methods for generating interaction matrices work when there are 
multiple guide stars and/or DMs. 


3. Classical Reconstructors 


The term classical reconstructors is used here to describe reconstructors that are
a regularized inverse of the interaction matrix. In most cases, the wavefront reconstruction
is implemented by multiplying a reconstructor matrix, $R$, by a vector of
measurements, $\mathbf{s}$, to obtain the residual wavefront, $\mathbf{a}$, in actuator space


$$\mathbf{a} = R \mathbf{s}. \qquad (4)$$


In general, $G_a$ is not a square matrix so it cannot be inverted. We could define the
reconstructor as the pseudo-inverse of $G_a$,


$$G_a^{\dagger} = (G_a^T G_a)^{-1} G_a^T. \qquad (5)$$


However, $G_a^T G_a$ is not invertible. Instead, the reconstructor is a regularized pseudo-inverse
of $G_a$. How to regularize the matrix inversion is the subject of this section.
There is a fundamental trade-off between reconstructing the wavefront to cancel
the centroids as much as possible on one hand, and reducing the effect of noise in
the centroids on the wavefront reconstruction on the other. Even if the guide star is extremely
bright, the centroid measurements are contaminated by noise stemming from spatial
aliasing⁵ and truncation of the detector area.⁶

Three different ways to compute classical reconstructors are discussed in this 
section: singular value decomposition, regularized least-squares and modal recon- 
structors. These reconstructors, which comprise almost all existing reconstructors 
in astronomical AO, drive the DM actuators to make the WFS measurements as 
close as possible to zero (or to the calibrated reference values of the WFSs). The
reconstructors may incorporate knowledge of the location of the guide star(s) and 
the statistics of the turbulence and noise. 

Tomographic reconstructors on the other hand, use measurements from several 
guide stars to drive the actuators to compensate the wavefront optimally in the 
direction of the science targets (not the WFSs). Tomographic reconstructors, which 
are the subject of Section 4, can also incorporate knowledge of the location of the 
science target(s) and the strength of the turbulent layers. 

For computational reasons, AO systems with large numbers of measurements 
or actuators may need to reconstruct the wavefront using alternative approaches to 
reconstruction matrices. Some of these approaches are discussed in Section 5. 
3.1. Singular value decomposition


The traditional way to compute $R$ is to take a singular value decomposition (SVD)
of G,. The SVD reconstructor is still used for small scale AO systems and for AO 
systems where the sampling of the WFS is denser than the interactuator spacing. 
The attraction of the SVD reconstructor is that it takes the interaction matrix and 
the number of modes (or minimum singular value) as the only inputs. 

Let us decompose the matrix $G_a$ using the SVD


$$G_a = U D V^T, \qquad (6)$$


where $U$ and $V$ are unitary matrices representing the singular modes of the WFS
and the DM, respectively. $D$ is a diagonal matrix, with the values along the leading
diagonal known as singular values.

Recall that a unitary matrix has the property that its inverse is equal to its
transpose. Hence, the pseudo-inverse, $G_a^{\dagger}$, is


$$G_a^{\dagger} = V D^{-1} U^T. \qquad (7)$$


The reconstructor is a regularized pseudo-inverse obtained by inverting only the
largest singular values. For example, to keep the first $m$ singular modes, we define
$D^{-1}$ as


$$D^{-1}_{ii} = \begin{cases} 1/D_{ii}, & \text{if } i \le m, \\ 0, & \text{otherwise.} \end{cases} \qquad (8)$$


The number of modes to keep (or, alternatively, the smallest singular value to invert) 
is a free parameter. At a minimum, invisible modes such as pure actuator piston 
or global waffle must be removed. The optimal number of modes depends on the 
SNR of the WFS and is generally determined via numerical simulations and/or
on-sky performance testing. Figure 1 illustrates two modes with low singular values. 

Fig. 1. Global waffle mode on the DM (a) and a mode containing partial waffle (b). These modes
are not well sensed by a SH WFS and must be removed in the reconstruction process.


The disadvantage of the SVD reconstructor is that each mode (which comprises
a vector of actuator commands) is either kept intact or discarded entirely, leading
to lower performance. The SVD process also imposes a heavy computational burden
which can be prohibitive for systems with a large number of actuators.
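
A truncated-SVD reconstructor in the sense of Eqs. (6) to (8) takes only a few lines with NumPy. The sketch below uses a hypothetical random interaction matrix; in a real system the matrix would come from calibration and the number of modes kept would be tuned as described above.

```python
import numpy as np


def svd_reconstructor(G, n_modes):
    """Regularized pseudo-inverse of G keeping the n_modes largest singular values."""
    # NumPy returns the singular values in decreasing order: G = U diag(sv) Vt.
    U, sv, Vt = np.linalg.svd(G, full_matrices=False)
    inv_sv = np.zeros_like(sv)
    inv_sv[:n_modes] = 1.0 / sv[:n_modes]    # Eq. (8): invert only the first m
    return Vt.T @ np.diag(inv_sv) @ U.T      # Eq. (7)


rng = np.random.default_rng(5)
G = rng.normal(size=(20, 8))  # hypothetical interaction matrix (20 slopes, 8 actuators)

R_full = svd_reconstructor(G, n_modes=8)   # all modes kept
R_trunc = svd_reconstructor(G, n_modes=6)  # two weakest modes discarded

print(np.allclose(R_full @ G, np.eye(8)), np.linalg.matrix_rank(R_trunc))
```

When all modes are kept the result coincides with the ordinary pseudo-inverse. Each discarded mode lowers the rank of the reconstructor by one, which is the all-or-nothing behavior noted in the text.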


3.2. Regularized least-squares reconstructors 


Most current astronomical AO systems use some form of regularized least-squares 
reconstruction.*”° A regularized least-squares reconstructor takes the form 


R=(GTWae,+Q)'aTw, (9) 


where W is a diagonal matrix that weights the measurements differently and may 
be omitted if all measurements are equally noisy or the noise is unknown. Exam- 
ples where W is useful are: SH WFSs where some subapertures are only partially 
illuminated, curvature WFS with different size subapertures, and any WFS guiding 
on a laser guide star (LGS), where the elongation of the guide star depends on the 
location in the pupil.1° The matrix $G_a^{T} W G_a$ is not invertible for the same reason
that some of the singular values in the SVD are very small: some mirror modes are 
unsensed or poorly sensed by the WFS. Hence, a regularization matrix Q is needed 
to make the system of equations well conditioned. 
The simplest form of regularization is 


$$Q = \alpha I, \tag{10}$$


where $I$ is the identity matrix and $\alpha$ is a constant. The optimal value of $\alpha$ is
found by trial and error and increases with increasing measurement noise. This
form of regularization penalizes all of the actuator commands equally. However, the 
mirror modes are not penalized equally: it can be shown that this reconstructor is 
exactly the same as the SVD reconstructor where the diagonal matrix of Eq. (8) is 
replaced by: 

$$D^{-1}_{ii} = \frac{D_{ii}}{D_{ii}^{2} + \alpha}. \tag{11}$$


This means that mirror modes with a low singular value suffer much stronger penal- 
ization than modes which are well measured and hence have a high singular value. 
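
The equivalence between the regularization $Q = \alpha I$ and a filtered SVD can be checked numerically. The sketch below assumes equal measurement weights ($W = I$), a random interaction matrix and an arbitrary value of $\alpha$; it compares the reconstructor of Eq. (9) with an SVD reconstructor whose inverse singular values are replaced by $D_{ii}/(D_{ii}^2 + \alpha)$.

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((10, 5))   # hypothetical interaction matrix
alpha = 0.3                        # arbitrary regularization constant

# Regularized least squares with W = I and Q = alpha * I.
R_ls = np.linalg.solve(G.T @ G + alpha * np.eye(5), G.T)

# Same reconstructor written as a filtered SVD.
U, d, Vt = np.linalg.svd(G, full_matrices=False)
filtered_inverse = d / (d**2 + alpha)
R_svd = Vt.T @ np.diag(filtered_inverse) @ U.T

# Attenuation of each mode relative to the plain inverse 1/d:
# close to 1 for large singular values, small for weak modes.
attenuation = filtered_inverse * d
```

The `attenuation` vector makes the unequal penalization explicit: it equals $D_{ii}^2/(D_{ii}^2+\alpha)$, which tends to zero for poorly sensed modes.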

We can also set Q explicitly to penalize unwanted modes. For example, piston 
can be removed using extremely strong penalization by setting 


$$Q = \beta p p^{T}, \tag{12}$$


where $p^{T} = [1, 1, \ldots]$ and $\beta$ is a large constant. In the same way, we can penalize
other unwanted modes, such as tip-tilt for AO systems with a separate tip-tilt
mirror. Local waffle, which is prevalent in SH WFSs, can be penalized using a
local waffle-suppressing matrix Q.” Alternatively, we can use prior knowledge that
atmospheric turbulence is smooth and use Laplacian penalization, a very sparse
matrix Q which penalizes the curvature of the wavefront.!!

Finally, let us consider the minimum-variance estimator (also known as the minimum
mean squared error estimator).!? This reconstructor minimizes the wavefront
error when running in open loop. There are different, mathematically equivalent,
ways of writing this:


$$R = C_{as} C_{ss}^{-1} \tag{13}$$
$$= C_{aa} G_a^{T} (G_a C_{aa} G_a^{T} + C_{nn})^{-1} \tag{14}$$
$$= (G_a^{T} C_{nn}^{-1} G_a + C_{aa}^{-1})^{-1} G_a^{T} C_{nn}^{-1}, \tag{15}$$


where $C_{nn}$ is the covariance of the measurement noise and $C_{aa}$ is the covariance
of atmospheric turbulence in actuator space (i.e., the covariance of the actuator
commands in the absence of noise).

The matrix


$$C_{as} = C_{aa} G_a^{T} \tag{16}$$

represents the covariance between the measurement and the turbulence, while

$$C_{ss} = G_a C_{aa} G_a^{T} + C_{nn} \tag{17}$$


is the covariance of the measurements.

We can calculate the matrix $C_{nn}$ analytically or using Monte Carlo simulations,
based on the brightness of the star and the noise characteristics of the detector.
Alternatively, $C_{nn}$ can be measured on the sky. The calculation of $C_{aa}$ is described in
Section 4.3.
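
The equivalence of the covariance form and the inverse-covariance form of the minimum-variance reconstructor follows from the matrix inversion lemma and is easy to confirm numerically. In the sketch below the matrices are random placeholders (any symmetric positive-definite $C_{aa}$ and $C_{nn}$ would do), and the variable names are chosen for the example only.

```python
import numpy as np

rng = np.random.default_rng(2)
n_s, n_a = 8, 4                       # hypothetical numbers of slopes and actuators

G = rng.standard_normal((n_s, n_a))   # interaction matrix G_a
A = rng.standard_normal((n_a, n_a))
C_aa = A @ A.T + n_a * np.eye(n_a)    # placeholder turbulence covariance (SPD)
C_nn = np.diag(rng.uniform(0.5, 1.5, n_s))  # placeholder diagonal noise covariance

# Covariance form: R = C_as C_ss^-1.
C_as = C_aa @ G.T
C_ss = G @ C_aa @ G.T + C_nn
R_cov = C_as @ np.linalg.inv(C_ss)

# Inverse-covariance form: R = (G^T C_nn^-1 G + C_aa^-1)^-1 G^T C_nn^-1.
C_nn_inv = np.linalg.inv(C_nn)
R_info = np.linalg.solve(G.T @ C_nn_inv @ G + np.linalg.inv(C_aa), G.T @ C_nn_inv)
```

Which form is cheaper depends on the dimensions: the first inverts a matrix of size (slopes x slopes), the second one of size (actuators x actuators).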

In practice, most AO systems run in closed-loop, where the measurements cor- 
respond to the residual wavefront errors, not the wavefront itself. The reconstructor 
becomes 


$$R = (G_a^{T} C_{nn}^{-1} G_a + \alpha C_{aa}^{-1})^{-1} G_a^{T} C_{nn}^{-1}, \tag{18}$$


where $\alpha$ is a small constant tuned to the signal-to-noise ratio of the WFS. This
reconstructor is used at Keck Observatory and on GeMS.*8


3.3. Modal reconstructors 


In many cases, we wish to convert from wavefront derivative measurements to a 
set of global modes before converting to actuator commands. The most commonly 
used modes in astronomical adaptive optics are the Zernike,!’ Karhunen—Loéve4 
(KL), disk harmonics!® and Fourier!® !” basis sets. The attractive properties of each 
modal basis function are described in what follows. 

Zernike modes are popular as they correspond to physical optical quantities, 
such as piston, tip-tilt, focus and astigmatism. There is a simple analytical expression
for the modes and their derivatives over a circular pupil. Their use should
be restricted to about 45 modes, after which the modes exhibit large derivatives 
near the edge of the pupil.

KL modes are the eigenmodes of atmospheric turbulence (i.e., the eigenmodes
of $C_{aa}$) and by definition can be used to measure or compensate the largest fraction
of turbulence using a given number of modes. Disadvantages of KL modes are the 
difficulty in computing them and the fact that they have no physical meaning. 

Disk harmonics are similar to Zernike modes, but are much better behaved near 
the edges. Adaptive secondary mirrors, which have a large number of modes over a 
circular pupil, are usually commanded in disk harmonics or KL modes. 

Fourier modes are orthogonal over a rectangular aperture, and are ideally suited 
to WFSs with measurements in a grid and DMs with a rectangular array of actua- 
tors. The modes are perfect for describing wavefront propagation using the frozen 
flow hypothesis, since the modes are simply translated across the pupil. In addition, 
they naturally emerge from the FFT reconstructor, described in Section 5. Fourier 
modes are the KL modes over a rectangular aperture. The main disadvantage of 
Fourier modes is that physically meaningful modes such as tip-tilt, which are often 
compensated separately, are not easily accessible in Fourier space. 

In addition to these modes, the singular modes described in Section 3.1 can be 
used as a modal basis set. 

Figure 2 shows the first 10 modes for the Zernike, KL and disk harmonic basis 
sets. It can be seen that the low-order modes are all similar, and the significant 
differences occur for higher modes. 

Once we have our modal basis set, we need to calculate the reconstructor. Let 
$Z = [z_1, z_2, \ldots]$ represent the modes evaluated at the discrete actuator points, such
that $a = Zm$, where $m$ is the vector of modal coefficients. The number of columns
of $Z$ is equal to the number of modes.

We define a modal interaction matrix, $G_m = G_a Z$, such that


$$s = G_m m, \tag{19}$$


and obtain the least-squares solution for the modal coefficients as


$$m = (G_m^{T} G_m)^{-1} G_m^{T} s. \tag{20}$$


Fig. 2. First 10 modes excluding piston for Zernike (a), Karhunen-Loéve (b) and disk harmonic 
(c) basis sets.

Because poorly sensed modes, such as piston and global waffle, are not reconstructed,
$G_m^{T} G_m$ is well conditioned.

Generally, the modal coefficients are filtered so that some modes have a higher
loop gain than others.!® 7° Let us consider the filtering to be a matrix multiplication
by matrix $F$, which would usually be a diagonal matrix with diagonal values between
0 and 1. Then, using the relationship $a = Zm$, we obtain

$$a = Z F (G_m^{T} G_m)^{-1} G_m^{T} s, \tag{21}$$


and the reconstructor is

$$R = Z F (G_m^{T} G_m)^{-1} G_m^{T}. \tag{22}$$


SH WFSs that employ quad cells have an optical gain (called the centroid gain) 
that varies depending on the strength of the turbulence. The same effect occurs 
for curvature and pyramid sensors, except that the optical gain is not constant but 
depends on the spatial frequency of the aberrations. As a result, a modal recon- 
structor with varying modal gains is ideally suited to these sensors. 
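
A modal reconstructor with per-mode gains can be sketched on a toy one-dimensional system. Everything below is an assumption made for illustration: the actuators lie on a line, the "WFS" measures differences between neighbouring actuators, and the modal basis contains only tilt and a piston-removed focus term.

```python
import numpy as np

n_act = 9
x = np.linspace(-1.0, 1.0, n_act)            # actuator positions on a line

# Toy interaction matrix G_a: each slope is the difference between
# neighbouring actuator commands.
G_a = np.diff(np.eye(n_act), axis=0)

# Two modes sampled at the actuators: tilt and piston-removed focus.
Z = np.column_stack([x, x**2 - np.mean(x**2)])

G_m = G_a @ Z                                # modal interaction matrix
F = np.diag([1.0, 0.5])                      # per-mode gains between 0 and 1
R = Z @ F @ np.linalg.solve(G_m.T @ G_m, G_m.T)

a_true = 2.0 * Z[:, 0] + 1.0 * Z[:, 1]       # DM shape: tilt plus focus
s = G_a @ a_true                             # noiseless slopes
a_rec = R @ s                                # tilt at full gain, focus halved
```

Because the gains sit in the diagonal matrix `F`, changing the gain of one mode (for example to track a varying optical gain) only requires updating one number before recomputing `R`.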


3.4. Projection of unwanted modes 


Often, the reconstructor produces unwanted modes. For example, we do not want 
piston on the DM, and sometimes tip-tilt is corrected with a separate mirror. In 
this case, we project the unwanted modes out of the DM commands. Let us denote
the unwanted modes as $v_i$ and define the matrix $V$ as $V = [v_1, v_2, \ldots]$. Then the
actuator projection matrix, $P_a$, is


$$P_a = I - V V^{T}, \tag{23}$$


and the new reconstructor, $R'$, becomes $R' = P_a R$.

Alternatively (or better still, additionally), certain modes can also be projected 
out in measurement space. For example, consider tip-tilt measurements from an 
LGS. These measurements are corrupted by turbulence on the uplink of the laser, 
and must be removed. The slope projection matrix, $P_s$, is defined in a manner
similar to Eq. (23) and the new reconstructor becomes $R' = R P_s$.
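
The actuator-space projection can be illustrated as follows. One point worth making explicit is that $I - VV^T$ is a projector only if the columns of $V$ are orthonormal, so the sketch orthonormalizes the unwanted modes first (here with a QR factorization, which is one possible choice). The actuator layout and the placeholder reconstructor are assumptions for the example.

```python
import numpy as np

n_act = 8
x = np.linspace(-1.0, 1.0, n_act)             # hypothetical actuator positions

# Unwanted modes sampled at the actuators: piston and tilt.
modes = np.column_stack([np.ones(n_act), x])

# Orthonormalize the columns so that V^T V = I.
V, _ = np.linalg.qr(modes)

P_a = np.eye(n_act) - V @ V.T                 # actuator projection matrix

rng = np.random.default_rng(3)
R = rng.standard_normal((n_act, 14))          # placeholder reconstructor
R_new = P_a @ R                               # commands free of piston and tilt
```

A slope-space projector is built in the same way from the measurement patterns of the unwanted modes and applied on the right, `R @ P_s`.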


4. Minimum-Variance Tomographic Reconstructors 


Tomographic reconstructors are used to convert measurements from several guide 
stars into commands for one or more DMs. In this section, we focus on minimum-
variance tomographic reconstructors, which optimally command the DMs to correct 
the wavefront in the direction of the science targets. This differs from classical 
reconstructors, which (optimally or otherwise) command the DMs to correct the 
wavefront in the direction of the WFSs. Minimum-variance reconstructors operate 
on open-loop measurements and hence require pseudo open-loop control.

Most AO systems with multiple guide stars (GeMS,* Argos and LINC-NIRVANA)
use classical reconstructors like those presented in Section 3. One way
to avoid the tomography problem in multi-conjugate adaptive optics (MCAO) is 
to conjugate the WFSs to each DM, a technique called layer-oriented MCAO.?!:2? 
Since layer-oriented MCAO also uses classical closed-loop reconstructors, it does 
not warrant further discussion here. Minimum-variance tomographic reconstructors 
have been sky-tested on two AO demonstrators: CANARY’? and RAVEN.?4 

Atmospheric turbulence is distributed vertically in the atmosphere between the 
ground and an altitude of about 20 km, which means that the turbulence encountered
by the science targets and the guide stars depends on their location in the
sky. In addition, LGSs emit light from much lower altitudes (typically 10-15 km for
Rayleigh guide stars and 90 km for sodium guide stars) than the science targets,
giving rise to the well known cone effect (focal anisoplanatism). The problem can 
be stated as follows: given noisy wavefront sensor measurements, what commands 
for the DM(s) produce the best correction over the science field? There are two 
conceptual paradigms to tomographic reconstruction. We either reconstruct virtual 
turbulence layers at different altitudes (Section 4.1) or we estimate the turbulence 
in different directions (Section 4.2). Both methods are equivalent, although their 
computational burdens may differ.

For extremely large telescopes, both the calculation of the reconstructor in soft 
real-time and the real-time multiplication of slopes by the reconstructor can be com- 
putationally challenging. For this reason, a number of alternative approaches that 
are computationally lighter have been developed and some of these are presented 
in Section 5. 


4.1. Reconstruction with virtual turbulence layers


The method of virtual layers has two steps: an estimation step and a fitting step.


• Estimate the wavefront at each layer based on the WFS measurements.
• Project the wavefront onto the DM(s) so as to optimally correct the science field.


The details of the reconstruction are comprehensively described in the work of 
Ellerbroek and his collaborators,?°:?° and their work is outlined (and simplified) 
here. 

Let us assume that the atmospheric turbulence consists of a finite number of 
very thin layers at different altitudes, $h_i$, with no turbulence between the layers. The
number of layers needed to model the turbulence depends on the diameter of the
telescope and the angular extent of the science field and guide stars. The wavefront
at each layer is sampled with the same density as the sub-aperture size or interactuator
spacing. Bilinear interpolation is used to evaluate the wavefront in regions
between the points where the turbulence is sampled. The wavefront aberration for
light passing through point $(u,v)$ in layer $h_i$ is $\psi_i(u,v)$. Let us write as a vector
the sampled wavefronts over all of the layers as $x = [\psi_1(u,v), \psi_2(u,v), \ldots]$. This is
illustrated in Fig. 3. We want to calculate the estimation matrix, $E_x$, which produces
the best estimate of $x$ given the measurements, $s$. The relationship between $x$
and the WFS measurements, $s$, is described by the turbulence influence matrix, $G_x$:


$$s = G_x x + n. \tag{24}$$


Fig. 3. Schematic of turbulence layers. The grid shows the location of the points where the
wavefront is estimated.


The minimum-variance estimation matrix is

$$E_x = C_{xs} C_{ss}^{-1} = (G_x^{T} C_{nn}^{-1} G_x + C_{xx}^{-1})^{-1} G_x^{T} C_{nn}^{-1}. \tag{25}$$


This should look familiar as it is the same estimator as Eqs. (13)-(15). The calculation
of $C_{xx}$ is described in Section 4.3.

The second step is to calculate the fitting matrix, which produces the DM 
actuator commands that best compensate the estimated turbulence: $a = F_x x$. Let
us define a vector of residual wavefront errors over the science field as $\phi$. Then


$$\phi = H_x x - H_a a, \tag{26}$$


where $H_x$ and $H_a$ are the influence functions that describe how the atmospheric
wavefront errors and the DM commands affect the wavefront.

The fitting matrix, $F_x$, is found by minimizing the norm of Eq. (26) with respect to $a$:

$$F_x = (H_a^{T} H_a + \alpha I)^{-1} H_a^{T} H_x. \tag{27}$$


The identity matrix is multiplied by a small constant $\alpha$ to regularize the matrix
inversion.

Finally, we obtain the reconstructor as


$$R = F_x E_x. \tag{28}$$
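
The two-step construction (estimate, then fit) can be sketched with small random matrices. The dimensions, the placeholder covariances and the value of $\alpha$ below are assumptions; a real system would build $G_x$, $H_x$ and $H_a$ from ray tracing through the layers and from the DM influence functions.

```python
import numpy as np

rng = np.random.default_rng(4)
n_s, n_x, n_phi, n_a = 10, 6, 7, 4   # slopes, layer samples, science samples, actuators

G_x = rng.standard_normal((n_s, n_x))    # turbulence influence matrix
H_x = rng.standard_normal((n_phi, n_x))  # layers -> science-field wavefront
H_a = rng.standard_normal((n_phi, n_a))  # DM commands -> science-field wavefront
C_xx = np.eye(n_x)                       # placeholder turbulence covariance
C_nn = 0.1 * np.eye(n_s)                 # placeholder noise covariance
alpha = 1e-3                             # small regularization constant

# Estimation step: minimum-variance estimate of the layer wavefronts.
C_nn_inv = np.linalg.inv(C_nn)
E_x = np.linalg.solve(G_x.T @ C_nn_inv @ G_x + np.linalg.inv(C_xx),
                      G_x.T @ C_nn_inv)

# Fitting step: regularized least-squares fit of the DM to the science field.
F_x = np.linalg.solve(H_a.T @ H_a + alpha * np.eye(n_a), H_a.T @ H_x)

R = F_x @ E_x                            # tomographic reconstructor

# Sanity check of the fit: the gradient of
# |H_x x - H_a a|^2 + alpha |a|^2 with respect to a vanishes at a = F_x x.
x = rng.standard_normal(n_x)
a = F_x @ x
gradient = H_a.T @ (H_x @ x - H_a @ a) - alpha * a
```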


4.2. Method of spatio-angular covariance matrices 


Instead of estimating the wavefront at different layers, we can estimate the wavefront 
in different directions.?” 79 

To reconstruct using spatio-angular covariance matrices, we also apply an esti- 
mation step followed by a fitting step. 


• Estimate the wavefront over each direction of interest in the science field.
• Project the wavefront onto the DM(s) so as to optimally correct the science field.


The formulas used are the same, but the definitions of the covariance matrices differ.

The estimation step consists of estimating $\phi$, the wavefront over every direction
in the science field, directly, without explicitly estimating the vertical distribution
of turbulence. The minimum-variance estimate for $\phi$ given the WFS measurements
$s$ is


$$\hat{\phi} = C_{\phi s} C_{ss}^{-1} s. \tag{29}$$


Covariance matrices involving wavefront slopes can be calculated directly.°° How- 
ever, we prefer to use covariance matrices involving wavefronts, since the covariances 
are easier to compute and the associated matrices are smaller. 

Let the wavefront aberration for light passing through point $(u,v)$ in the
direction of WFS $i$ be $\psi_i(u,v)$. We write the sampled wavefronts over all of the
different WFS directions as $x = [\psi_1(u,v), \psi_2(u,v), \ldots]$, as shown in Fig. 4. Then


$$s = G_x x \tag{30}$$
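and the estimation matrix, $E_\phi$, can be rewritten as

$$E_\phi = C_{\phi x} C_{xx}^{-1} C_{xs} C_{ss}^{-1} = C_{\phi x} C_{xx}^{-1} (G_x^{T} C_{nn}^{-1} G_x + C_{xx}^{-1})^{-1} G_x^{T} C_{nn}^{-1}. \tag{31}$$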


The fitting matrix, $F_\phi$, which converts from wavefronts in several different directions
onto DM actuator commands, is


$$F_\phi = (H_a^{T} H_a + \alpha I)^{-1} H_a^{T} H_\phi. \tag{32}$$


Fig. 4. Schematic of turbulence layers. The grid shows the location of the points where the 
wavefront is estimated. 


Finally, the reconstructor is given by 


$$R = F_\phi E_\phi. \tag{33}$$


4.3. Calculation of wavefront covariance matrices 


In this section, we present the formulas for the covariance between two points. These 
are needed to compute the covariance matrices and hence the reconstructors. These 
formulas assume von Kármán turbulence with a finite outer scale. If the outer scale
is infinite (i.e., Kolmogorov turbulence), then the covariance between any two points
is infinite and the covariance of the piston-removed wavefront needs to be calculated instead.*+

Let us first consider the covariance of turbulence between two points in the
pupil used in Eq. (18). The covariance depends on the distance between the two
points, $\delta_\rho$, the outer scale, $L_0$, and Fried's parameter, $r_0$, at wavelength $\lambda$:°?


$$\langle \phi(0), \phi^{T}(\delta_\rho) \rangle = c \left(\frac{2\pi}{\lambda}\right)^{2} (r_0 f_0)^{-5/3} (2\pi f_0 \delta_\rho)^{5/6} K_{5/6}(2\pi f_0 \delta_\rho), \tag{34}$$


where $f_0 = 1/L_0$ and $K_{5/6}$ is the modified Bessel function of the second kind of
fractional order 5/6.

The value of the constant, $c$, is


$$c = \left(\frac{24}{5}\,\Gamma(6/5)\right)^{5/6} \frac{\Gamma(11/6)}{2^{5/6}\,\pi^{8/3}}, \tag{35}$$


where the symbol $\Gamma$ represents the gamma function. While this formula appears
very complicated, it can be readily programmed and is very useful for calculating
covariance matrices.
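
As an illustration of how readily the formula can be programmed, the sketch below evaluates the von Kármán covariance using only the Python standard library. Two assumptions should be noted: the wavelength-dependent prefactor is omitted, so the result is a phase covariance in rad² at the wavelength at which $r_0$ is quoted, and $K_{5/6}$ is computed from the integral representation $K_\nu(x) = \int_0^\infty e^{-x\cosh t}\cosh(\nu t)\,dt$ with a simple trapezoidal rule (a library Bessel routine would normally be used instead). The function names and the example values of $r_0$ and $L_0$ are hypothetical.

```python
import math


def bessel_k(nu, x, n_steps=4000):
    """Modified Bessel function of the second kind, K_nu(x), for x > 0."""
    # Integrate until exp(-x cosh t) has dropped to about exp(-60).
    t_max = math.acosh(max(60.0 / x, 2.0))
    h = t_max / n_steps
    total = 0.5 * math.exp(-x)  # half weight for the t = 0 end point
    for i in range(1, n_steps + 1):
        t = i * h
        # The integrand is negligible at t_max, so its end-point weight is immaterial.
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * h


def von_karman_covariance(rho, r0, L0):
    """Phase covariance (rad^2) of two points separated by rho (same units as r0, L0)."""
    f0 = 1.0 / L0
    c = ((24.0 / 5.0 * math.gamma(6.0 / 5.0)) ** (5.0 / 6.0)
         * math.gamma(11.0 / 6.0)
         / (2.0 ** (5.0 / 6.0) * math.pi ** (8.0 / 3.0)))
    x = 2.0 * math.pi * f0 * rho
    if x == 0.0:
        # Limit of x^(5/6) K_5/6(x) as x -> 0.
        shape = math.gamma(5.0 / 6.0) / 2.0 ** (1.0 / 6.0)
    else:
        shape = x ** (5.0 / 6.0) * bessel_k(5.0 / 6.0, x)
    return c * (r0 * f0) ** (-5.0 / 3.0) * shape


r0, L0 = 0.15, 30.0                       # example values in metres
variance = von_karman_covariance(0.0, r0, L0)
cov_1m = von_karman_covariance(1.0, r0, L0)
cov_10m = von_karman_covariance(10.0, r0, L0)
```

A covariance matrix is then filled by evaluating `von_karman_covariance` for every pair of sample points, using the distance between them as `rho`.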

Next, we consider the covariance of each atmospheric layer in Eq. (25). The tur- 
bulence in each layer is completely independent, so the covariance between any two 
points in layers at different altitudes is zero. The covariance between points within 
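the same layer is given by Eq. (34), but $r_0$ is replaced with the value corresponding
to that layer only, which is


$$r_0 = \left[ 0.423 \left(\frac{2\pi}{\lambda}\right)^{2} \sec\gamma \int C_n^{2}(h)\, dh \right]^{-3/5}, \tag{36}$$


where $\gamma$ is the zenith angle, $C_n^{2}(h)$ is the turbulence strength at height $h$, and the
integral is taken over the layer.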

Finally, we calculate the covariance of the wavefront at two different locations, 
$r_1$ and $r_2$, and in two arbitrary directions, $\theta_1$ and $\theta_2$. We assume that there are
$N_l$ layers, each with fraction $\kappa(k)$ of turbulence at layer $k$ at height $h(k)$. The
covariance is


$$\langle \phi(r_1, \theta_1), \phi^{T}(r_2, \theta_2) \rangle = c\, (r_0 f_0)^{-5/3} \sum_{k=1}^{N_l} \kappa(k) \left(2\pi f_0 \delta_\rho(k)\right)^{5/6} K_{5/6}\!\left(2\pi f_0 \delta_\rho(k)\right). \tag{37}$$

The distance between the two wavefronts at layer $k$ is

$$\delta_\rho(k) = |\rho_1(k) - \rho_2(k)|, \tag{38}$$

where

$$\rho_i(k) = \left(1 - h(k)/z_i\right) r_i + h(k)\,\theta_i. \tag{39}$$


The altitude, $z_i$, is infinite for a natural guide star and a science target, and about
90 km for a sodium guide star.
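
The geometry behind the layer separation can be made concrete with a short sketch. It computes where a ray that crosses the pupil at position $r$, in direction $\theta$, from a source at altitude $z$, intersects a layer at height $h$, and then the distance between two such intersection points. The function names and the numerical values are hypothetical; angles are in radians and lengths in metres.

```python
import math


def footprint_position(r, theta, h, z=math.inf):
    """Point where a ray through pupil position r crosses the layer at height h."""
    scale = 1.0 - h / z   # cone compression; equals 1 for a source at infinity
    return (scale * r[0] + h * theta[0], scale * r[1] + h * theta[1])


def footprint_separation(r1, theta1, z1, r2, theta2, z2, h):
    """Distance between two rays at the layer at height h."""
    p1 = footprint_position(r1, theta1, h, z1)
    p2 = footprint_position(r2, theta2, h, z2)
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])


# Two natural guide stars 1e-4 rad apart, same pupil point, layer at 10 km.
sep_ngs = footprint_separation((0.0, 0.0), (0.0, 0.0), math.inf,
                               (0.0, 0.0), (1e-4, 0.0), math.inf, 10e3)

# Cone effect: sodium LGS (90 km) versus a star on the same axis, at the same
# pupil point 1 m from the centre, for a layer at 9 km.
sep_cone = footprint_separation((1.0, 0.0), (0.0, 0.0), 90e3,
                                (1.0, 0.0), (0.0, 0.0), math.inf, 9e3)
```

In this example the two stars are separated by about 1 m at the 10 km layer, and the cone effect alone displaces the LGS ray by about 0.1 m at 9 km.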


4.4. Modal tomographic reconstructors 


The tomographic reconstructors presented thus far are zonal reconstructors. For 
completeness, we also consider modal tomographic reconstructors. Some of the earli- 
est papers on tomography employed Zernike polynomials as the basis functions.°? 3° 
The equations to be solved are the same minimum-variance equations, but instead 
of finding the wavefront at some point in the pupil, we solve for the Zernike coef- 
ficients. Zernike polynomials are best suited to circular pupils with a modest order 
of correction (say 45 modes). 

There is one case where modal tomography is essential: tip-tilt tomography. 
Tip-tilt cannot be accurately measured from LGSs. Instead, tip-tilt measurements 
are made using one or more NGSs. Let us denote the tip-tilt measurements from
the tip-tilt sensors as $x$. Then the tip-tilt estimate in the desired direction, $a$, is
given by


$$\hat{a} = C_{ax} C_{xx}^{-1} x. \tag{40}$$


The cross-covariances for Zernike polynomials can be evaluated using the very long 
Eq. (32) in Whiteley et al.3° Alternatively, it can be calculated using the filter 
function methodology developed by Sasiela.?” 


5. Alternative Implementations of Wavefront Reconstruction 


The only approach to wavefront reconstruction described thus far consists of a 
reconstructor that is multiplied by a vector of measurements. There are two com- 
putational difficulties with this approach for AO systems with a large number of 
dimensions (e.g., an MCAO system on an extremely large telescope). First, the 
multiplication of a large matrix with a long vector is computationally demanding
($O(n^2)$, where $n$ is the number of actuators). Second, the computational burden
associated with the inversion of matrices with tens of thousands of rows and
columns to compute the reconstructor can be even more prohibitive ($O(n^3)$). While
the reconstructor matrices do not need to be computed in real time, we require 
fresh matrices when the atmospheric structure, elevation or telescope pupil change 
appreciably. In order to avoid both of these issues, a number of different approaches 
have been proposed. 

The fast Fourier transform (FFT) reconstructor!® replaces the matrix inver- 
sion by two FFTs and a mode-by-mode inversion of the Fourier modes. Since FFT 
operations are O(n log(n)), the computational load is significantly lower for large
values of n. The FFT reconstructor is successfully implemented on the Gemini 
Planet Imager.” An implementation of the tomographic reconstructor using FFTs 
has also been proposed.*® 

An algorithm called the Cumulative Reconstructor (CuRe) reconstructs the 
wavefront in O(n). The principle of CuRe is based on the model of the SH WFS 
data as averaged gradients over the sub-apertures. With the integration of the slopes 
in one dimension, chains of 1D wavefront estimates are obtained. The independent
chains are connected via the general trend (average slopes) in the orthogonal direc- 
tion. Finally, the chains in both directions are joined to the reconstruction via 
interpolation. For large numbers of subapertures, domain decomposition is required 
to reduce the noise propagation of the reconstructor (CuReD).3? The CuRe algo- 
rithm can also be used in conjunction with pyramid sensors by first applying a 
preprocessing step.7? 

A number of iterative techniques have been proposed to reconstruct the wavefront.
Consider, for example, the system of equations corresponding to the single-guide-star
reconstructor of Eq. (15):


$$a = (G_a^{T} C_{nn}^{-1} G_a + C_{aa}^{-1})^{-1} G_a^{T} C_{nn}^{-1} s. \tag{41}$$

We can write this as*!


$$(G_a^{T} C_{nn}^{-1} G_a + C_{aa}^{-1})\, a = G_a^{T} C_{nn}^{-1} s, \tag{42}$$


where $G_a^{T} C_{nn}^{-1} G_a + C_{aa}^{-1}$ is the left-hand-side matrix and $G_a^{T} C_{nn}^{-1} s$ is the
right-hand-side vector. The system of equations can be solved iteratively without inverting any
matrices: $G_a$ is a sparse matrix, $C_{nn}$ is a diagonal matrix and $C_{aa}^{-1}$ can be replaced
by a sparse approximation!! or a fractal iterative operator.*!

Another way to solve the wavefront reconstruction problem iteratively, equivalent
to Eq. (41), is to solve first


$$(G_a C_{aa} G_a^{T} + C_{nn})\, y = s \tag{43}$$

for $y$ and then perform the multiplications

$$a = C_{aa} G_a^{T} y. \tag{44}$$


Using the conjugate gradient algorithm to solve the linear system of equations, 
a very good approximation to the solution can be attained with a small number of 
iterations and O(n) operations. A number of papers have been devoted to finding 
a suitable preconditioner to reduce the number of iterations.4? 4% 
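
A matrix-free conjugate gradient solve of the system $(G_a^T C_{nn}^{-1} G_a + C_{aa}^{-1})\,a = G_a^T C_{nn}^{-1} s$ can be sketched as below. This is an unpreconditioned textbook conjugate gradient, not the algorithm of any cited paper. The toy problem is an assumption: a one-dimensional finite-difference $G_a$, uniform noise, and a Laplacian-plus-identity matrix standing in for a sparse approximation of $C_{aa}^{-1}$. Dense arrays are used for brevity; a real implementation would use sparse operators so that each iteration costs $O(n)$.

```python
import numpy as np


def conjugate_gradient(apply_A, b, tol=1e-10, max_iter=500):
    """Solve A x = b for symmetric positive-definite A given only x -> A x."""
    x = np.zeros_like(b)
    r = b - apply_A(x)            # residual
    p = r.copy()                  # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = apply_A(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


n = 30
G_a = np.diff(np.eye(n), axis=0)                 # toy interaction matrix
C_nn_inv = np.eye(n - 1) / 0.25                  # uniform noise variance of 0.25
laplacian = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C_aa_inv = 0.1 * laplacian + 0.01 * np.eye(n)    # hypothetical sparse stand-in


def apply_lhs(a):
    """Left-hand-side operator applied without forming or inverting a matrix."""
    return G_a.T @ (C_nn_inv @ (G_a @ a)) + C_aa_inv @ a


rng = np.random.default_rng(5)
s = rng.standard_normal(n - 1)                   # simulated slopes
rhs = G_a.T @ (C_nn_inv @ s)

a_cg = conjugate_gradient(apply_lhs, rhs)
a_direct = np.linalg.solve(G_a.T @ C_nn_inv @ G_a + C_aa_inv, rhs)
```

In closed loop the previous solution is a natural starting guess, which is one reason few iterations are needed per frame; the sketch starts from zero for simplicity.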

Alternatively, the projection of reconstructed wavefronts onto atmospheric lay- 
ers and fitting these layers onto DMs can be performed using the Kaczmarz algo- 
rithm.“4 This uses a similar number of iterations to the conjugate gradient, but 
each iteration is computationally cheaper. In addition, the matrices do not need to 
be sparse. The Kaczmarz algorithm appears to be more sensitive to measurement 
noise. 
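
For readers unfamiliar with it, the basic Kaczmarz iteration is shown below on a generic consistent linear system $Ax = b$: each step projects the current estimate onto the hyperplane defined by one row of the system. This is only the classical cyclic algorithm on a small dense example; it is not the tomographic implementation referred to above, and the test matrix is an assumption chosen to be well conditioned.

```python
import numpy as np


def kaczmarz(A, b, n_sweeps=100):
    """Cyclic Kaczmarz iteration for a consistent system A x = b."""
    x = np.zeros(A.shape[1])
    for _ in range(n_sweeps):
        for a_i, b_i in zip(A, b):
            # Project x onto the hyperplane a_i . x = b_i.
            x += (b_i - a_i @ x) / (a_i @ a_i) * a_i
    return x


rng = np.random.default_rng(6)
# Hypothetical 8 x 4 system whose rows are close to the coordinate axes.
A = np.vstack([np.eye(4), np.eye(4)]) + 0.1 * rng.standard_normal((8, 4))
x_true = rng.standard_normal(4)
b = A @ x_true                      # noiseless right-hand side

x_est = kaczmarz(A, b)
```

Each update touches a single row, which is why an iteration is cheap and why no sparsity is required. With noisy data the system is generally inconsistent and the iterates do not settle on a single point, in line with the noise sensitivity noted above.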


6. Wavefront Control 


In this section, we describe the three types of control loops for astronomical AO sys- 
tems: closed-loop, open-loop, and pseudo open-loop. We then show how to optimize 
the wavefront control to produce the lowest wavefront error. 


6.1. Control loop 


The vast majority of astronomical AO systems work in closed-loop. This means that 
the WFSs measure the corrected wavefront, that is, the residual wavefront error 
after the DM(s) have partially corrected the turbulent wavefront. The simplest 
and most commonly used closed-loop controller is the integral controller. In the 
integral controller, the actuator commands at time n, y[n], are updated based on 
the previous commands, $y[n-1]$, and the current input, $u[n]$:


$$y[n] = y[n-1] + k\,u[n]. \tag{45}$$


The loop gain $k$ is set to minimize the residual wavefront error (typically around
0.5). In an AO system, the input is the reconstructed wavefront, $u = Rs$, and
the output is the actuator commands, $y = a$. More sophisticated controllers are
discussed in Section 6.2.
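
The behaviour of the integral controller can be seen in a scalar toy loop. The model below is deliberately idealized and is an assumption of the example: a single mode, a perfect WFS and reconstructor, no noise, and a one-frame delay between measurement and correction.

```python
def run_integrator(disturbance, k):
    """Simulate y[n] = y[n-1] + k u[n] on a scalar disturbance sequence."""
    y = 0.0                     # current DM command
    residuals = []
    for d in disturbance:
        u = d - y               # closed loop: the WFS sees the residual
        residuals.append(u)
        y = y + k * u           # integrator update
    return residuals


# Response to a constant aberration of 1 with a loop gain of 0.5:
# the residual halves on every frame.
residuals = run_integrator([1.0] * 20, k=0.5)
```

With gain `k` the residual of a static aberration decays as $(1-k)^n$, which illustrates why a partially corrected aberration keeps shrinking on successive iterations.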

In open-loop control, the uncorrected wavefront is measured by the WFSs, and 
the command depends only on the reconstructed wavefront: 


$$y[n] = u[n]. \tag{46}$$


Closed-loop control has a very important advantage over open-loop control: the 
details of the AO system, such as the response of the DM and the response of the 
WFSs, do not need to be known perfectly. An aberration that is only partially
corrected the first time will continue to be reduced by successive iterations of the 
control loop. This means that DM hysteresis or a nonlinear response of the DM 
actuators to an applied voltage are not an issue. In addition, the WFSs only have 
to be linear and sensitive over a small operating range, which leads to WFSs with 
small numbers of pixels, such as quad-cell SH WFSs and pyramid WFSs. 

The disadvantages of closed-loop control are that the system can be driven unstable
(which could potentially destroy the DMs along with the science exposures) and
that a higher frame rate is needed in order to correct the same temporal frequency.
A rule of thumb is that the closed-loop control bandwidth (−3 dB bandwidth) is
about 5% of the frame rate, while an open-loop system has a control bandwidth 
of about 30% of the frame rate, as seen in Fig. 5. In order to take advantage 
of minimum-variance tomographic reconstructors, the turbulence needs to be esti- 
mated in open-loop, using open-loop turbulence statistics.4° However, the system 
must run in closed-loop. A solution to this problem is to use so-called pseudo-open-loop
control.*° Here, the closed-loop measurements, $s$, are converted into open-loop
measurements, $s'$, by adding the wavefront induced by the DM commands, $a$:

$$s' = s + G_a a. \tag{47}$$


The reconstructor is applied to $s'$, and then the contribution from the DM commands
is subtracted to obtain the input $u$:


$$u = R s' - a = R(s + G_a a) - a = R s + (R G_a - I)\, a. \tag{48}$$

Two matrix multiplications are needed to compute the input to the controller.


Alternatively, we can convert Eq. (48) to a single matrix multiplication (using a
larger matrix):


$$u = \begin{bmatrix} R & R G_a - I \end{bmatrix} \begin{bmatrix} s \\ a \end{bmatrix}. \tag{49}$$


The control law for pseudo open-loop control is the same as for closed-loop, e.g., an 
integral controller described in Eq. (45). 
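
The two ways of computing the pseudo open-loop controller input can be compared directly. In the sketch below the interaction matrix, the reconstructor and the vectors are random placeholders, and the variable names are chosen for the example; the point is only that the two-step computation and the single multiplication by the augmented matrix give the same input $u$.

```python
import numpy as np

rng = np.random.default_rng(7)
n_s, n_a = 10, 4                          # hypothetical numbers of slopes and actuators

G_a = rng.standard_normal((n_s, n_a))     # interaction matrix
# Any reconstructor will do here; a regularized least-squares one is used.
R = np.linalg.solve(G_a.T @ G_a + 0.1 * np.eye(n_a), G_a.T)

s = rng.standard_normal(n_s)              # closed-loop (residual) slopes
a = rng.standard_normal(n_a)              # current DM commands

# Two-step version: rebuild open-loop slopes, reconstruct, subtract commands.
s_open = s + G_a @ a
u_two_step = R @ s_open - a

# Single multiplication with the augmented matrix [R, R G_a - I].
M = np.hstack([R, R @ G_a - np.eye(n_a)])
u_single = M @ np.concatenate([s, a])
```

The augmented matrix has `n_a` extra columns, so the single multiplication trades a slightly larger matrix for one fewer pass over the data.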


6.2. Controller optimization 


Each DM usually has its own controller. A typical AO system will have a tip-tilt 
loop and a DM loop. 
We define a linear controller as one where the new output, $y[n]$, depends linearly
on the previous outputs, $y[n-1], y[n-2], \ldots$, and the history of inputs,
$u[n], u[n-1], u[n-2], \ldots$. In the z-domain, this is written as

$$\frac{y(z)}{u(z)} = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + \cdots}{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots}. \tag{50}$$


Here, the values of $a_i$ and $b_i$ are the coefficients of the control law. The integral
controller in Eq. (45) has $a_0 = k$ and $b_1 = -1$. In a DM loop, the value of $b_1$ is set
to something like −0.99, in order to “leak” mirror modes that are not well measured
by the WFS, such as waffle.
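
In the time domain, the z-domain control law corresponds to the difference equation $y[n] = \sum_i a_i\,u[n-i] - \sum_j b_j\,y[n-j]$. The sketch below implements this recursion for scalar signals and uses it to compare a pure integrator with a leaky integrator. The function name and the test signals are assumptions for the example.

```python
def linear_controller(u, a_coeffs, b_coeffs):
    """Apply y[n] = sum_i a_i u[n-i] - sum_j b_j y[n-j] to the sequence u.

    a_coeffs = [a0, a1, ...] and b_coeffs = [b1, b2, ...].
    """
    y = []
    for n in range(len(u)):
        acc = sum(a_coeffs[i] * u[n - i]
                  for i in range(len(a_coeffs)) if n - i >= 0)
        acc -= sum(b_coeffs[j] * y[n - 1 - j]
                   for j in range(len(b_coeffs)) if n - 1 - j >= 0)
        y.append(acc)
    return y


# Pure integrator (a0 = k = 0.5, b1 = -1): a constant input accumulates.
integrated = linear_controller([1.0, 1.0, 1.0, 1.0], [0.5], [-1.0])

# Leaky integrator (b1 = -0.99): after a single input the command decays
# by 1% per frame instead of being held forever.
leaky = linear_controller([1.0, 0.0, 0.0], [0.5], [-0.99])
```

The leak means that any command component that the WFS cannot see, and therefore never corrects, slowly relaxes to zero rather than persisting on the mirror.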

The controller coefficients should be set to minimize the sum (in quadrature) 
of the bandwidth error (also known as the servo-lag error) and the measurement 
noise error. This is explained in a seminal paper by Dessenne et al.*° with a prac- 
tical implementation at a telescope described by van Dam et al.® For high-order 
controllers (i.e., linear controllers with many coefficients), the coefficients can be 
tailored to the temporal dynamics of the system (such as the compute delay and 
the temporal response of the DMs) and the disturbance (for example, wind shake 
and vibrations).*”

Kalman filters are used to derive optimal control laws in a range of control problems
and are also well suited to the AO control problem.*® They have been successfully
implemented on telescopes,*® particularly to reduce the effects of vibrations.!”
Deriving the Kalman filter involves solving a matrix equation called the
algebraic Riccati equation, the dimensions of which grow very fast with the size of
the system. As a result, this approach is challenging for next-generation AO
systems. The interested reader is referred to a large and growing number
of papers on Kalman filters for AO.


All data from astronomical observations must be properly calibrated to ensure
reliable scientific interpretations. We describe basic procedures and recent devel- 
opments of how astronomical data obtained in the visible and near-infrared (NIR) 
waveband are calibrated, focusing on two key procedures: flat-fielding and wave- 
length calibration. Although both procedures have been applied for a long time 
due to their fundamental importance, notable progress has been made recently, 
especially for wavelength calibration. Such developments are often dependent on 
cutting-edge technology and/or devices developed in other fields. We pay special 
attention to practical considerations in calibration procedures and emphasize the 
importance of rigorous planning and execution of calibration steps during observations.
It is of great importance to obtain calibration data under conditions as
close as possible to those of real source observations.


1. Introduction 


As instruments become more sophisticated and complex in order to detect faint 
signals and conduct precise measurements, the calibration of collected signals from 
astronomical sources with such instruments has become more crucial and chal- 
lenging. It is beneficial to follow well-thought-out calibration procedures in accor- 
dance with real source observations since retrofit is often unsatisfactory and costly. 
Most competitive instruments are equipped with a specialized calibration system to 
ensure efficient and reliable calibration. In the visible and near-infrared (VIS-NIR) 
regime where large format 2D detector arrays are normally used, achieving reliable 
flat-fielding is essential, though challenging, especially for wide field observations. 
Recent developments in the field of high-precision spectroscopy for the detection of 
Doppler shift motions from extrasolar planets now require wavelength calibration
to a precision better than 1 m/s, with a near-future goal of 1 cm/s.
These demanding and ever-changing calibration requirements usually rely on the 
use of cutting-edge calibration sources with properly associated modelling efforts. 

It is always important to obtain calibration data under conditions as close as 
possible to those of real observations of science targets, just after or before the 
observations, if not simultaneously. This is because even subtle differences in envi- 
ronmental conditions and instrumental performance can easily result in ineffective 
calibration. Rigorous tracking of instrumental drifts caused by various factors (e.g., 
flexure) and active compensations for them are very helpful. An efficient calibration 
system should be quick to acquire high signal-to-noise (S/N) ratio calibration signals 
and easy to operate with dependable stability and repeatability. Such calibration 
systems can also be used for laboratory tests of instruments. 

The two primary types of calibration used for conducting astronomical obser- 
vations in the VIS-NIR waveband are flat-fielding and wavelength calibration. A 
typical calibration system for VIS-NIR waveband observations mainly consists of 
an assembly of light sources, wavelength references and a pupil-imaging optical 
system projecting calibration light into instruments. Light sources are a combina- 
tion of continuum sources for flat-fielding and reference line sources for wavelength 
calibration. The former are often used with a diffuser (e.g., integrating sphere) 
to remove angular and spatial dependencies in the calibration light for uniform 
illumination. The coupling between such diffuser and line sources, however, needs 
to be examined with caution by taking into account the reduction of the light 
intensity by the diffuser. The pupil-imaging optical system mimics the telescope 
beam with the same f-ratio to produce the same exit pupil by adopting the output 
port of the diffuser (or light-emitting side of the light bulb in the case of direct 
illumination) as the aperture stop, similar to the illumination of the telescope by 
astronomical sources. Some parts of the calibration system need to be deployable 
to the telescope beam, while others (e.g., reference absorption gas cells) need to 
stay in the telescope optical path with the instruments during the observation. 
The most common calibration requirements are those concerning beam uniformity 
and achromatic intensity variation for flat-fielding; width, density and intensity of 
reference lines for wavelength calibration; as well as pupil center alignment for the 
calibration beam simulating real telescope illumination. For reliable and efficient 
calibration, S/N ratios greater than 100 per pixel and 10 for reference line intensity 
are usually required for flat-fielding and wavelength calibration, respectively. As an 
example, the initial design study of the science calibration system for the adaptive 
optics system and NIR instruments of the Thirty Meter Telescope is available from 
Ref. 1. 


2. Flat-Fielding 


Obtaining a relative responsivity map, or a flat-field, of a detector is a fundamental 
calibration procedure for all observations using 2D detector arrays (e.g.,
charge-coupled devices and infrared focal plane arrays). Flat-fields are often obtained by
attempting to uniformly illuminate the telescope entrance pupil, as astronomical 
sources do, with an artificial light source. The telescope entrance pupil is usually, 
but not always, the telescope primary mirror, and it is always challenging to produce 
uniform illumination of the entrance pupil, especially for large telescopes and/or 
wide-field observations. This makes a pupil-simulating optical system an alternative 
solution for creating more reliable flat-fields. Other complications involved in
flat-fielding include chromatic discrepancies between flat-fields and source spectra,
which are difficult to compensate for, as well as the effects of scattered and stray
light.

Flat-fielding, in principle, removes pixel-to-pixel sensitivity variations caused by 
intrinsic characteristics (e.g., quantum efficiency) of the detector and also by non- 
uniform illumination from the instrument and telescope configuration. A desirable 
flat-field should closely reproduce the incident light of science observations in the 
spatial, angular and spectral distributions across the field-of-view of the instrument. 
Such flat-fielding also serves as a diagnostic of instrument performance, revealing,
for example, optical system anomalies and irregularities in detector pixel sizes.
Applying an unsuitable flat-field, in contrast, leads to systematic calibration errors. 
Flat-fielding is of particular importance when precise photometric calibration is 
required. 


2.1. Different types of flat-fields 


There are several ways adopted in astronomical observations for obtaining flat-fields. 
Individual flat-field images should have pixel data numbers significantly greater than 
the photon noise level but still lower than the detector nonlinearity level. Median 
filtering of several individual flat-fields is generally used to increase the S/N ratio 
of the final flat-field, and also to filter out unrepeatable events like cosmic rays and 
detector readout anomalies. Below is a summary of how flat-fields are obtained in 
VIS-NIR observations. 
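The median combination described above can be sketched as follows. This is only a minimal illustration using the Python standard library: the function names `master_flat` and `apply_flat` are hypothetical, the frames are assumed to be bias- and dark-subtracted already, and each frame is scaled to unit mean before the median so that lamp-brightness changes between exposures do not bias the result.

```python
from statistics import median


def master_flat(frames):
    """Median-combine equally sized 2D flat frames into a unit-mean master flat."""
    # Scale every frame to unit mean so brightness drifts do not bias the median.
    scaled = []
    for frame in frames:
        pixels = [v for row in frame for v in row]
        mean = sum(pixels) / len(pixels)
        scaled.append([[v / mean for v in row] for row in frame])

    n_rows, n_cols = len(frames[0]), len(frames[0][0])
    # The per-pixel median rejects unrepeatable outliers such as a cosmic-ray hit.
    combined = [[median(f[r][c] for f in scaled) for c in range(n_cols)]
                for r in range(n_rows)]

    # Renormalize so that dividing by the flat preserves the mean signal level.
    flat_mean = sum(v for row in combined for v in row) / (n_rows * n_cols)
    return [[v / flat_mean for v in row] for row in combined]


def apply_flat(science, flat):
    """Divide a (bias- and dark-subtracted) science frame by the master flat."""
    return [[s / f for s, f in zip(s_row, f_row)]
            for s_row, f_row in zip(science, flat)]
```

With three or more frames, a cosmic-ray hit in a single exposure is rejected by the median rather than averaged into the final flat-field.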


2.1.1. Dome flat-fields 


The most convenient way of obtaining flat-fields is taking exposures of a screen 
near the telescope projected with flat-field light. Flat-fields obtained in this way are 
called dome flat-fields, and they are taken offline during the daytime. This gives enough
time to obtain high S/N ratio flat-fielding data for different types of instrumental 
configuration (e.g., different filters and grating modes) to be used for science obser- 
vations. The dome flat-fields are prone to scattered light from the dome and the 
telescope, as well as brightness gradients that originate from the intrinsic difficulty 
of creating uniform illumination over the entrance pupil. Reference 2 conducted 
extensive analyses of dome flat-fields in comparison with other flat-fields obtained 
with different techniques.


2.1.2. Twilight flat-fields


The twilight available during the short period of time after sunset is another radi- 
ation source that can be effectively used for obtaining flat-fields, providing twilight 
flat-fields (Ref. 3). One practical problem for twilight flat-fields is the short duration of
twilight, which makes it difficult to obtain high S/N ratio calibration data. Twi- 
light flat-fields are often contaminated by brightness gradients, so the modeling and 
removal of such gradients improves the performance. 


2.1.3. Night-sky flat-fields 


By taking deep-exposure images of the night-sky background, one can obtain night- 
sky flat-fields, which offers another way of carrying out flat-fielding. Similar to 
twilight flat-fields, these flat-fields also tend to have brightness gradients, and it 
is difficult to obtain night-sky flat-fields with a high S/N ratio. One particular 
disadvantage of this method is that, unlike the others, the flat-fields need to be
obtained during the night with science observations, which can be costly. 


2.1.4. Calibrated flat-fields 


Another way of obtaining flat-fields is to use a specialized calibration optical sys- 
tem that can mimic the telescope beam. When a light source is coupled with a 
light diffuser (e.g., an integrating sphere), highly uniform radiation can be pro- 
duced to illuminate a rotating pupil mask followed by an imaging system simulat- 
ing the telescope optics. Such a calibrated beam can serve as a flat-field when its 
pupil-illuminating uniform radiation is projected to an instrument. Proper choices 
of the light sources (see below), color-balance filters if necessary, and the details of 
the diffuser (e.g., output port size and internal reflective coatings for the case of 
the integrating sphere) need to be made to provide suitable flat-fields over the spec- 
tral range of interest and field-of-view of the instruments. Compared to the other 
methods mentioned above, this method requires the development of a dedicated 
calibration system that can work with the telescope and instrument configuration. 


2.2. Light sources 


The efficiency of a flat-field is inevitably dependent on the performance of its light 
source. An ideal flat-field light source should be able to produce stable feature-free 
radiation over a broad spectral range without any significant achromatic intensity 
variation. It should also be easy to handle and use with a light diffuser. The most 
commonly used flat-field light source for the VIS-NIR waveband is a quartz tungsten 
halogen (QTH) lamp, usually in the form of a quartz surface which contains rare gas 
and trace amounts of halogen (e.g., iodine) with a doped tungsten filament. Com- 
mercially available QTH lamps typically act as a stabilized black body of 3000 K 
temperature, stable to better than 1% over the course of a night, without apparent
spectral features. Lower temperatures are possible when reduced currents are
applied. The QTH lamps are bright and easy to operate, especially with integrating
spheres. Grey-body light sources at lower temperatures (e.g., 1000 K) can be used 
for flat-fields in the infrared, while deuterium lamps are used for flat-fields in the 
ultraviolet (UV). Light emitting diodes (LEDs) are another type of light source 
that can be considered. In general, light sources with a small dynamic range across 
a broad spectral range are more desirable, and color-balance filters can be used if 
necessary to compensate for achromatic intensity variation. 
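To give a feel for the dynamic range of a QTH lamp, the sketch below evaluates the Planck function for an ideal black body and takes the ratio of the brightest to the faintest sampled wavelength. The black-body idealization, the 400-2400 nm band and the 10 nm sampling are assumptions made for illustration only; a real lamp spectrum is further shaped by the envelope, the diffuser and any filters.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck_lambda(wavelength_nm, temperature_k):
    """Black-body spectral radiance B_lambda in W m^-3 sr^-1."""
    lam = wavelength_nm * 1e-9
    x = H * C / (lam * K_B * temperature_k)
    return 2.0 * H * C ** 2 / lam ** 5 / math.expm1(x)


def dynamic_range(temperature_k, band_nm=(400, 2400), step_nm=10):
    """Ratio of maximum to minimum radiance over the band (assumed sampling)."""
    values = [planck_lambda(w, temperature_k)
              for w in range(band_nm[0], band_nm[1] + 1, step_nm)]
    return max(values) / min(values)


# Wien displacement law: the 3000 K spectrum peaks in the NIR, not in the visible.
wien_peak_nm = 2.897771955e6 / 3000.0
```

Lowering the temperature pushes the peak further into the infrared and increases the dynamic range over the same band, which is one reason color-balance filters may be needed for the blue end.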

A recent noteworthy development in light sources for flat-fielding is the
“supercontinuum source”, which is based on a pulsed laser broadened by photonic
crystal fibers and can cover the broad spectral range of 400-2400 nm. Its compact
configuration and broad spectral coverage make it an attractive light source
candidate for flat-fielding. One disadvantage of the supercontinuum source is that its
power output depends rather strongly on wavelength, with some undesirable spectral
features. Compared to the QTH lamps, its power output is low and it is still a very
expensive option. However, the technology for the supercontinuum source is rapidly
developing, with potential future applications for broadband flat-field calibration.


3. Wavelength Calibration 


The wavelength calibration for spectroscopic observations basically consists of obtaining
wavelength solutions, which map the detector pixels to wavelengths, using refer- 
ence lines of known wavelengths. This is a fundamental step in all spectroscopic 
observations. The simple, traditional way of conducting wavelength calibration for 
low- and medium-resolution spectroscopy is to conduct polynomial fits of the wave- 
lengths of reference lines as a function of their pixel locations. In order to achieve 
reliable wavelength calibration, the exact wavelengths of the reference lines must be 
known with sufficient accuracy, and a reference line source should provide densely- 
populated stable and narrow lines over a broad spectral range with a small dynamic 
range in their brightness. The reference lines should also be bright with a short lead 
time because most of the wavelength calibration data are obtained during the night 
just before or after science observations. Their pixel positions on the detector need 
to be accurately measured by avoiding significant blending and line broadening. 
Note that recent high-resolution spectroscopy requires wavelength calibration to 
the level of subpixel precision. 
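The traditional polynomial fit described above can be sketched as follows. This is a minimal standard-library illustration, not a production routine: the function names are hypothetical, the line list is supplied by the caller, and the pixel coordinates are rescaled to the interval [-1, 1] before solving the normal equations, a choice made here only to keep the small linear system well conditioned.

```python
def polyfit(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the Vandermonde design matrix A.
    m = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (m[i][n] - tail) / m[i][i]
    return coeffs


def fit_wavelength_solution(pixels, wavelengths, degree=2):
    """Return a function mapping pixel position to wavelength."""
    p0 = sum(pixels) / len(pixels)
    scale = max(abs(p - p0) for p in pixels)
    coeffs = polyfit([(p - p0) / scale for p in pixels], wavelengths, degree)

    def solution(pixel):
        t = (pixel - p0) / scale
        return sum(c * t ** k for k, c in enumerate(coeffs))

    return solution
```

In practice, the residuals between the fitted solution and the known line wavelengths are inspected to reject blended or misidentified lines before the solution is adopted.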

Another consideration for better wavelength calibration is the stability of the
spectrograph: a spectrograph located inside a controlled environment (with stable
temperature, pressure, humidity and light illumination) gives superior performance.
Such spectrographs are usually fiber-fed and located on the floor, separated from
telescope motions, which minimizes the influence of flexure variation. Active
tracking and compensation for spectral drifts, both long- and short-term, induced by
various factors can also contribute to obtaining more reliable wavelength
calibration. As the precise measurement of Doppler shifts induced by Earth-like
planets at the level of centimeters per second becomes a realistic goal in the
foreseeable future, more advanced wavelength calibration techniques based on
forefront reference line sources (e.g., laser frequency combs) and refined spectral
modelling are being actively pursued and developed. Below we provide a list of
reference line sources that have been adopted and developed for wavelength
calibration in the VIS-NIR waveband.
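To see why such velocity precision implies calibration far below the pixel scale, the following sketch converts a velocity into a wavelength shift and into a fraction of a detector pixel. The resolving power and the number of pixels per resolution element are assumed inputs, and the non-relativistic Doppler formula is used.

```python
C_M_PER_S = 299_792_458.0  # speed of light


def doppler_shift_nm(wavelength_nm, velocity_m_s):
    """Non-relativistic Doppler shift: delta_lambda = lambda * v / c."""
    return wavelength_nm * velocity_m_s / C_M_PER_S


def shift_in_pixels(velocity_m_s, resolving_power, pixels_per_resolution_element):
    """Velocity shift expressed as a fraction of one detector pixel.

    One resolution element spans c / R in velocity and is assumed to be
    sampled by the given number of pixels.
    """
    velocity_per_pixel = C_M_PER_S / resolving_power / pixels_per_resolution_element
    return velocity_m_s / velocity_per_pixel
```

For an assumed spectrograph with R = 100,000 sampled at 3 pixels per resolution element, one pixel corresponds to about 1 km/s, so a 1 m/s shift is of order one thousandth of a pixel, and a few cm/s is smaller still.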


3.1. Sky telluric lines 


Molecules in Earth’s atmosphere are sources of numerous emission and absorption
lines that are simultaneously imprinted in the spectra of target astronomical sources.
There are two types of sky telluric lines. First, chemical reactions between O3
and H in the atmosphere produce hydroxyl radicals that emit ro-vibrational
OH airglow emission lines. These OH lines are dominant in the H band, while they
become less prevalent as wavelength increases beyond 2 μm. Secondly, the molecules
(e.g., H2O, CO2, CH4, O2) in the atmosphere have several prominent absorption
bands with densely populated deep, narrow absorption lines, mostly in the NIR
waveband. While the former are non-thermal, the latter are thermal. Figure 1 shows
the distributions of the OH emission lines and absorption bands of the four major
molecules H2O, CO2, CH4 and O2 in the atmosphere in the VIS-NIR waveband. The
wavelengths and intensities of the OH lines are from Ref. 4, while the information for
the other molecules is from the online service of the high-resolution transmission
molecular absorption (HITRAN) database (http://hitran.org/).

These sky telluric lines, which are usually a nuisance that contaminates source 
signal, provide a choice of natural reference lines for wavelength calibration. The 
choice is particularly useful for spectroscopy in the NIR waveband where traditional 
emission line lamps used in the visible waveband underperform (see below). One 
clear advantage of using these sky telluric lines as a wavelength calibrator is that 
they are observed simultaneously with astronomical source signals with uniform 
illumination of the telescope, minimizing the calibration error caused by different 
conditions between the source and calibration spectra. As a result, the telluric OH 
emission lines have been effectively used as a practical choice of reference lines 
for low-resolution NIR spectroscopic observations. For high-resolution spectroscopy, 
however, the OH lines are neither bright nor dense enough to be used as reference 
lines. 

As shown in Fig. 1, the prominent absorption bands of the molecules in Earth’s
atmosphere provide another natural in situ choice of wavelength calibration for a
number of selective narrow spectral windows. Because the absorption bands are
densely populated with sharp, narrow lines from known molecular transitions, their
spectra, which are recorded together with source spectra, can serve as an effective
reference. For instance, Ref. 5 used an absorption band of CO2 molecules in the H
band to achieve velocity measurements more precise than 10 m/s. An O2 absorption
band in the visible waveband (Ref. 6) and a N2O absorption band around 4.1 μm
(Ref. 7) yielded similar precision.

Fig. 1. Distributions of the sky telluric lines of OH, H2O, CO2, CH4, and O2 (from top to
bottom) in the VIS-NIR waveband. The line intensities are normalized by the maximum intensity
of each molecule to show the lines of low intensities more effectively. The data plotted are from
Ref. 4 (OH lines) and http://hitran.org/.

However, the sky telluric lines are generally inadequate to serve as a reference
wavelength standard for high-resolution spectroscopy because their intensities
and wavelengths are dependent on local conditions such as temperature, pressure,
wind speed and turbulence. Wavelength variations of these lines corresponding to
velocity shifts of more than 10 m/s, caused by changes in the environmental
conditions, are known to exist. Therefore, the use of these sky telluric lines as
reference lines often relies on sophisticated modelling of Earth’s atmospheric profile
and radiative line transfer for accurate wavelength calibration (Refs. 8, 9). Another
shortcoming of the telluric molecular absorption bands as wavelength references is
their limited spectral windows.

Fig. 2. Distribution of wavelength reference lines from pen-ray lamps and Th-Ar HCLs.
(a) Normalized intensity distribution of Th-Ar lines in the visible waveband (data from Ref. 14).
(b) The wavelength distribution of the emission lines from typical pen-ray lamps filled with He,
Ne, Ar, and Kr in the VIS-NIR waveband. (c) Normalized distribution of Th-Ar lines in the NIR
waveband (data from Ref. 15).


3.2. Emission line lamps 


As explained above, despite the advantage of providing reference lines simultaneously
with source spectra, sky telluric lines have their limits as a wavelength calibration
standard. Below we describe artificial reference line sources that have been
used in astronomical spectroscopy as a wavelength calibration reference, ranging
from conventional discharge lamps of rare gas to cutting-edge laser frequency combs.


3.2.1. Pen-ray lamps 


A pen-ray lamp is a pencil-shaped, low-pressure cold cathode discharge lamp filled 
with rare gas (e.g., He, Ne, Ar, Kr, Xe) inside a double-bore quartz tube that 
generates strong, narrow lines from the excitation of the gas. The lines from the 
pen-ray lamps are usually bright and their characteristics are very well known due 
to the in-depth understanding of their excitation and emission process. Pen-ray 
lamps that use a mixture of H, D, or Hg are also available, and they are often able 
to extend the spectral range covered by the lamps. Multiple pen-ray lamps filled 
with different types of gases inside a chamber can be easily assembled with each 
lamp powered by its own power supply. This provides a convenient option to use 
several lamps together for wavelength calibration. It is still important to
uniformly illuminate the slit of a spectrograph with the line emission from pen-ray
lamps, typically through a diffuser, although the intensity of light emerging from
such a diffuser is typically only a small fraction
of the input intensity. The performance of a pen-ray lamp is also dependent on
the lamp conditions (e.g., the ionization state), so keeping its condition stable is 
important. 

Because of the predominant use of hollow cathode lamps as a light source for 
wavelength calibration in the visible waveband (see below), the pen-ray lamps have 
been used more frequently in the NIR waveband. The combination of lines from 
Ne, Ar and Kr filled pen-ray lamps provides a chain of NIR reference lines with 
comparable intensities suitable for low-resolution spectroscopy. The addition of Xe 
reference lines, which are usually fainter than the lines from other gases, tends to 
give a marginal improvement to the spectral range covered by the other gases. Lists 
of lines from the gases used in the pen-ray lamps are available from Refs. 10 (for 
Ne), 11 (for Ar), 12 (for Kr) and 13 (for Xe). Figure 2(b) shows the wavelength 
distributions of the emission lines from typical pen-ray lamps filled with He, Ne, 
Ar, and Kr in the VIS-NIR waveband. They show irregular distributions with gaps, 
and the density of the lines decreases as the wavelength increases toward the longer 
part of the NIR regime. 


3.2.2. Hollow cathode lamps 


A hollow cathode lamp (HCL) is a sealed glass (or quartz) glow-discharge tube in
which a metallic cathode and anode pair is enclosed with a dischargeable gas,
usually Ar or Ne. It is based on the “hollow cathode effect”, which is a large
increase in the light intensity accompanied by a reduction in the voltage drop across
the electrodes. When a suitable voltage, typically a few hundred volts, is applied
across the electrodes, the discharge in the lamp results in collision-induced
ionization and substantial charge motions, leading to a rich spectrum of emission
lines from the gas, excited metal atoms and ions.


3.2.2.1. Thorium-argon hollow cathode lamps


The most popular reference line source for wavelength calibration in the visible 
waveband is Th—Ar, i.e., argon filled thorium HCLs. Their spectrum of numerous 
well-understood lines can effectively serve as reference lines, even for high-resolution 
spectroscopy. Th, which is a naturally radioactive element with a very long half-life,
has a single isotope 232Th with zero nuclear spin, creating a relatively clean spectrum
that is free of blending features caused by isotopic and hyperfine structures. The
collisions between Ar carrier gas and Th atoms generate a rich spectrum of emission
lines. Reference 14 reported wavelengths of more than 8400 Th-Ar lines in the
wavelength range of 370-692 nm, while Ref. 15 provides more than 2400 lines in the
range of 900-4500 nm. Reference 16, on the other hand, reported the identification
of some new Th lines and improved measurements of previously identified lines in
the range of 250-5500 nm. Figure 2 shows the distributions of the Th-Ar lines in
the visible waveband (top panel) from Ref. 14 and in the NIR waveband (bottom 
panel) from Ref. 15. The density of the Th—Ar lines is significantly lower in the NIR 
waveband than the visible waveband, and there exist only a few bright lines as the 
wavelength increases. 

Th-Ar HCLs are commercially available and easy to operate, making them a 
preferable choice for wavelength calibration reference, especially in the visible wave- 
band due to the increased number of available lines (Fig. 2). They have been used 
successfully for velocity measurement to the precision level of ~1 m/s when carefully 
adopted with other calibration procedures (Ref. 17, for example). However, there 
are several drawbacks, some of which are of practical importance, in their use as a 
reference line source for wavelength calibration. They are as follows: (1) Overall the 
commercially available Th-Ar HCLs require a high operational current to sustain 
their brightness, at the expense of their lifetime, which is typically a few hundred
hours. (2) Although they provide many useful reference lines, there are still cumber- 
some issues of line blending, uneven spectral coverage and irregular spacing between 
the lines, as well as undesirably large dynamic ranges in their line intensities. (3) 
The lines from the carrier gas, Ar, are far brighter than the Th lines;
consequently, they need to be filtered out in the red part of the visible spectrum 
unless observers find a well-balanced exposure time that does not saturate the Ar 
lines while maintaining Th lines at high S/N ratios. The bright Ar lines can be 
a source of unwanted scattered light, and they are also prone to pressure-induced 
wavelength shifts. (4) Their performance is dependent on operating parameters 
(e.g., pressure, current, age) that differ from one lamp to another. (5) The Th-Ar
HCLs, as wavelength reference sources, are not as efficient in the NIR waveband as
they are in the visible waveband because of the substantial decrease in the number
of bright, isolated lines beyond 1 μm (Fig. 2), which limits their application in the
longer wavelength range.


3.2.2.2. Uranium-neon hollow cathode lamps 


Recent studies by Ref. 16 showed that U-Ne (i.e., neon-filled uranium) HCLs can
be valuable supplements to Th-Ar HCLs as a wavelength reference source for the
wavelength range around the H band in the NIR, with an increased number of
available reference lines. U, mostly in the form of 238U, shares many important
characteristics with 232Th as a suitable material for the cathode. Figure 3 compares
the distributions of Th lines (Ref. 18) and U lines (Ref. 19) in the NIR waveband.
Also, compared to Ar, Ne has a smaller number of saturated bright lines; as a result,
the large dynamic range problem of the Th-Ar HCLs caused by the carrier gas is less
significant for the U-Ne HCLs. A laboratory experiment with U-Ne HCLs by Ref. 20
showed that a velocity measurement to the precision level of 10 m/s is achievable.
The H-band high-resolution (R ~ 22,500) spectrograph for the Apache Point
Observatory Galactic Evolution Experiment employs both Th-Ar and U-Ne HCLs as its
wavelength reference source (Ref. 21).

Fig. 3. Normalized line intensity distributions of Th lines ((a) data from Ref. 18) and U lines
((b) data from Ref. 19). The line intensities of the two elements are normalized by the maximum
intensity of the corresponding element.


3.3. Absorption reference gas cells 


Similar to several of the aforementioned molecular absorption bands in Earth’s 
atmosphere used for precision velocity measurements, artificial absorption molecular 
gas cells located in front of a spectrograph can serve as a wavelength reference. When 
illuminated by the light from an astronomical source or a calibration lamp, such a 
reference cell produces a series of absorption lines depending on the ro-vibrational 
states of the molecular gas inside the cell. Like other efficient wavelength references,
the gas cell should produce many densely populated, strong lines that are easily
identifiable over the spectral range of interest. The intrinsic wavelengths of the lines
and their gas cell transmission profiles can be measured in the laboratory with a
high-resolution spectrograph such as a Fourier transform spectrometer (FTS).
One key property of a cell is the high absorption coefficients of the molecular gas 
that make the cell compact. For a cell filled with molecular gas of low absorption 
coefficients, a long system is often necessary to create a high column density of the 
gas. A gas cell absorption spectrum can be obtained simultaneously with a source 
spectrum, which helps compensate for wavelength drifts between source spectra and 
calibration spectra that are obtained separately under different conditions. 

When stabilized with minimal variation in temperature, pressure and other 
conditions inside the cell, this relatively simple and inexpensive approach based on 
the molecular absorption reference gas cell can provide regular spectrographs with 
high-resolution spectroscopic capabilities for precision velocity measurement. Due 
to the loss of light in the absorption process, however, the intensity of the signal 
from astronomical sources is reduced when using this method. This calibration 
method has been predominantly used for the measurement of Doppler shift motions 
from extrasolar planets. The analyses of data from this method, however, require 
complex modeling processes for wavelength calibration. The observed spectrum,
$I_{\mathrm{obs}}(\lambda)$, is basically a convolution of the product of a
Doppler-shifted stellar spectrum, $S(\lambda + \Delta\lambda)$, and the transmission
function of the gas cell, $T_{\mathrm{cell}}(\lambda)$, with the point spread function
(PSF) of the spectrograph:
$I_{\mathrm{obs}}(\lambda) = [T_{\mathrm{cell}}(\lambda) \times S(\lambda + \Delta\lambda)] \otimes \mathrm{PSF}(\lambda)$.
The spectrograph PSF can be obtained by comparing observed spectra of
bright featureless stars (e.g., rapidly rotating B-type stars) with the transmission
function of the gas cell measured in the laboratory. There are several ways of
acquiring the intrinsic stellar spectrum, including direct observations using a
spectrograph with an extremely high spectral resolving power (e.g., R > 300,000), PSF
deconvolution of an observed source spectrum, and the simple adoption of a model
template spectrum. Obtaining a reliable intrinsic spectrum can be quite an onerous
task. By fitting an observed spectrum with the intrinsic source spectrum as a function
of wavelength shift, the best-fit Doppler shift can be identified (e.g., Ref. 22).
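The forward-modelling procedure described above can be illustrated with a toy example. Everything specific in the sketch below is hypothetical: the stellar template, the gas-cell transmission, the Gaussian PSF, the wavelength grid and the brute-force grid search are stand-ins chosen for illustration, whereas real analyses use measured cell spectra, a PSF fitted per spectral chunk, and a proper optimizer.

```python
import math


def gaussian(x, sigma):
    return math.exp(-0.5 * (x / sigma) ** 2)


def stellar_template(lam_nm):
    """Hypothetical intrinsic stellar spectrum S(lambda): two absorption lines."""
    return (1.0 - 0.5 * gaussian(lam_nm - 550.10, 0.010)
            - 0.4 * gaussian(lam_nm - 550.32, 0.012))


def cell_transmission(lam_nm):
    """Hypothetical gas-cell transmission T_cell(lambda): a set of narrow lines."""
    centres = [550.00 + 0.04 * k for k in range(12)]
    return 1.0 - 0.3 * sum(gaussian(lam_nm - c, 0.004) for c in centres)


def psf_kernel(sigma_pix, half_width):
    """Normalized, symmetric Gaussian PSF sampled on the pixel grid."""
    weights = [gaussian(k, sigma_pix) for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def forward_model(grid_nm, shift_nm, kernel):
    """I_obs = [T_cell(lambda) x S(lambda + shift)] convolved with the PSF."""
    product = [cell_transmission(l) * stellar_template(l + shift_nm) for l in grid_nm]
    half = len(kernel) // 2
    # Edge pixels without full kernel support are dropped; the kernel is
    # symmetric, so convolution and correlation coincide.
    return [sum(kernel[j] * product[i - half + j] for j in range(len(kernel)))
            for i in range(half, len(grid_nm) - half)]


def best_fit_shift(observed, grid_nm, kernel, trial_shifts_nm):
    """Grid search for the shift minimizing the sum of squared residuals."""
    def chi2(shift_nm):
        model = forward_model(grid_nm, shift_nm, kernel)
        return sum((o - m) ** 2 for o, m in zip(observed, model))
    return min(trial_shifts_nm, key=chi2)


# Synthetic, noise-free demonstration with an assumed true shift.
grid = [549.95 + 0.002 * i for i in range(300)]
kernel = psf_kernel(sigma_pix=2.0, half_width=8)
true_shift = 0.006
observed = forward_model(grid, true_shift, kernel)
trials = [round(-0.020 + 0.001 * k, 6) for k in range(41)]
recovered = best_fit_shift(observed, grid, kernel, trials)
velocity_m_s = 299_792_458.0 * recovered / 550.0  # v = c * delta_lambda / lambda
```

Because the cell lines do not move with the star, they anchor the wavelength scale and the PSF in the same exposure, while only the stellar term responds to the trial shift.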


3.3.1. Iodine cell for the visible waveband


In the narrow spectral range of 500-620 nm, I2 molecular gas has more than 1000
prominent transitions that have substantially high absorption coefficients, making
I2 a preferable choice for the filling gas inside reference cells. Because of the high
absorption coefficients, I2 gas cells, with lengths of a few cm, can produce a forest
of strong absorption lines without reducing the intensity of the source signal to an
impractical level. Velocity measurements at the precision level of a few m/s
have been routinely achieved by this method in the visible waveband despite the
disadvantage of the narrow spectral window covered by I2 absorption. The
complex data analyses and modelling processes usually require S/N ratios greater 
than 100 per pixel for the observed source spectra (e.g., Ref. 22). 


3.3.2. Infrared gas cells 


Following the successes of the precision velocity measurements based on I2 reference
gas cells in the visible waveband, astronomers have developed similar techniques
applicable to the NIR waveband. The main scientific focus of the NIR applications
is precision Doppler shift measurements of late-type stars and finding Earth-like
planets in the habitable zone. Unlike the visible waveband, where I2 gas has been
adopted as the absorption molecular gas, no single gas has been identified as the
filling gas in the NIR waveband. In the K band, 14NH3, which has many strong lines
with a small absorption path, has been used as a filling gas in achieving ~3 m/s
precision (Ref. 23). Significant modeling efforts were made in removing the telluric
features of CH4 and H2O in the spectral window of 14NH3. CO gas has been used in the
K band for the calibration of CO bandhead emission from cool dwarfs, whereas N2O gas
has been used beyond 2 μm (Ref. 24). Reference 25 used a 13CH4 isotopologue gas cell
to obtain a velocity measurement at a level more precise than 50 m/s in the K band of
late-type stars. Given the small spectral windows covered by the individual filling
gases, a combination of reference cells housing different gas molecules can help
broaden the spectral range. According to Ref. 26, a combination of H13C14N, 12C2H2,
12CO, and 13CO cells may be an effective choice for a reference cell for the H band.


3.4. Fabry-Perot interferometers and laser combs


As wavelength calibration at a level much more precise than 1 m/s becomes a
foreseeable goal for the detection of planets around late-type stars (mostly M
dwarfs), the conventional calibration methods relying on emission lamps, sky telluric
lines and gas absorption reference cells need to be replaced by more sophisticated
techniques that can provide a calibration capability of such precision. Two
techniques have been rapidly developed for this: one is based on Fabry-Perot
interferometers (FPIs); the other on laser frequency combs. The former produces
an equidistant series of transmission peaks to be used as reference lines, while the
latter produces a picket fence of laser lines.


3.4.1. Fabry-Perot interferometers 


When illuminated with continuum light, a FPI can serve as an emission line filter
that produces a dense series of equidistantly spaced transmission Airy peaks from
consecutive constructive interferences satisfying the condition
$m\lambda = 2nW\cos\theta$, where $m$ is the interference order, $\lambda$ is the
wavelength, $n$ is the index of refraction, $W$ is the width between the two
reflective plates and $\theta$ is the incident angle of the light. Its free spectral
range (i.e., the distance between the peaks) is basically determined by the width
between the two reflective plates, while the width of the peaks is determined by the
reflectance of the plates. These rather simple characteristics of FPIs allow them to
readily produce synthetic wavelength reference lines over a broad spectral range
(Refs. 27, 28).
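As a rough illustration of the interference condition, the sketch below lists the transmission peaks of an idealized FPI within a wavelength window, together with its free spectral range in frequency and the reflectivity finesse. The 5 mm gap used in the usage note, the vacuum gap (n = 1) and normal incidence are assumptions, and the finesse expression pi*sqrt(R)/(1 - R) is the standard result for ideal, lossless plates rather than a value taken from the text.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def fpi_peaks(lambda_min_nm, lambda_max_nm, gap_mm, n=1.0, theta_rad=0.0):
    """Return (order m, peak wavelength in nm) pairs inside the window.

    Peaks satisfy m * lambda = 2 * n * W * cos(theta).
    """
    path_nm = 2.0 * n * gap_mm * 1e6 * math.cos(theta_rad)
    m_low = math.ceil(path_nm / lambda_max_nm)
    m_high = math.floor(path_nm / lambda_min_nm)
    # Higher orders correspond to shorter wavelengths, so count downwards.
    return [(m, path_nm / m) for m in range(m_high, m_low - 1, -1)]


def free_spectral_range_ghz(gap_mm, n=1.0, theta_rad=0.0):
    """Peak spacing in frequency, c / (2 n W cos(theta)), in GHz."""
    return C / (2.0 * n * gap_mm * 1e-3 * math.cos(theta_rad)) / 1e9


def reflectivity_finesse(reflectance):
    """Ratio of peak spacing to peak width for ideal plates."""
    return math.pi * math.sqrt(reflectance) / (1.0 - reflectance)
```

For example, `fpi_peaks(549.9, 550.1, gap_mm=5.0)` lists the orders falling in a 0.2 nm window; the peaks are strictly equidistant in frequency and only approximately so in wavelength over a narrow window.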

One important issue of using a FPI as a precision wavelength calibrator is 
the identification of lines from the device, since there is no internal wavelength 
indicator. This generates a significant ambiguity in identifying the wavelengths of 
the lines due to the degeneracy between the interference order and free spectral 
range — the latter is caused by the uncertainty in the width between the plates. 
Hence, wavelength calibration with a FPI requires a secondary wavelength calibrator 
(e.g., Th-Ar and U-Ne HCLs) for absolute wavelength determination. Knowing the 
plate width, which sets the wavelength, with a required accuracy is also challenging. 
For instance, the velocity precision of 1 m/s requires knowing the width at a level
of $\sim 10^{-9}$.
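The required accuracy follows directly from the interference condition: at fixed order, refractive index and incidence angle, a fractional error in the plate separation maps one-to-one onto a fractional wavelength error, which in turn corresponds to a velocity error through the Doppler relation. The 5 mm separation in the last step is an assumed value used only to give a sense of scale.

```latex
m\lambda = 2nW\cos\theta
\;\Rightarrow\;
\frac{\delta\lambda}{\lambda} = \frac{\delta W}{W}
\quad (\text{fixed } m,\ n,\ \theta),
\qquad
\frac{\delta\lambda}{\lambda} = \frac{v}{c}
= \frac{1\ \mathrm{m\,s^{-1}}}{3.0\times10^{8}\ \mathrm{m\,s^{-1}}}
\approx 3.3\times10^{-9},
\qquad
\delta W \approx 3.3\times10^{-9} \times 5\ \mathrm{mm}
\approx 1.7\times10^{-11}\ \mathrm{m}.
```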

It is also important to maintain the stability of the FPI with unvarying environ- 
mental conditions by minimizing variations in temperature, pressure and mechanical 
stress. A special emphasis should be made for keeping the width between the two 
reflective plates constant, which is essential to minimizing wavelength drifts. Active 
tracking of the variation in the width between the plates followed by proper
compensation can always improve performance. The application of this method to the
High Accuracy Radial velocity Planet Searcher (HARPS) spectrograph has resulted
in a significant improvement in the wavelength calibration (Ref. 29), achieving a
nightly precision level of 10 cm/s when a Zerodur-based spacer was used between the
plates for enhanced thermal stability. When successfully implemented, a FPI can provide
a valuable wavelength calibration capability for high-resolution spectroscopy in a 
relatively convenient and simple way. 


3.4.2. Laser frequency combs 


Although they are expensive and cumbersome to operate, laser frequency combs
(LFCs) are an extremely powerful tool for wavelength calibration to the precision
level of a few cm/s (Ref. 30) by providing broadband picket fence reference
lines in frequency. A femtosecond mode-locked laser generates a series of bright,
discrete ultrashort pulses. The Fourier transformation of these pulses produces an
optical frequency comb, which is a series of dense, equidistant picket pulses that
form a comb-shaped distribution. The pulses, often called teeth, have the frequency
distribution $f_{\mathrm{comb}}(n) = f_0 + n f_{\mathrm{rep}}$, where the comb teeth
frequencies ($f_{\mathrm{comb}}$) are determined by the carrier-envelope offset
frequency ($f_0$), the integer mode number ($n$) and the pulse repetition frequency
($f_{\mathrm{rep}}$), which is inversely proportional to the
pulse repetition time interval. Large mode numbers are required for operation in
the VIS-NIR frequency range because both the offset frequency and the repetition 
frequency are in the radio frequency range. The comb frequencies are in principle 
tunable, and this gives flexibility in adjusting the teeth locations on a detector array. 
Among different types of laser combs, those from Ti:Sapphire lasers have been used 
for work in the visible waveband, while those from Er:fiber lasers have been used in 
the NIR waveband. Combs from both types of lasers can be adopted for broadband 
applications. 
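The comb relation above, with tooth n at the offset frequency plus n times the repetition frequency, can be sketched numerically to show why the mode numbers are large. The offset and repetition frequencies used here (20 MHz and 250 MHz) are hypothetical values chosen only to lie in the radio-frequency range, and the function names are ours.

```python
# Comb-tooth frequencies f_comb(n) = f0 + n * f_rep.
# The offset and repetition frequencies below are hypothetical, illustrative values.

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def comb_frequency(n: int, f0_hz: float, frep_hz: float) -> float:
    """Frequency of comb tooth n, in Hz."""
    return f0_hz + n * frep_hz


def nearest_mode(wavelength_m: float, f0_hz: float, frep_hz: float) -> int:
    """Mode number of the tooth closest to a given vacuum wavelength."""
    target_hz = C_M_PER_S / wavelength_m
    return round((target_hz - f0_hz) / frep_hz)


if __name__ == "__main__":
    f0, frep = 20e6, 250e6                # both in the radio-frequency range
    n = nearest_mode(1.0e-6, f0, frep)    # tooth nearest a 1 micron wavelength
    print(f"mode number near 1 micron: {n}")
    print(f"tooth frequency: {comb_frequency(n, f0, frep):.6e} Hz")
```

With these illustrative values the tooth nearest 1 μm has a mode number above one million, consistent with the statement that large mode numbers are needed to reach the VIS-NIR range from radio-frequency offsets and repetition rates.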

The prospect of using LFCs as a standard reference for precise wavelength 
calibration is based on the stabilization of the reference frequency series by locking 
both the carrier-envelope offset frequency and the laser repetition rate to an atomic 
clock whose long-term drift can be compensated by signals from the global position- 
ing system.3! This method can easily yield absolute precision smaller than 107!°, 
which ensures that the comb teeth are placed at the exact right places of known 
frequencies. Such a stabilized comb can readily generate hundreds of thousands 
of unresolvable optical reference lines of precisely determined wavelengths, provid- 
ing reliable reference lines against which precise wavelength calibration of spectral 
features from astronomical sources can be made in the VIS-NIR waveband. One 
practical advantage of LFCs is that, due to their high brightness (>100 nW per 
comb line),³¹ they can be used with a diffuser (e.g., integrating sphere) to illuminate 
slits uniformly, which helps conduct wavelength calibration more precisely. The large 
loss of light by the diffuser is not a significant problem for LFCs. 

A difficulty of using LFCs as a wavelength reference in the VIS-NIR waveband 
is their innately small frequency spacing: the intrinsic comb teeth 
are too densely populated to be resolved by astronomical spectrographs. In most 
cases, the frequency spacing of LFCs is <1 GHz, with Ti:Sapphire lasers having broader 
spacing than Er:fiber lasers. When coupled with a proper cavity and fiber, a 1 GHz 
repetition rate is achievable for Ti:Sapphire lasers.³² However, most astronomical 
spectrographs in the VIS-NIR wavebands are compatible with frequency spacing 
>10 GHz, requiring spectral broadening that can increase the comb teeth frequency 
spacing. Several techniques have been developed for spectral broadening. An inter- 
esting way of doing so is to guide the light from an LFC into a Fabry-Perot cavity 
that selects comb modes with larger frequency spacing. In this configuration, the 
width of the Fabry-Perot cavity is accurately maintained to match its free spectral 
range with the frequency mode spacing.³³,³⁴ However, the maintenance of the 
cavity length to the required precision level, which is often attempted with piezo- 
electrically controlled devices, is challenging. 
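The mode-selection idea can be illustrated with a short sketch: a filter cavity whose free spectral range is an integer multiple m of the repetition frequency passes every m-th tooth. The relation between free spectral range and mirror separation used below (c/2L for an idealized, vacuum-spaced two-mirror cavity) is a textbook assumption rather than something stated in this chapter, and the 250 MHz repetition rate and function names are hypothetical.

```python
import math

# Fabry-Perot mode filtering of a comb: a cavity whose free spectral range (FSR)
# equals m * f_rep transmits every m-th tooth.
# Assumption: idealized vacuum-spaced two-mirror cavity with FSR = c / (2 L).

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def filter_ratio(frep_hz: float, min_spacing_hz: float) -> int:
    """Smallest integer m such that m * f_rep >= the required tooth spacing."""
    return math.ceil(min_spacing_hz / frep_hz)


def cavity_length(fsr_hz: float) -> float:
    """Mirror separation (m) giving the requested free spectral range."""
    return C_M_PER_S / (2.0 * fsr_hz)


def transmitted_modes(n_start: int, count: int, m: int) -> list:
    """Mode numbers passed by the cavity, starting from a resonant tooth."""
    return [n_start + i * m for i in range(count)]


if __name__ == "__main__":
    frep = 250e6                           # hypothetical repetition rate
    m = filter_ratio(frep, 10e9)           # aim for >= 10 GHz spacing
    length = cavity_length(m * frep)
    print(f"keep every {m}th tooth; cavity length {length * 1e3:.3f} mm")
    print(transmitted_modes(1_199_170, 3, m))
```

Because the cavity resonances must stay aligned with teeth that are only a repetition frequency apart, even a small fractional change in the cavity length shifts the passband onto the wrong teeth, which is one way to see why the length control is demanding.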

Some other laser-based methods developed for precise wavelength calibration 
are intrinsically free of the spectral broadening problem. For instance, combs from 
a continuous-wave pump laser are capable of producing a frequency spacing larger 
than 10 GHz, potentially exceeding 100 GHz, over a broad spectral range.³⁵ Refer- 
ence 36, on the other hand, reported the on-telescope demonstration of the electro- 
optical modulation of a continuous-wave laser as a wavelength reference. In contrast 
to the mode-locked lasers, the continuous-wave lasers have an intrinsically larger 
frequency spacing determined by the modulation frequency. 

Overall, laser comb technology-based wavelength calibration methods have been 
rapidly developed in recent years with great potential for future application in 
precision astronomical spectroscopy in the VIS-NIR waveband. This is because in 
theory they are capable of producing dense, discrete, bright, identifiable reference 
lines across a broad spectral range that can be stabilized by reliable references like 
atomic clocks and the global positioning system. A comprehensive analysis of recent 
developments in the field of precision velocity measurement is available from Ref. 37. 


4. Summary 


Rigorous calibration of obtained data is an essential process of observational astron- 
omy. In fact, scientific interpretations of many important questions that we are 
pursuing in astronomy today are critically dependent on the precision in calibration 
and the size of uncertainties in estimated parameters. Following well-thought-out 
calibration procedures always leads to more reliable and precise results. It is impor- 
tant to keep instruments as stable as possible. Active tracking of and compen- 
sation for drifts in instrument performance help improve calibration. Acquiring 
calibration data simultaneously with, or under conditions as close to, real source 
observations is always helpful for more precise calibration. 

Obtaining a precise flat-field is a fundamental, but always difficult, procedure 
for astronomical observations in the VIS-NIR waveband. Traditional methods of 
dome, twilight, and night-sky flat-fields have both advantages and disadvantages in 
their applications. Flat-fields from a specialized calibration system composed of a 
diffuser-coupled light source system and a telescope-simulating optical system can 
provide uniform pupil-illuminating radiation to an instrument; however, there are 
still issues such as telescope internal scattering that are difficult to compensate for. 
The QTH and LED light sources have been widely used for flat-fields, while there 
have also been developments for light sources with broader spectral coverage 
such as supercontinuum sources. 

Because of the intense research efforts to find planets from Doppler-shift mea- 
surements, the techniques for wavelength calibration have dramatically advanced 
in recent years. Detection of Earth-like planets around late-type stars ultimately 
requires wavelength calibration at the precision level of 1 cm/s in the NIR wave- 
band. The precision level achievable in wavelength calibration is heavily dependent 
on the performance of the adopted light sources that generate wavelength reference 
lines. For most astronomical applications other than those for extreme wavelength 
calibration precision for planets, a combination of different types of line sources has 
served as wavelength references, including pen-ray lamps, HCLs and sky telluric 
lines. Among them, Th-Ar HCLs have effectively served as the standard wave- 
length reference in the visible waveband. The situation becomes more difficult as the 
wavelength increases, especially beyond 2 μm, where there are not many lines from 
HCLs, pen-ray lamps or even OH sky telluric emission lines. Both the molecules in 
Earth’s atmosphere (e.g., CO₂) and absorption gas cells (e.g., I₂) provide spectral 
windows filled with numerous absorption lines that have been used for precision 
velocity measurements to the level of 1 m/s. One disadvantage of this method is the 
very limited spectral windows offered by the molecules. More precise wavelength 
calibration needs to rely on the use of FPI or LFCs as a reference line source, and 
their applications have been actively investigated. The LFCs can not only provide 
reference lines but their long-term stability can also be maintained to an extreme 
precision by locking to the global positioning system and/or atomic clocks, heralding 
the potential advent of a new era of precision wavelength calibration. 


References 


1. D.-S. Moon, L. Simard, S. Dafna et al., Proc. SPIE 7735, 77355N (2010). 

2. J. L. Marshall and D. L. Depoy, Publ. Astron. Soc. Pacific 932, 1277-1284 (2013). 

3. N. D. Tyson and R. R. Gal, Astron. J. 105, 1206-1212 (1993). 

4. P. Rousselot, C. Lidman, J.-G. Cuby et al., Astron. Astrophys. 354, 1134-1150 (2000). 

5. P. Figueira, F. Pepe, C. F. H. Melo et al., Astron. Astrophys. 515, A55 (2010). 

6. B. Caccin, F. Cavallini, G. Ceppatelli, A. Righini, and A. Terrestrial, Astron. Astro- 
phys. 149, 357-364 (1985). 

7. A. Seifahrt and H. U. Kaufl, Astron. Astrophys. 491, 929-939 (2008). 

8. K. Gullikson, S. Dodson-Robinson, and A. Kraus, Astron. J. 148, 53-58 (2014). 

9. A. Smette, H. Sana, S. Noll et al., Astron. Astrophys. 576, A77 (2015). 

10. C. J. Sansonetti, M. M. Blackwell, and E. B. Saloman, J. Res. Natl. Inst. Stand. 
Technol. 108, 371 (2004). 

11. W. Whaling, W. H. C. Anderson, and M. T. Carle, J. Res. Natl. Inst. Stand. Technol. 
107, 149 (2002). 

12. J. Sansonetti and M. B. Greene, Physica Scripta 75, 5771 (2007). 

13. E. B. Saloman, J. Phys. Chem. Ref. Data 33, 765 (2004). 

14. C. Lovis and E. Pepe, Astron. Astrophys. 468, 1115-1121 (2007). 

15. F. Kerber, G. Nave, and C. J. Sansonetti, Astrophys. J. Suppl. 178, 374-381 (2008). 

16. S. L. Redman, G. G. Ycas, R. Terrien et al., Astrophys. J. Suppl. 199, 1-11 (2012). 

17. M. Mayor, S. Udry, C. Lovis, et al., Astron. Astrophys. 493, 639-644 (2009). 

18. S. L. Redman, G. Nave, and J. Craig, Astrophys. J. Suppl. 211, 4 (2014). 

19. S. L. Redman, J. E. Lawler, G. Nave, L. W. Ramsey, and S. Mahadevan, Astrophys. 
J. Suppl. 195, 24 (2011). 

20. L. W. Ramsey, S. Mahadevan, S. Redman et al., Proc. SPIE 7735, 773571 (2010). 


21. D. L. Nidever, J. A. Holtzman, C. Allende Prieto et al., Astron. J. 150, 173-194 
(2015). 

22. R. P. Butler, G. W. Marcy, E. Williams et al., Publ. Astron. Soc. Pacific 108, 500-509 
(1996). 

23. J. L. Bean, A. Seifahrt, H. Hartman et al., Astrophys. J. 731, 410-422 (2010). 

24. U. Seemann, G. Anglada-Escude, D. Baade et al., Proc. SPIE 9147, 91475G (2014). 

25. J. Gagne et al., Astrophys. J. 822, 40 (2016). 

26. S. Mahadevan and J. Ge, Astrophys. J. 692, 1590-1596 (2009). 

27. F. Wildi, F. Pepe, C. Lovis et al., Proc. SPIE 7440, 74400M (2009). 

28. A. Reiners, R. K. Banyal, and R. G. Ulbrich, Astron. Astrophys. 569, A77 (2014). 

29. F. Wildi, F. Pepe, B. Chazelas et al., Proc. SPIE 8151, 81511F (2011). 

30. M. T. Murphy, Th. Udem, R. Holzwarth et al., Mon. Not. R. Astron. Soc. 380, 839-847 
(2007). 

31. F. Quinlan, G. Ycas, S. Osterman, and S. A. Diddams, Rev. Sci. Instrum. 81, 063105 
(2010). 

32. A. Bartels, D. Heinecke, and S. A. Diddams, Science 326, 681 (2009). 

33. D. A. Braje, M. S. Kirchner, S. Osterman, et al., Eur. Phys. J. D 48, 57-66 (2008). 

34. T. Steinmetz, T. Wilken, C. Araujo-Hauck et al., Appl. Phys. B 96, 251-256 (2009). 

35. P. Del’Haye, A. Schliesser, O. Arcizet et al., Nature 450, 1214-1217 (2007). 

36. X. Yi, K. Vahala, J. Li et al., Nat. Commun. 7, 1-9 (2016). 

37. D. Fischer et al., Publ. Astron. Soc. Pacific 964, 066001 (2016). 


Part 9 


Adaptive Optics 


Chapter 13 


Adaptive Optics: Single-Conjugate 
Natural Guide Star AO 


Donald Gavel 


Center for Adaptive Optics, University of California Observatories, 
UC Santa Cruz, Santa Cruz, CA 95060, USA 


In this chapter, we describe how the Earth’s atmosphere distorts light waves and 
how that affects the imaging performance of ground-based telescopes. Adaptive 
optics (AO) systems are introduced as the technique for recovering diffraction- 
limited resolution of large telescopes. We cover the essential elements of a basic, 
single-conjugate, adaptive optic system. The single-conjugate adaptive optics sys- 
tem will correct the wavefront integrated along a single path through the turbulent 
atmosphere, so its correction is valid only over a limited field of view, called the 
isoplanatic angle. We explain how, given atmospheric conditions and the parame- 
ters of the adaptive optic system design, one can predict on-axis performance and 
estimate the isoplanatic angle. Adaptive optics systems need a bright reference 
point source to probe the distortions of the atmosphere. A natural star can be 
used for this purpose. We investigate how bright this star must be for a given 
level of image correction and, given the natural occurrence of bright stars in the 
sky and knowledge of the isoplanatic angle, we can calculate “sky coverage”, i.e., 
the fraction of the sky that can be usefully observed by single-conjugate natural 
guide star adaptive optics systems. 


1. Introduction 


Adaptive optics (AO) systems reduce the blurring effects caused by the Earth’s 
atmospheric turbulence in ground-based astronomical imaging and spectroscopy. 
The advantages of using AO systems include much more highly resolved images that 
help to distinguish objects in crowded fields, the ability to detect faint objects 
beneath a sky glow background (such as OH lines or thermal emission in the infrared 
bands), finer precision astrometry, and increased spectral resolution and contrast. 
As well as improving resolution, AO will tend to provide higher contrast images, 
since point-like objects concentrate in an Airy core rather than blurring out over the 
field; however, for extremely high contrast needs the AO system should be combined 
with a coronagraph, as described in Part 5 of this volume. AO observing from the 
ground with a large aperture telescope provides higher resolution than space-based 
telescopes, with ground-based resolution no longer limited by seeing. In practice, 
AO on 8-10 m class telescopes like Keck, Gemini, Subaru, and VLT provides science 
complementarity to Hubble, Spitzer, and other space-based observatories. 

AO systems are composed of active components in the optical beam path 
between telescope and science instrument. Often the instrument itself must be 
specifically designed to accept the AO beam, since the point spread function (PSF) 
is considerably different than that of a seeing-limited beam. AO systems come in a 
variety of types that depend on scientific goals. These will be delineated in subse- 
quent chapters. Typical AO system varieties include: 


• Diffraction-limited correction on a narrow field, typically for near-infrared imag- 
ing and spectroscopy (single-conjugate AOᵃ). 

• Diffraction-limited correction on an intermediate-size field for near-infrared imag- 
ing of extended objects (multi-conjugate AO). 

• Partial correction on a wide field, feeding seeing-limited instruments (ground-layer 
AOᵇ). 

• Diffraction-limited correction on multiple narrow fields, typically for multiple 
galaxy field spectroscopy (multi-object AOᶜ). 

• Extremely precise correction on a very narrow field, combined with a coronagraph 
to suppress scattered light from diffraction, typically for characterizing exoplanets 
(extreme AOᵈ). 

ᵃ This chapter and Chapter 14. 
ᵇ See Chapter 16. 
ᶜ See Chapter 17. 
ᵈ See Chapter 18. 


An AO system’s upper bound on performance is set by optical physics: the 
diffraction limit of the telescope. An AO system will also correct for some of the 
aberrations of the telescope and the optics within the instrument. Residual wave- 
front errors after AO correction degrade the imaging performance from ideal, so the 
design process consists of deciding what is “good enough” for the science objectives 
and then allocating an error budget among wavefront error contributors, a topic 
which we shall cover in this chapter. 

This chapter will focus on adaptive optics systems that use a natural star as 
a wavefront probe. The advantage of using natural guide stars is that it allows for 
a simpler design and lower cost adaptive optics system. The disadvantage, as will 
become apparent in the later part of this chapter, is that sufficiently bright natural 
guide stars are infrequent on the sky and so limit the fraction of the sky that 
can be addressed. Later chapters cover the use of artificial guide stars, generated 
using lasers, which alleviates this problem. Still, the construction of a natural guide 
star AO system is valuable for astronomy for particular science cases, such as the 
detection and imaging of planets, companions, and debris disks around the stars 
nearest our Sun, which are bright enough to run the AO system. 


2. Aberrations 


An ideal optical system, say a perfect telescope in a vacuum, will bring all the light 
rays from a plane wave to a focus, and the optical path distances along the rays from 
every point in the aperture to the focal point will all be equal. Aberrations are the 
departures in these optical path distances from their ideal, and these aberrations 
can be caused by imperfections in the optics or by atmospheric turbulence. One can 
see that for coherent addition along all the optical paths, these 
deviations must be very small: the variations of path distance must be confined to 
less than one-half the light’s wavelength in order for the beam to form a diffraction- 
limited image. The optical path distance is related to the phase of the wave through 
the factor $2\pi/\lambda$, where $\lambda$ is the wavelength. We must have 


$$|\phi(\mathbf{u}) - \phi(\mathbf{u}')| < \pi, \tag{1}$$


where $\phi(\mathbf{u})$ is the phase of the ray that came from position $\mathbf{u}$ in the aperture, and $\mathbf{u}$ 
and $\mathbf{u}'$ are any two points on the aperture. If this condition is met, then the focused 
light is concentrated at the focal plane in an area of diameter 


$$|\Delta x| < 2.44 f\lambda/D \;\Rightarrow\; |\Delta\theta| < 2.44\lambda/D, \tag{2}$$


where $f$ is the focal length of the telescope and $D$ is the aperture diameter. For 
the second formula we have used the fact that at the focal plane the plate scale 
is $1/f$ radians per meter to relate it to resolution on the sky. Table 1 shows typi- 
cal seeing and diffraction-limited point source image sizes at various observatories 
having AO systems, and for comparison the diffraction limit of HST, the largest 
space-based telescope. 


Table 1. 

Site                        Seeing (arcsec)   Telescope                Diffraction limit (arcsec) 
Mauna Kea, Hawaii           0.3               Keck, D = 10 m           0.01 
Mauna Kea, Hawaii           0.3               Gemini North, D = 8 m    0.013 
Mauna Kea, Hawaii           0.3               CFHT, D = 3.6 m          0.029 
Cerro Pachon, Chile         0.4               Gemini South, D = 8 m    0.013 
Cerro Paranal, Chile        0.4               VLT-UT4, D = 8 m         0.013 
Mt. Palomar, California     0.7               Hale, D = 5 m            0.02 
Mt. Hamilton, California    0.7               Shane, D = 3 m           0.034 
Hubble Space Telescope      n/a               D = 2.4 m                0.04 
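The diffraction limits listed in Table 1 appear to be consistent with the simple angular scale λ/D evaluated at a wavelength of 0.5 μm; that wavelength is our inference from the tabulated numbers, not something stated with the table. The following sketch, with hypothetical function names, converts λ/D to arcseconds and also evaluates the full Airy-disk diameter of Eq. (2).

```python
import math

# Angular diffraction scales for a circular aperture.
# Assumption: the tabulated limits correspond to lambda / D at 0.5 micron.

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def diffraction_scale_arcsec(wavelength_m: float, diameter_m: float) -> float:
    """Return lambda / D converted to arcseconds."""
    return wavelength_m / diameter_m * ARCSEC_PER_RAD


def airy_diameter_arcsec(wavelength_m: float, diameter_m: float) -> float:
    """Return the Airy-disk diameter 2.44 lambda / D of Eq. (2), in arcseconds."""
    return 2.44 * diffraction_scale_arcsec(wavelength_m, diameter_m)


if __name__ == "__main__":
    telescopes = {"Keck": 10.0, "Gemini": 8.0, "CFHT": 3.6, "Hale": 5.0, "Shane": 3.0}
    for name, diameter in telescopes.items():
        scale = diffraction_scale_arcsec(0.5e-6, diameter)
        airy = airy_diameter_arcsec(0.5e-6, diameter)
        print(f"{name:7s} D = {diameter:4.1f} m  lambda/D = {scale:.3f}  Airy = {airy:.3f}")
```

Comparing these numbers with the seeing column shows the potential gain from AO: for the 8 to 10 m telescopes the diffraction scale is more than twenty times smaller than the typical seeing disk.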


3. Atmospheric Turbulence 


Atmospheric turbulence is the primary cause of aberration in ground-based astro- 
nomical observing. The atmosphere affects the starlight via multiple refractive devi- 
ations of the light rays along the multi-kilometer long path of the light through the 
atmosphere. The optical path differences have significant variation across the tele- 
scope aperture, resulting in the originally coherent light waves getting out of phase, 
violating Eq. (1). A ray traversing the entire atmosphere to the telescope aperture 
has an optical path distance (OPD) given by 


$$\mathrm{OPD}(\mathbf{u}) = \int_0^L n(u_x, u_y, z)\,dz, \tag{3}$$


where we begin the ray at some arbitrary height $L$ well above the atmosphere, 
$z = 0$ is the location of the telescope aperture and $\mathbf{u} = (u_x, u_y)$ is the ray location 
on the aperture at $z = 0$. The index of refraction, $n$, is a function of atmospheric 
constituents, density, and temperature, so it depends on all three spatial variables. 
However, for the purposes of imaging, we are only interested in path length varia- 
tions that occur across the plane of the aperture in the $(u_x, u_y)$ directions and not 
in the total integrated path length over $z$. These variations are dominated by the 
local turbulence at each altitude layer. 

Turbulence arises for a variety of reasons. At the ground, prevailing winds inter- 
act with ground topography to form large eddies that then cascade into smaller 
eddies, turning a laminar flow into a turbulent one. At upper altitudes there tend 
to be layers where winds blow at differing velocities, resulting in turbulent instabil- 
ities at the wind-shear boundaries (Fig. 1). The OPD across a large astronomical 
telescope aperture ($D = 8$ m) is on the order of 4 μm rms, inhibiting diffraction- 
limited imaging in the optical and near-infrared science bands. At optical wave- 
lengths ($\lambda \approx 0.5$ μm), telescopes with apertures larger than about 30 cm will be 
seeing-limited. 


[Figure 1: schematic of turbulence sources, labeled: stratosphere; tropopause (10-12 km); boundary layer (~1 km); wind flow around dome; heat sources within dome.] 

Fig. 1. The air turbulence arises from a number of geophysical sources. The observatory environ- 
ment can also be a significant source of local turbulence. 

The theory of random turbulence and of the resulting refractive index fluctuations 
was developed in the late 20th century by Refs. 1-3. The perturbations are 
random, and have statistical correlation properties over the volume. Furthermore, 
on time scales of interest to optical imaging, the particular realization of random 
turbulence is effectively static, with dynamics described almost completely by simple 
bulk translation due to wind. The frozen-flow dynamics model was expressed and 
validated experimentally in 1938 by Ref. 4, with later studies measuring the time 
scales of departure from frozen flow.⁵ 

The magnitude of the refractive index variations increases with the strength of 
the turbulence. Kolmogorov reasoned that there is an energy balance across scales 
of turbulence; that is, energy injected into turbulent eddies at large scales flows 
into eddies of successively smaller scales without loss. The energy flows through 
a cascade of scales starting at the outer scale where bulk wind and topographical 
features introduce turbulent energy and ending at the inner scale where gas viscosity 
dissipates energy into heat. The mathematical derivation from first principles can 
be found in Ref. 1 and is well summarized in Ref. 6. 

The result is that the index variations have a spatial power spectrum of the 
form 


$$\Phi_n(\kappa) = 0.033\, C_n^2(h)\, \kappa^{-11/3}, \tag{4}$$


where $\kappa = 2\pi/l$ is a spatial wavenumber, with $l$ being the wavelength, of a sinusoidal 
component of the variation. The applicable distance range is $L_0 > l > l_0$, where $L_0$ 
is the outer scale, typically tens of meters, and $l_0$ is the inner scale, typically on 
the order of a few millimeters. $C_n^2(h)$ is a factor that quantifies the strength of the 
turbulence, which, on the scale of telescope apertures, can be considered a function 
of altitude $h$ only. Generally, $C_n^2$ is large near the ground, where the ground winds 
interact turbulently with the topography, and at layers in the upper atmosphere 
where wind-shears develop turbulent eddies. At almost all ground-based observing 
sites, the ground turbulence ($C_n^2(0)$) is by far the strongest contributor in the vertical 
distribution of turbulence. This can explain why the best seeing sites on Earth are 
located where ground-layer winds are nearly laminar (non-turbulent) flows off the 
ocean, examples being the west coast of the US, Chile, the Canary Islands, and 
Hawaii. 

A brief note here on the inner and outer scales. The inner scale is the scale at 
which energy is dissipated into heat due to the viscosity of the air. This scale is only a 
few millimeters and the amplitude of the index variations at this scale can generally 
be neglected, i.e., the vertically integrated index fluctuations at this level add up 
to only a small fraction of a visible light wavelength and thus do not significantly 
degrade imaging quality. However, if one were attempting high-resolution imaging 
along a long horizontal path, for example of faraway terrestrial objects (or astro- 
nomical observing at the horizon!), the inner scale effects can contribute a significant 
amount to light ray scattering. 

The outer scale becomes important in the case of very large telescope aper- 
tures such as the 30+ meter telescopes currently planned, and for long-baseline 
optical interferometry. Generally the shape of the power spectrum flattens beyond 
the outer scale, meaning there is less of an effect than would be predicted using 
the Kolmogorov spectrum (Eq. (4)), which grows infinite as $\kappa \to 0$. For the Keck 
10-m telescope on Mauna Kea this was an important consideration in setting the 
requirements for the tip/tilt system and studies were undertaken to measure the 
outer scale.⁷ 

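We can relate the spatial power spectrum $\Phi_n(\boldsymbol{\kappa})$ to the spatial correlation func- 
tion $C_n(\mathbf{r})$, by definition, via a Fourier transform 


$$C_n(\mathbf{r}) = \int \Phi_n(\boldsymbol{\kappa})\, e^{i\boldsymbol{\kappa}\cdot\mathbf{r}}\, d\boldsymbol{\kappa}, \tag{5}$$


$$\Phi_n(\boldsymbol{\kappa}) = \int C_n(\mathbf{r})\, e^{-i\boldsymbol{\kappa}\cdot\mathbf{r}}\, d\mathbf{r}. \tag{6}$$


However, the covariance function in this definition is dominated by the power spec- 
trum’s large values near $\kappa = 0$, i.e., it is dominated by the outer scale. To capture 
the mid-scale behavior in a statistical correlation metric, researchers introduced 
a different type of spatial function called the structure function, defined as the 
variance of differences of index as a function of separation: 


$$D_n(\mathbf{r}) = \left\langle \left[ n(\mathbf{x}) - n(\mathbf{x} + \mathbf{r}) \right]^2 \right\rangle. \tag{7}$$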

With this we can write a transform pair 


$$D_n(\mathbf{r}) = 2 \int \Phi_n(\boldsymbol{\kappa})\, [1 - \cos(\boldsymbol{\kappa}\cdot\mathbf{r})]\, d\boldsymbol{\kappa}, \tag{9}$$


$$\Phi_n(\boldsymbol{\kappa}) = -\frac{1}{2} \int D_n(\mathbf{r})\, e^{-i\boldsymbol{\kappa}\cdot\mathbf{r}}\, d\mathbf{r}. \tag{10}$$


The structure function 


$$D_n(r) = C_n^2(h)\, r^{2/3} \tag{11}$$


is valid for distances between the inner and outer scales, $l_0 < r < L_0$, where $h$ is 
altitude and $r$ is the distance between points. We see that, through the Fourier trans- 
form relationship, the $\kappa^{-11/3}$ power law for the spectrum maps to an $r^{2/3}$ power law 
for the structure function. The proportionality factor, $C_n^2$, is called the structure 
“constant” (it is not really a constant because it depends on altitude, but it is 
effectively a constant on distances smaller than the outer scale) and it characterizes 
the strength of the turbulence. 
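To give a feel for the two-thirds power law, the sketch below evaluates the index structure function, taken here as the structure constant times the separation to the power 2/3, together with the corresponding rms index difference between two points. The structure-constant value and the inner and outer scales in the example are hypothetical, chosen only to fall in the ranges mentioned in the text, and the function names are ours.

```python
import math

# Kolmogorov index structure function D_n(r) = C_n^2 * r**(2/3),
# valid only between the inner scale l0 and the outer scale L0.
# The default scales (5 mm, 30 m) and the C_n^2 value used below are hypothetical.


def index_structure_function(r_m: float, cn2: float,
                             inner_scale_m: float = 0.005,
                             outer_scale_m: float = 30.0) -> float:
    """Return D_n(r) for a separation r (m) and structure constant cn2 (m^-2/3)."""
    if not inner_scale_m < r_m < outer_scale_m:
        raise ValueError("separation outside the inertial range l0 < r < L0")
    return cn2 * r_m ** (2.0 / 3.0)


def rms_index_difference(r_m: float, cn2: float) -> float:
    """Return the rms refractive-index difference between two points r apart."""
    return math.sqrt(index_structure_function(r_m, cn2))


if __name__ == "__main__":
    cn2 = 1e-15  # hypothetical structure constant, m^-2/3
    for r in (0.1, 1.0, 8.0):
        print(f"r = {r:4.1f} m  D_n = {index_structure_function(r, cn2):.3e}"
              f"  rms dn = {rms_index_difference(r, cn2):.3e}")
```

Going from a 1 m to an 8 m separation increases the structure function by a factor of four, so the rms index difference only doubles: the fluctuations grow slowly with separation within the inertial range.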


4. Effect on Astronomical Imaging 


Atmospheric aberration results in a blurring of the image because effectively the 
atmosphere acts like a number of small prisms distributed over the aperture of the 
telescope. Within each of these patches, Eq. (1) is met, so each forms a PSF of 
size $\lambda/d$, where $d$ is a coherence diameter that is the distance on the aberrated 
phase plane over which the phase is within $\pm\pi/2$ of any other phase in the patch. 
Typically, short exposure images in narrow band light look speckled, because light 
from the many patches of aperture interferes either constructively or destructively. 
Over longer exposures the prisms randomly tilt so the speckled image is smeared 
out, the final PSF having a size of a bit larger than $\lambda/d$. 

To quantify the effect, we consider the wavefront surfaces as they arrive at the 
telescope aperture. The wavefront surface is rough because the atmosphere changes 
the directions of individual rays, and the wavefront by definition is the surface 
perpendicular to these rays. Now consider a plane that cuts through the wavefront 
surfaces (Fig. 2). Every point on this plane touches one of the wavefront surfaces, so 
we assign phase values on the plane equal to the phase of the wavefront it touches. 
We can assign these phase planes to be either parallel to the aperture of the receiving 
telescope or perpendicular to the line from the center of the aperture to any of the 
science objects or guide stars. These planes differ only in their overall tilt. 


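[Figure 2: sketch showing the light rays, the aberrated wavefront surface, the intersecting plane, and the telescope aperture (with wavefront phase contours).] 

Fig. 2. The atmosphere-aberrated wavefront as it approaches the telescope aperture. A wavefront 
is a surface of constant phase. The plane of the telescope aperture slices through wavefront surfaces. 
$\phi(\mathbf{u})$ is the wavefront phase at point $\mathbf{u}$ in the aperture plane. 

The image intensity at the focal plane isᵉ 


$$\mathrm{PSF}(\boldsymbol{\theta}) = \left| \mathcal{F}_{\mathbf{u}\to\boldsymbol{\theta}} \left\{ A(\mathbf{u})\, e^{i\phi(\mathbf{u})} \right\} \right|^2 \tag{12}$$

$$= \left| \int A(\mathbf{u})\, e^{i\phi(\mathbf{u})}\, e^{-i(2\pi/\lambda)\,\mathbf{u}\cdot\boldsymbol{\theta}}\, d^2u \right|^2, \tag{13}$$


where 


$$\phi(\mathbf{u}) = \left(\frac{2\pi}{\lambda}\right) \mathrm{OPD}(\mathbf{u}) \tag{14}$$


is the integrated optical path distance, converted to units of phase, and $A(\mathbf{u})$ is 
the aperture transmission function, normalized so that the integral of the PSF is 
1 (achieved with $\int A(\mathbf{u})^2\, d^2u = 1$). Typically, $A(\mathbf{u})$ is uniform inside the aperture 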
and 0 outside, but the aperture function could be more complicated, for example, 
to account for vignetting, variations in reflectivity over the telescope mirror, or the 
apodization that is used in some coronagraphs. 

Reference 8 derived expressions for the average atmospherically aberrated PSF 
and its Fourier transform, the modulation transfer function (MTF). The MTF has 
two factors, the atmospheric part, which involves only the phase structure function, 
and the telescope part, which depends on the shape of the aperture. Typically, tele- 
scope apertures are not perfect circular disks; they are complicated by obscurations 
of the secondary mirror, occlusions due to the struts that hold the secondary mirror, 
and non-circular boundaries. Thus, to analyze the exact nature of the diffraction, 
one needs to simulate the particular aperture involved. However, since seeing typi- 
cally dominates the large aperture PSF, there is not much dependence of the PSF 
on the aperture itself. In that case the PSF could be reasonably well modeled by 
the atmospheric factor alone, ignoring the telescope factor. To get the exact PSF, 
however, both factors are required. This can be important in some applications. For 
example secondary spiders (and other small features in the aperture) can scatter 
some amount of light to large angles, affecting the achievable contrast near bright 
stars. 

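The average seeing MTF and PSF are given by 


$$\mathrm{MTF}(\mathbf{u}) = \tau(\mathbf{u})\, e^{-\frac{1}{2} D_\phi(\mathbf{u})}, \tag{15}$$

$$\mathrm{PSF}(\boldsymbol{\theta}) = \mathcal{F}_{\mathbf{u}\to\boldsymbol{\theta}} \{ \mathrm{MTF}(\mathbf{u}) \}, \tag{16}$$

where $\tau(\mathbf{u}) = \int A(\mathbf{u}')\, A(\mathbf{u} - \mathbf{u}')\, d^2u'$ is the MTF of the telescope and 

$$D_\phi(\mathbf{u}) = 6.88\, (r/r_0)^{5/3}, \tag{17}$$


with $r = \|\mathbf{u}\|$, is the structure function of the atmospherically aberrated wavefront 
$\phi(\mathbf{u})$ at the telescope aperture plane. The 2D structure function of phase $D_\phi(\mathbf{u})$ is 
derived from the three-dimensional structure function of index $D_n(\mathbf{r})$ with the help 
of Eqs. (11), (3), and (14). 

ᵉ Here, the notation $\mathcal{F}_{\mathbf{u}\to\boldsymbol{\theta}}$ denotes a Fourier transform from $\mathbf{u}$ to $\boldsymbol{\theta}$. 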


The parameter 


$$r_0 = \left[ 0.423\, k^2 \sec(\psi) \int_0^L C_n^2(h)\, dh \right]^{-3/5}, \qquad k = 2\pi/\lambda \tag{18}$$


is the well-known Fried seeing parameter. $\psi$ is the angle with respect to zenith of 
the line from the telescope to the science object; thus, $\sec(\psi)$ is an airmass factor 
that accounts for longer paths through the atmosphere when the telescope is pointed 
off zenith. Fried’s parameter corresponds to the spatial coherence diameter we 
spoke of earlier, essentially the average size of patches in the aperture plane over 
which the phase variations are within $\pm\pi/2$. 
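A short numerical sketch may help in using the Fried parameter. It assumes the standard form in which the Fried parameter equals the quantity 0.423 k² sec(zenith angle) times the path integral of the structure constant, raised to the power -3/5, with k = 2π/λ. The two-layer turbulence profile below is hypothetical, chosen only so that the result falls in a typical range, and the function names are ours.

```python
import math

# Fried parameter r0 = [0.423 k^2 sec(zenith) * integral(C_n^2 dh)]^(-3/5), k = 2 pi / lambda.
# The layered C_n^2 profile used in the example is hypothetical.


def integrate_layers(layers) -> float:
    """Integrate C_n^2 over altitude for (thickness_m, cn2) slabs; result in m^(1/3)."""
    return sum(thickness * cn2 for thickness, cn2 in layers)


def fried_parameter(wavelength_m: float, cn2_integral: float,
                    zenith_angle_rad: float = 0.0) -> float:
    """Return r0 in meters for the given wavelength, profile integral and zenith angle."""
    k = 2.0 * math.pi / wavelength_m
    airmass = 1.0 / math.cos(zenith_angle_rad)
    return (0.423 * k ** 2 * airmass * cn2_integral) ** (-3.0 / 5.0)


if __name__ == "__main__":
    # Hypothetical profile: a 1 km ground layer and a weaker 10 km upper layer.
    profile = [(1000.0, 1e-16), (10000.0, 1e-17)]
    integral = integrate_layers(profile)
    for wavelength in (0.5e-6, 2.2e-6):
        r0 = fried_parameter(wavelength, integral)
        print(f"lambda = {wavelength * 1e6:.1f} micron  r0 = {r0 * 100:.1f} cm")
```

Because k enters squared inside the bracket, the Fried parameter scales as the wavelength to the power 6/5, so the coherence diameter is several times larger in the near-infrared than in the visible; this is one reason AO correction is easier at longer wavelengths.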

Simulated atmosphere PSFs are shown in Fig. 3. On the left is the diffraction- 
limited PSF for a circular aperture and no phase aberration, $\phi(\mathbf{u}) = 0$. The middle 
PSF is for a short exposure through the atmosphere, which shows a speckle pattern 
due to interference among various areas of the aberrated wavefront. The right panel 
shows what an astronomer typically sees, the blurred-out PSF after a long exposure. 

A very useful metric for the optical system performance is the Strehl ratio, 
which is the ratio of the on-axis peak of the aberrated PSF relative to the on-axis 
peak of the unaberrated PSF: 


$$S = \mathrm{PSF}_A(0)/\mathrm{PSF}_0(0). \tag{19}$$


The Strehl ratio is always between 0 and 1, with 1 being the best possible to 
achieve for a given aperture. It is possible to show that for small random aberrations 


$$S \approx e^{-\sigma_\phi^2}. \tag{20}$$


The approximation is valid for $\sigma_\phi \lesssim 1$ radian, where $\sigma_\phi^2$ represents the phase 
variance around the aperture mean. Using Eq. (17), Ref. 9 calculated $\sigma_\phi^2 = 
1.0299\,(D/r_0)^{5/3}$ for the Kolmogorov atmosphere. So with $D = r_0$, the Strehl ratio 
is $e^{-1.03} \approx 0.36$. 


[Figure 3: three simulated PSF panels: diffraction-limited, short exposure, and long exposure.] 

Fig. 3. (a) Diffraction-limited image of a point-source object obtained through a 3-m telescope in 
$\lambda = 0.5$ μm light. (b) Short exposure image through atmosphere and telescope, with the atmosphere 
at 1 arcsecond seeing conditions. (c) Long exposure image through the atmosphere and telescope, 
showing how the speckle structure smears out over time. 

In typical seeing conditions at the best sites, $r_0$ is in the range of 10 to 20 cm,
much smaller than the apertures of the largest telescopes, leading to phase variance
too large for Eq. (20) to be valid. In that case the Strehl ratio, now the ratio of the
peak of the seeing-limited PSF to the peak of the diffraction-limited PSF, is better
approximated by

$$S \approx (r_0/D)^2. \qquad (21)$$

For example, the seeing-limited Strehl ratio on a D = 10-m telescope with $r_0$ = 20 cm
seeing is approximately $4 \times 10^{-4}$, which agrees reasonably well with experiment.
This value is much larger than would be predicted with Eq. (20), but is still quite
small compared to what adaptive optics can achieve.
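
The two Strehl regimes can be compared numerically. The sketch below assumes Eq. (20) with the Kolmogorov variance 1.0299 (D/r0)^(5/3) quoted above, and reads Eq. (21) as S ≈ (r0/D)^2, which reproduces the 4 × 10⁻⁴ figure in the text. The function names are illustrative only.

```python
import math

def strehl_small_aberration(D, r0):
    """Eq. (20) with sigma^2 = 1.0299 (D/r0)^(5/3); meaningful only for sigma <~ 1 rad."""
    sigma2 = 1.0299 * (D / r0) ** (5.0 / 3.0)
    return math.exp(-sigma2)

def strehl_seeing_limited(D, r0):
    """Eq. (21), read here as S ~ (r0/D)^2, for D much larger than r0."""
    return (r0 / D) ** 2

# Aperture equal to one coherence diameter
s_small = strehl_small_aberration(0.2, 0.2)
# 10-m telescope in r0 = 20 cm seeing
s_large = strehl_seeing_limited(10.0, 0.2)
print(f"D = r0:        S ~ {s_small:.2f}")
print(f"D = 10 m:      S ~ {s_large:.1e}")
# Applying Eq. (20) outside its range of validity gives an absurdly small number
print(f"Eq. (20) at D = 10 m: {strehl_small_aberration(10.0, 0.2):.1e}")
```

The last line illustrates why Eq. (20) should not be used once the aperture is many coherence diameters across.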

Strehl ratio is key to imaging performance in conditions where there is a diffuse 
background due to nebulosity, sky emission, or overcrowding of stars. Higher Strehl 
ratio means higher signal above the background and consequently shorter exposure 
times necessary to achieve a certain signal-to-noise in images. The Strehl metric is 
also practical for engineering purposes. The sum of squared residuals from error- 
budget contributors can be entered into Eq. (20) (assuming the AO system produces 
a small residual phase) and the total Strehl estimate quickly calculated. 
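
As a rough illustration of this engineering use, the sketch below root-sum-squares a set of rms error terms and applies Eq. (20). The three error values and the 2.2 μm science wavelength are hypothetical numbers chosen only for the example, and the terms are assumed to be statistically independent.

```python
import math

def strehl_from_budget(terms_nm, wavelength_nm):
    """Combine independent rms error terms (nm) and estimate Strehl with Eq. (20)."""
    total_nm = math.sqrt(sum(t * t for t in terms_nm))      # root-sum-square
    sigma_rad = 2.0 * math.pi * total_nm / wavelength_nm    # nm of path -> radians of phase
    return total_nm, math.exp(-sigma_rad ** 2)

# Hypothetical budget: fitting, bandwidth, and measurement-noise terms in nm rms
total, strehl = strehl_from_budget([60.0, 80.0, 100.0], wavelength_nm=2200.0)
print(f"total = {total:.1f} nm rms, Strehl ~ {strehl:.2f}")
```

Because the terms add in quadrature, the largest contributor dominates the total, which is why error budgets are usually balanced rather than driven by a single term.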

A second key metric for imaging performance is the full-width-half-max
(FWHM) of the point spread function. For a diffraction-limited Airy pattern, the
FWHM is approximately $\lambda/D$ (with slight variation depending on aperture
obscurations and shape details). For a seeing-limited long-exposure PSF, the FWHM
is approximately $\lambda/r_0$. So adaptive optics potentially achieves a factor of $D/r_0$
improvement in FWHM. The FWHM determines the resolution of the images
according to the well-known Rayleigh criterion and other metrics of resolution
(see Ref. 10 for an analysis taking into account image signal-to-noise ratio). The
FWHM is a rather abrupt function of Strehl, switching quickly from seeing-limited
to diffraction-limited as the wavefront error variance drops below $\pi/2$.
Requiring the AO system to be diffraction limited at a given wavelength will set a
hard limit on the allowable wavefront error.


5. AO System Design and Sources of Wavefront Correction Error 


5.1. Design considerations 


AO systems consist of active optical elements that correct the wavefront in real time
on its path to the science instrument. The science instrument, typically an imager
or spectrograph, is placed behind the AO system and accepts the corrected beam.
Since the corrected beam can be much different than the seeing-limited beam, the
science instrument is often specifically designed for use with adaptive optics. For
example, a spectrograph slit can be made smaller, and for a given spectral resolution
the size of the dispersion and camera optics can be made much smaller, since
these are directly proportional to the seeing size.¹¹ An imager might use a longer
f-number camera or smaller pixels in the focal plane detector to take advantage of
the higher resolution. This may then require a larger detector, i.e., one with more
pixels, or reduced field of view.


5.2. Wavefront spatial sampling 


One of the choices facing the AO system designer is the number of phase points to
control across the aperture. Wavefront control mirrors can be composed of a number
of discrete segments, each with piston, tip, and tilt control to fit the wavefront in
a discrete piecewise manner, or alternatively (and more commonly) a continuous
reflective surface that bends in response to actuators, also fitting the wavefront
but with continuous response functions. Continuous face-sheet DMs are designed
specifically so that actuator influence functions interpolate smoothly between the
actuators.ᶠ The spacing of actuators determines the maximum spatial frequencies
available to the wavefront controller. So, for example, if the spacing is $d$ (mapped
to the $u$ space of the telescope aperture) then the highest spatial frequency of the
control is $\kappa = \pi/d$. One must sample the wavefront on the order of one
degree-of-freedom (actuator) per wavefront coherence length in order to bring the wavefront
into phase across the aperture. A way of quantifying this is through the residual
wavefront error left after the correction, $\phi_{\rm fit}(u) = \phi(u) - \phi_{\rm DM}(u)$, whose variance
is given by ? 


$$\sigma_{\rm fit}^2 = \mu\,(d/r_0)^{5/3}, \qquad (22)$$


where $\mu = 0.113$ for segmented piston-tip-tilt DMs for which $d$ is the separation of
segments, and $\mu = 0.3$ for continuous face-sheet DMs for which $d$ is the separation
of push-pull type actuators behind the face-sheet.
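
A short sketch of how Eq. (22), read here as σ² = μ (d/r0)^(5/3), might be used in practice: either to evaluate the fitting error for a given actuator spacing, or inverted to find the spacing that meets a target variance. The function names and the example values are illustrative assumptions.

```python
def fitting_variance(d, r0, mu):
    """Residual fitting variance (rad^2) from Eq. (22) for spacing d and Fried parameter r0."""
    return mu * (d / r0) ** (5.0 / 3.0)

def spacing_for_variance(sigma2, r0, mu):
    """Invert Eq. (22): largest spacing d that keeps the fitting variance at sigma2."""
    return r0 * (sigma2 / mu) ** (3.0 / 5.0)

MU_SEGMENTED = 0.113    # piston-tip-tilt segments, value quoted in the text
MU_CONTINUOUS = 0.3     # continuous face-sheet, value quoted in the text

r0 = 0.2  # metres, example seeing
for name, mu in (("segmented", MU_SEGMENTED), ("continuous", MU_CONTINUOUS)):
    var = fitting_variance(d=r0, r0=r0, mu=mu)
    d_max = spacing_for_variance(0.1, r0, mu)
    print(f"{name}: variance at d = r0 is {var:.3f} rad^2; "
          f"spacing for 0.1 rad^2 is {d_max:.3f} m")
```

With one actuator per coherence length (d = r0), the variance is simply μ, which is the sense in which roughly one degree of freedom per coherence length is needed.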


5.3. Temporal sampling and controller design 


A second choice facing the AO system designer is the rate at which to provide new 
corrections, i.e., the bandwidth of the controller. The basic rule here is that the 
update rate must be faster than the rate at which (under the frozen-flow hypothe- 
sis) the wind blows turbulent structures through a distance of order the coherence 
length. AO feedback control architectures are shown schematically in Fig. 4. 
Figure 4(a) shows the configuration for a closed-loop control architecture. The
incoming aberrated beam first reflects off the wavefront correction elements, in this
drawing shown as a single deformable mirror. After correction, the light is split into
two paths, one for the wavefront sensor and one for the science detector. It can
be convenient for light to be split according to color. The first AO systems split
at around 1 micron, with shorter wavelengths going to the CCD-based wavefront
sensor and longer wavelengths going to an infrared science detector. A guide-star
pick-off is another possible option, where a small mirror (or hole in a mirror) is used
to direct the guide star beam to the wavefront sensor while the rest of the field goes
on to the science detector.

ᶠThe technologies used for the actuators are covered in Part 6 of this Volume.

Fig. 4. Adaptive optics control architectures, (a) closed-loop control, and (b) open-loop control.

In a closed-loop feedback control architecture the wavefront sensor measures
the residual wavefront after correction. It is the job of the controller to make
incremental adjustments to the deformable mirror whenever the wavefront sensor shows
a non-zero reading. The phase correction applied to the deformable mirror, $\phi_{\rm DM}$,
is then

$$\phi_{\rm DM}(k+1) = \gamma\,\phi_{\rm DM}(k) + g\,\phi_{\rm WFS}(k), \qquad (23)$$

where $\phi_{\rm WFS}$ is the residual phase measured by the wavefront sensor, $k$ is a time
iteration step, and $g$ is the feedback gain. The factor $\gamma$ is a forgetting factor that
allows the DM to relax to a quiescent position in case of no signal, which aids in
keeping the feedback system stable. Typically, $\gamma \lesssim 1$. The feedback gain is adjusted
to minimize residual error, which is a tradeoff between responding to a dynamically
changing atmospheric wavefront and signal-to-noise in the wavefront sensing.

The second architecture, shown in Fig. 4(b), is open-loop control, meaning that
the raw incoming wavefront is measured, and a phase correction based on this
measurement is sent to the deformable mirror:

$$\phi_{\rm DM}(k+1) = \phi_{\rm WFS}(k). \qquad (24)$$

The majority of single-conjugate natural guide star AO systems use closed-loop
control because (1) the wavefront sensing process operates around a small linear
range near zero aberration, and (2) the controller has a chance to correct for errors
that may be due to nonlinearities or misalignment of the deformable mirror. However,
there are some applications where open-loop can have advantages, such as in
a multi-object AO system.ᵍ With open-loop control, the deformable mirror must be
capable of going to exactly the commanded shape, since the wavefront sensor will
not be able to measure and correct it. For the remainder of this chapter, we assume
closed-loop control.
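
The closed-loop update of Eq. (23) can be exercised with a minimal scalar simulation. The sketch below treats a single phase mode, assumes the sensor sees the residual (incoming phase minus the DM correction, plus optional noise), and uses g = 0.4 and γ = 0.99 purely as example values. The function name and interface are illustrative.

```python
def run_leaky_integrator(phi, g=0.4, gamma=0.99, noise=None):
    """Iterate Eq. (23) for one scalar mode and return the residual at each step."""
    dm = 0.0
    residuals = []
    for k, phi_k in enumerate(phi):
        n_k = noise[k] if noise is not None else 0.0
        residuals.append(phi_k - dm)      # true residual after correction
        wfs = phi_k - dm + n_k            # what the wavefront sensor reports
        dm = gamma * dm + g * wfs         # Eq. (23): leaky-integrator update
    return residuals

# A constant unit aberration, no sensor noise
res = run_leaky_integrator([1.0] * 200)
print(f"first residual = {res[0]:.3f}, settled residual = {res[-1]:.4f}")
# Steady state predicted by setting dm(k+1) = dm(k) in Eq. (23)
print(f"predicted settled residual = {(1 - 0.99) / (1 - 0.99 + 0.4):.4f}")
```

The small but non-zero settled residual is the price of the forgetting factor: with γ below 1 the integrator leaks, so a static aberration is never removed completely.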

Controller performance will depend on the temporal power spectrum of the
turbulence and the noise in the wavefront sensor. Assuming turbulence dynamics
is frozen-flow at each layer, the temporal power spectrum at that layer can be
deduced from the Kolmogorov spatial power spectrum. Then, assuming each layer
is statistically independent, we sum the power spectra of each layer to get the total
integrated path power spectrum,¹²

$$S_\phi(f) = 0.0326\,\sec(\psi)\,k^2 \int_0^L C_n^2(h)\,v(h)^{5/3}\,dh\; f^{-8/3}, \qquad (25)$$

where $v(h)$ is the wind speed at altitude $h$ and, again, $\psi$ is the zenith angle of
the direction of pointing. Note that the power spectrum has a steep fall-off with
temporal frequency, which implies that there is much to be gained by even a low
bandwidth controller.

Greenwood¹³ defined a characteristic frequency, $f_G$, which is the minimum
bandwidth of a perfect control law that would be needed to achieve 1 radian squared
of residual phase variance:

$$f_G = \left[0.0196\,k^2 \sec(\psi) \int_0^L C_n^2(h)\,v(h)^{5/3}\,dh\right]^{3/5}, \qquad (26)$$

where wind $v$ and turbulence $C_n^2$ are distributed in altitude. A simple formula follows
if all the wind and turbulence is concentrated at one altitude (the ground, say): then
$f_G = 0.158\,v/r_0$. From this we see that the required control bandwidth is comparable
to the rate at which wind clears a coherence length.
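
The single-layer coefficient can be checked numerically. The sketch below assumes (this derivation is not spelled out in the text) that a perfect controller removes all power in Eq. (25) below the cutoff frequency, so that the residual is the integral of the f^(-8/3) spectrum above that cutoff, and it uses the constants 0.0326 from Eq. (25) and 0.423 from Eq. (18). The wind speed and r0 values are examples only.

```python
def greenwood_coefficient():
    """Coefficient c in f_G = c * v / r0 for a single layer.

    Integrating A f^(-8/3) from f_G to infinity gives (3/5) A f_G^(-5/3).
    Setting this to 1 rad^2, with A = (0.0326 / 0.423) (v / r0)^(5/3),
    yields f_G = (0.6 * 0.0326 / 0.423)^(3/5) * v / r0.
    """
    return (0.6 * 0.0326 / 0.423) ** (3.0 / 5.0)

def greenwood_frequency(v, r0):
    """Single-layer characteristic frequency in Hz for wind speed v (m/s) and r0 (m)."""
    return greenwood_coefficient() * v / r0

c = greenwood_coefficient()
print(f"coefficient = {c:.3f}")
print(f"f_G for v = 10 m/s, r0 = 0.1 m: {greenwood_frequency(10.0, 0.1):.1f} Hz")
```

Under these assumptions the coefficient agrees with the 0.158 quoted above, and the constant inside the bracket of Eq. (26) is 0.6 × 0.0326 ≈ 0.0196.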

The actual rms residual attained depends on details of the control law, as
well as the spectrum of wavefront phase and the measurement noise. Let
$\phi_\varepsilon(k) = \phi(k) - \phi_{\rm DM}(k)$ (all with the aperture average removed) be the residual
phase after correction. The phase measured by the wavefront sensor is
$\phi_{\rm WFS}(k) = \phi_\varepsilon(k) + n(k)$, where $n(k)$ is the measurement noise. The temporal power
spectrum of the residual is then

$$S_\varepsilon(f) = |H_\varepsilon(f)|^2\, S_\phi(f) + |H_n(f)|^2\, N(f), \qquad (27)$$

where

$$H_\varepsilon(f) = \frac{1}{1 + gC(f)} \quad \text{and} \quad H_n(f) = \frac{gC(f)}{1 + gC(f)} \qquad (28)$$

are closed-loop transfer functions from wavefront to residual and noise to residual,
respectively, each a function of temporal frequency, $f$. $N(f)$ is the power spectrum
of the measurement noise, which is a constant function of frequency, i.e., white
noise, because measurement noise is uncorrelated in successive measurements. $C(f)$
is the "around the loop" transfer function incorporating the temporal delays and
frequency responses in the wavefront sensing, control computation, control
compensator, and deformable mirror dynamics. Typical transfer functions $H_\varepsilon(f)$ and $H_n(f)$
for a simple control law (Eq. (23)) are depicted in Fig. 5(a). Details of control loop
modeling, specifically the models for $C(f)$, can be found in Ref. 14. The wavefront
error variance is the integral of the power spectrum of the residual,

$$\sigma_{\rm BW}^2 = \int S_\varepsilon(f)\,df. \qquad (29)$$

ᵍSee Chapter 17.


The residual variance can be optimized (minimized) with a choice of feedback
gain and control compensation function, understanding that there is a tradeoff
between competing effects in the optimization. As the gain is increased, more of the
wavefront aberration is suppressed; however, this increases the response to noise.
There is also the issue of closed-loop stability. There is some frequency at which
there is enough delay around the loop that $\arg\{C(f)\} < -180°$. One must be sure
that $|gC(f)| < 1$ at that point, otherwise there is positive feedback and the loop
goes unstable. Thus, there is an upper limit to how large $g$ can be. The quantity
$1/|gC(f_{-180})|$ is known as the gain margin, the factor by which gain can be increased
before the system goes unstable. Figure 5(b) shows examples of $S_\phi(f)$ and $S_\varepsilon(f)$
and Fig. 5(c) shows the control residual, $\sigma_{\rm BW}$, as a function of control gain for
various wavefront sensor noise cases, showing that there is a different optimal gain
depending on the noise level.
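
The gain tradeoff can be explored with a small numerical sketch of Eqs. (27) to (29). It rests on several assumptions that are not taken from the text: the loop transfer function is the discrete-time one implied by Eq. (23) with a one-frame delay and no other dynamics, gC(f) = g/(z - γ) with z = exp(2πi f/f_s); the turbulence spectrum is f^(-8/3) in arbitrary units; and the two noise levels are arbitrary. It is not a model of any particular AO system.

```python
import cmath
import math

def residual_variance(g, gamma=0.99, fs=1000.0, noise_psd=1e-6, n=2000):
    """Midpoint-rule integral of Eq. (27) from 0 to fs/2 for the assumed loop model."""
    df = (fs / 2.0) / n
    total = 0.0
    for i in range(n):
        f = (i + 0.5) * df
        z = cmath.exp(2j * math.pi * f / fs)
        gC = g / (z - gamma)                 # assumed loop transfer function
        H_eps = 1.0 / (1.0 + gC)             # wavefront -> residual, Eq. (28)
        H_n = gC / (1.0 + gC)                # noise -> residual, Eq. (28)
        S_phi = f ** (-8.0 / 3.0)            # turbulence spectrum, arbitrary scale
        total += (abs(H_eps) ** 2 * S_phi + abs(H_n) ** 2 * noise_psd) * df
    return total

def best_gain(noise_psd):
    """Scan gains 0.05 ... 1.50 and return the one with the smallest residual."""
    gains = [0.05 * k for k in range(1, 31)]
    return min(gains, key=lambda g: residual_variance(g, noise_psd=noise_psd))

g_quiet = best_gain(1e-8)   # bright guide star: little sensor noise
g_noisy = best_gain(1e-2)   # faint guide star: heavy sensor noise
print(f"optimal gain, low noise:  {g_quiet:.2f}")
print(f"optimal gain, high noise: {g_noisy:.2f}")
```

In this model the noise term grows with gain while the turbulence term shrinks, so the optimum moves to lower gain as the sensor noise increases, which is the qualitative behavior described for Fig. 5(c).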


5.4. Wavefront measurement error 


Wavefront sensing methods and technology are discussed in Chapter 11 of Volume
2 of this Handbook. Accuracy of any wavefront sensor is ultimately limited by SNR
in its detection of the guide star light. We present here an SNR analysis of the
Shack-Hartmann sensor, which is probably the most common type of wavefront sensor used
in present-day astronomical AO systems. This sensor measures the local slopes of the
wavefront over a regular grid. The grid sampling is often chosen to match the pattern
of actuators on the wavefront corrector mirror, though this is not strictly necessary.
If the sample spacings do not match, then the parameter $d$ in the calculation of
spatial wavefront fitting error (Eq. (22)) must be the larger of the DM actuator
spacing and the wavefront sensor sample spacing.

Fig. 5. (a) Closed-loop transfer functions for phase $H_\phi(f)$ and noise $H_n(f)$. (b) Phase and phase
residual power spectra with g = 0.4, γ = 0.99. (c) Root-mean-square residual $\sigma_{\rm BW}$ versus feedback
gain, for various measurement noise cases, showing that the optimal gain setting is a function of
wavefront sensor signal-to-noise.

Given measurements of the wavefront slopes, the wavefront phase is
reconstructed using a computer so as to satisfy the condition

$$\nabla\phi(x) = s(x), \qquad (30)$$

where $s(x)$ is the slope at $x$ and $\phi(x)$ is the wavefront phase. Here $x$ is from a finite
set of discrete sample points $\{x_i,\ i = 1, 2, \ldots\}$, which are the spatial locations of
Shack-Hartmann sub-apertures. Phase reconstruction algorithms solve a discretized
version of Eq. (30) that can be expressed as a set of linear equations: $s = G\phi$,
where $s = [s(x_i)]$ and $\phi = [\phi(x_i)]$ $(i = 1, 2, \ldots)$ are column vectors of slope and
phase samples, respectively, and $G$ is a matrix finite difference operator
approximating the spatial gradient. Variations of reconstructors include versions where $s$
and $\phi$ are expressed in different mode spaces, such as Fourier series or Zernike basis
functions. The different mode spaces may offer computational speed advantages,
or added flexibility in the control law, or a way to constrain wavefront fitting to a
subset of a specific mode space.¹⁵,¹⁶ With the use of pseudo-inverse techniques, the
reconstruction algorithm can be designed to produce minimum variance solutions,
which helps reduce the deleterious effect of wavefront measurement noise in the
wavefront control.¹⁷
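
To make the linear system s = Gφ concrete, the sketch below handles the simplest possible case: a one-dimensional chain of sample points with G taken as a forward-difference operator. This is a simplifying assumption for illustration; a real Shack-Hartmann reconstructor works on a two-dimensional grid and needs a matrix pseudo-inverse. In one dimension the only unobservable mode is piston (a constant offset), so the minimum-norm pseudo-inverse solution is the cumulative sum of the slopes with its mean removed.

```python
def forward_difference(phi, spacing=1.0):
    """Apply G: slope between neighbouring samples, s[i] = (phi[i+1] - phi[i]) / spacing."""
    return [(b - a) / spacing for a, b in zip(phi[:-1], phi[1:])]

def reconstruct_phase_1d(slopes, spacing=1.0):
    """Zero-mean solution of s = G phi for a 1-D chain of sub-apertures."""
    phi = [0.0]
    for s in slopes:
        phi.append(phi[-1] + s * spacing)    # integrate the slopes
    mean = sum(phi) / len(phi)
    return [p - mean for p in phi]           # remove the unobservable piston term

# Example: a made-up phase profile (radians), with piston removed for comparison
true_phi = [0.0, 0.5, 0.2, -0.3, 0.1]
mean_true = sum(true_phi) / len(true_phi)
true_zero_mean = [p - mean_true for p in true_phi]

slopes = forward_difference(true_phi)
recovered = reconstruct_phase_1d(slopes)
print([round(p, 3) for p in recovered])
```

With noiseless slopes the profile is recovered up to piston. With noisy slopes the integration accumulates error along the chain, which is one reason two-dimensional least-squares and minimum-variance reconstructors are used in practice.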

We want to quantify how noise in the wavefront sensing enters into error in
the wavefront reconstruction. Such a model is complicated in that it depends on
details of the wavefront sensor optics, Hartmann spot centroiding algorithm, CCD
pixel size, reconstructor algorithm, and the seeing conditions. Based on the work of
Ref. 18 and others, the phase measurement error from a Shack-Hartmann
sensor can be expressed as

$$\sigma_{\rm SNR} = \frac{2\pi d}{\lambda}\,\eta\,\chi\,g(r_0)/\mathrm{SNR}, \qquad (31)$$

where SNR is the signal-to-noise ratio in the sensing of individual Shack-Hartmann
spots. In a perfectly efficient and noiseless detector, $\mathrm{SNR} = \sqrt{N}$, where $N$ is the
number of photons in the spot. $g(r_0)$ is the sensitivity of the wavefront sensor,
which converts spot centroid displacement (in units of Hartmann spot size) to the
local wavefront slope (in angular radians). $g$ depends on seeing conditions, because
worse seeing produces bigger Hartmann spots and therefore one unit of wavefront
tip/tilt deflects the spot by a smaller fraction of the spot size. $d$ is the size of the
sub-aperture (essentially a lever arm from one sub-aperture to the next, converting
angle to height). The factor $2\pi/\lambda$ is the wavenumber at the sensing wavelength
(converting from height to radians of phase). $\eta$ is a unitless factor that accounts for
scaling of the wavefront slope noise to wavefront phase noise by the reconstructor.
Finally, $\chi$ is a unitless factor that accounts for control loop averaging, since the
wavefront is measured many times per characteristic response time of the control
loop.

The key ingredient in Eq. (31) is the number of photons collected, which sets
the SNR. The number of collected photons, $N$, per Hartmann spot is

$$N = \frac{d^2 F_0}{f_s} \times 10^{-m_v/2.5}, \qquad (32)$$

where $d$ is the size of the subaperture, $f_s$ is the sampling frequency (so $1/f_s$ is the
exposure time per frame), $F_0$ is the flux of a magnitude zero star, and $m_v$ is the
magnitude of the guide star. $F_0$ is determined according to photometric brightness
and V-band filter standards to be roughly $9.4 \times 10^9$ photons/m²/s. There is a trade
between the time spent collecting light versus the rate of updating the wavefront
correction. Obviously a brighter guide star provides higher signal-to-noise and allows
a faster frame rate.
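
The photon budget of Eq. (32) is easy to tabulate. The sketch below reads Eq. (32) as N = d² F0 10^(-m_v/2.5) / f_s, takes the zero-magnitude flux as 9.4 × 10⁹ photons/m²/s (the exponent is an assumption where the printed value is unclear), and uses the ideal-detector relation SNR = √N. The sub-aperture size and frame rates are example values.

```python
import math

F0_V = 9.4e9  # photons / m^2 / s for a magnitude-zero star (assumed reading of the text)

def photons_per_spot(d, fs, m_v, f0=F0_V):
    """Eq. (32): photons collected per sub-aperture per frame."""
    return d * d * f0 * 10.0 ** (-m_v / 2.5) / fs

def ideal_snr(n_photons):
    """Photon-noise-limited SNR for a perfectly efficient, noiseless detector."""
    return math.sqrt(n_photons)

d = 0.2  # metres, example sub-aperture size
for m_v in (8, 10, 12, 14):
    for fs in (1000.0, 100.0):
        n = photons_per_spot(d, fs, m_v)
        print(f"m_v = {m_v:2d}, fs = {fs:6.0f} Hz: N = {n:10.1f}, SNR = {ideal_snr(n):7.1f}")
```

Each 5 magnitudes costs a factor of 100 in photons, which can only be bought back by slowing the frame rate. This is the trade between measurement noise and bandwidth error described above.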


5.5. Anisoplanatism and sky coverage 


AO systems require a sufficiently bright guide star to make high signal-to-noise
measurements of the atmospheric turbulence on spatial scales set by the coherence
radius, and time scales set by the ratio of coherence radius to wind speed. The number
of natural guide stars bright enough to do this is limited by nature, however. For
example, AO systems can work reasonably well with stars as dim as $m_v = 13$,
but the density of stars of 13th magnitude and brighter is only about 30 stars
per square degree. This means that for a science object in an arbitrary location
on the sky, the nearest suitable guide star is on average about 5 arcmin away.
Since the turbulence is distributed vertically, pointing off-axis to a guide star means
that the optical path of the guide star light is not the same as that of the science
object and, depending on this distribution, may cause enough error to defeat the
wavefront correction. The situation is shown in Fig. 6. Turbulence at the ground is
correctly accounted for; however, turbulence at high altitude is shifted and eventually
the measured wavefront is uncorrelated with the science wavefront. To illustrate the
effect on images, Fig. 7 shows a simulated star cluster corrected using the central
star as a wavefront reference. The AO-corrected PSF degrades with off-axis distance
to a degree depending on the relative amount of turbulence at upper altitudes.

Fig. 6. Anisoplanatic error versus field angle in single guide star AO. The guide star has a different
wavefront than the science object because of the differing paths through upper altitude turbulence.

Fig. 7. Simulated star cluster corrected with adaptive optics. Each panel has increasing amounts
of turbulence at higher altitude, with integrated seeing conditions otherwise the same (1 arcsec
seeing). (a) All turbulence at the ground layer. (b) 30% of turbulence at 5 km. (c) 70% of turbulence
at 5 km. The PSF degrades radially away from the guide star in the middle. The axis units are
arcseconds.

Fig. 8. Sky coverage is limited by bright star density and isoplanatic angle. (a) Strehl ratio and
probability of finding a star in the isoplanatic patch, versus star magnitude. (b) Sky coverage
percentage as a function of Strehl ratio for various observing wavelengths.

The wavefront error as a function of the turbulence altitude distribution $C_n^2(h)$ is

$$\sigma_{\rm Aniso}^2 = \left(\frac{\theta}{\theta_0}\right)^{5/3}, \quad \text{where} \quad \theta_0 = \left[2.914\,k^2 (\sec\psi)^{8/3} \int_0^L C_n^2(h)\,h^{5/3}\,dh\right]^{-3/5} \qquad (33)$$

and $\theta$ is the angular separation between the guide star and the science object.
For the cases shown in Fig. 7, $\theta_0 = \infty$, 10, and 5 arcseconds, respectively, with the
latter two typical of what is experienced at good mountain top sites.
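
For a feel of the numbers in Eq. (33), the sketch below evaluates the isoplanatic angle for the idealized case where all the turbulence sits in a single layer at height h, observed at zenith. Combining Eq. (33) with Eq. (18) for that case gives θ0 = (2.914/0.423)^(-3/5) r0/h ≈ 0.314 r0/h. The single-layer geometry and the example values of r0 and h are assumptions made for illustration and are not meant to reproduce the panels of Fig. 7.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def theta0_single_layer(r0, h):
    """Isoplanatic angle (radians) at zenith for one turbulent layer at height h (m)."""
    return (2.914 / 0.423) ** (-3.0 / 5.0) * r0 / h

def aniso_variance(theta, theta0):
    """Eq. (33): anisoplanatic phase variance (rad^2) at angular offset theta."""
    return (theta / theta0) ** (5.0 / 3.0)

theta0 = theta0_single_layer(r0=0.1, h=5000.0)
print(f"theta0 = {theta0 * ARCSEC_PER_RAD:.2f} arcsec")
for offset_arcsec in (0.5, 1.0, 2.0, 5.0):
    var = aniso_variance(offset_arcsec / ARCSEC_PER_RAD, theta0)
    print(f"offset {offset_arcsec:3.1f} arcsec: variance = {var:6.2f} rad^2")
```

By construction the variance reaches 1 rad² at an offset of θ0, and it grows as the 5/3 power of the offset beyond that.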

The charts in Fig. 8 quantify the difficulty of finding sufficiently bright guide
stars. AO correction accuracy even on-axis is limited by the brightness of the star,
as Fig. 8(a) shows. In this chart, we have plotted the Strehl ratio at various science
wavelengths as a function of guide star magnitude. The Strehl ratios are achieved
after optimizing the wavefront sensor integration time to minimize overall wavefront
measurement error, balancing the bandwidth error against signal-to-noise error.
Conditions here are $r_0$ = 10 cm, $f_G$ = 10 Hz, $\theta_0$ = 5 arcsec. On the same chart, for
comparison we show the probability of finding a star of that magnitude within the
isoplanatic patch (diameter $\theta_0$). As an example, in the case of imaging at λ = 1 μm
a guide star of 17th magnitude can only give an on-axis Strehl ratio of 0.1, barely
reaching a level at which it might form a diffraction-limited core in the PSF, yet
the probability of finding a star of this magnitude or brighter within the isoplanatic
patch is around 0.04. The situation at λ = 2 μm is a little better, with Strehl = 0.1
and a probability of finding a star (in the Galactic plane) around 0.2.

Figure 8(b) factors out the star magnitude and just plots sky coverage (probabil- 
ity of finding a star in an isoplanatic patch) versus the required Strehl ratio. Curves 
are shown only for the star density statistics near the Galactic plane ($g_{\rm lat} < 20°$).
As can be seen, there are precious few parts of the sky available to natural guide 
star adaptive optics with any reasonable Strehl ratio, except at science wavelengths 
approaching the thermal infrared. 
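
The star-density argument of this section can be reproduced with a back-of-the-envelope model. The sketch below assumes that guide stars are scattered randomly and uniformly on the sky (a Poisson process), which is a simplification because real star counts vary strongly with galactic latitude. Under that assumption the mean distance to the nearest star is 1/(2√n) for surface density n, and the chance of finding at least one star within a radius R is 1 - exp(-nπR²).

```python
import math

def mean_nearest_star_arcmin(density_per_sq_deg):
    """Mean nearest-neighbour distance (arcmin) for a uniform random star field."""
    return 60.0 * 0.5 / math.sqrt(density_per_sq_deg)

def prob_star_within(radius_arcsec, density_per_sq_deg):
    """Probability of at least one star inside a circle of the given radius."""
    area_sq_deg = math.pi * (radius_arcsec / 3600.0) ** 2
    return 1.0 - math.exp(-density_per_sq_deg * area_sq_deg)

density = 30.0  # stars per square degree, the value quoted in the text
print(f"mean distance to nearest star: {mean_nearest_star_arcmin(density):.1f} arcmin")
for radius in (5.0, 30.0, 300.0):
    print(f"P(star within {radius:5.0f} arcsec) = {prob_star_within(radius, density):.4f}")
```

For 30 stars per square degree the mean nearest-neighbor distance comes out between 5 and 6 arcmin, in line with the figure quoted earlier in this section, while the probability of a star falling within a few arcseconds is far below one percent.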

Although natural guide star AO has limited sky coverage, there are many bright 
objects in the sky that have scientific interest. The planets Neptune and Uranus, 
and some of the moons of Jupiter and Saturn are bright enough to be their own 
guide stars. The nearby bright stars are of interest for the study of their orbiting 
planets or dim companions. For planet and dim nearby companion work, the AO 
system must be followed by a coronagraphic imager so that the bright parent star is 
blocked from the science camera to allow a long exposure image of the companion. 
Still, the parent star is bright enough for the wavefront sensor to yield extremely 
high AO corrected Strehl over an angle that includes the orbit of the companion or 
planet. 

To achieve diffraction-limited imaging on an arbitrary astronomical object, no 
matter where it is located on the sky, it is necessary to generate an artificial guide 
star using a laser. This is the subject of later chapters in this volume. 


6. Conclusions 


This chapter has given an introduction to atmospheric turbulence and its effect on
ground-based telescope astronomical images, then described how an adaptive optics
system can help compensate these effects to improve resolution and sensitivity.
With real-time AO wavefront correction the atmosphere blurring can be effectively
removed and images obtained with resolution at the diffraction limit of the telescope.
The AO system requires a guide star, a reference point-source beacon, for wavefront
sensing, and here we have discussed the use and limitations of natural stars as the
reference beacon.

The AO system design starts with an understanding of the requirements for
wavefront correction and proceeds with analysis of sources of wavefront error.
Ultimately, there is a trade between the limitations of technology, the physics of the
atmosphere, and the brightness and availability of guide stars. Wavefront residual
error after AO correction determines important imaging parameters such as Strehl
ratio, resolution, PSF full-width half-max, and contrast. Generally, the residual error
in wavefront must be contained with high probability within less than one-half wave,
peak-to-valley, in order for images to appear diffraction-limited. Summing the error
sources introduced in this chapter, we have

$$\sigma^2 = \sum_i \sigma_i^2 = \sigma_{\rm fit}^2 + \sigma_{\rm BW}^2 + \sigma_{\rm SNR}^2 + \langle\sigma_{\rm Aniso}^2\rangle_\Theta + \cdots < \left(\frac{\pi}{2}\right)^2. \qquad (34)$$

We might also include other sources of error such as systematic errors in alignment
and calibration, plus we want to consider these terms over the imaging fields of the
instrument (indicated by $\Theta$ here) and at a distribution of zenith angles appropriate
for the site and typical location of science targets (quantified with the $\sec(\psi)$ factor
in $r_0$, $f_G$, and $\theta_0$).

Single-conjugate AO uses only one deformable mirror to correct wavefront for 
the entire science field of view, and thus is sensitive to anisoplanatic error, where 
the further an object is from the guide star the more wavefront error it has. Fur- 
thermore, fast wavefront sensing requires a bright guide star for determining the 
wavefront accurately for diffraction-limited performance. Unfortunately these two 
issues compound, resulting in severe limitations in sky coverage for natural guide 
star AO systems; the mean distance to a bright natural star is large compared to
the typical isoplanatic angle, so natural guide star AO is limited to only a very small
fraction of the sky in small patches around bright stars. Fortunately however, there 
are a number of interesting science studies that can be performed around bright stars 
and solar-system objects, and natural guide star AO systems have been successfully 
utilized for these cases. 

Expanding the AO sky coverage to a wider class of astronomical science programs
requires an artificial guide star generated by a laser that can be positioned
at any point on the sky. Furthermore, to address science fields larger than the
isoplanatic angle the AO system will require more than one guide star and more
than one wavefront correction element to patch together a field that is
diffraction-limited everywhere on that field. These are the topics of subsequent chapters in this
volume.

The subject of this chapter is single conjugate adaptive optics (SCAO) with a
laser guide star. SCAO systems use a single deformable mirror conjugate to a
specific altitude, normally the primary or secondary mirror of the host telescope.
The fraction of the sky that can be corrected by a single conjugate adaptive optics
system is greatly increased with the use of a laser guide star (LGS) as the source
for high spatial order wavefront sensing. A natural guide star is still needed for
low order (i.e., tip, tilt and focus) wavefront sensing; however, it can be much
fainter, leading to higher sky coverage. The relative roles of these two guide stars
and the implications of lasers are discussed. The relative performance of these
two guide stars is also presented along with a sample error budget for the LGS
case. The reader is also taken on a tour through a sample LGS SCAO system.
The scientific productivity of these systems is discussed in closing.


1. Overview 


The subject of this chapter is single conjugate adaptive optics (SCAO) with a laser 
guide star (LGS). Single conjugate refers to the use of a single wavefront sensor 
(WFS) and deformable mirror (DM) conjugate to a single altitude, typically near 
the ground (e.g., the telescope’s primary mirror). More complex AO systems (e.g., 
GLAO, LTAO, and MOAO) using multiple WFS and/or DMs will be discussed in 
subsequent chapters. 

Adaptive optics (AO) systems use two types of guide stars: natural guide stars
(NGS) and LGS. Unfortunately, bright enough NGS for AO wavefront sensing are
only available over ~1% of the sky. In order to overcome this sky coverage
limitation, astronomers have turned to LGS. An NGS is still needed for low-order
correction; however, since the NGS in this case can be much fainter, the sky coverage is
dramatically increased. The NGS is used to measure (1) high bandwidth image
motion, since the laser experiences image motion on the way up as well as the
way down, (2) low bandwidth focus, since the sodium layer can change altitude,
and (3) low bandwidth wavefront sensor subaperture centroid offsets introduced by
variations in density (i.e., structure) through the sodium layer.

Lasers used to produce LGS come in two varieties: sodium-wavelength (589 nm)
and Rayleigh. The former are used to excite sodium atoms in the Earth's mesosphere
at an altitude of ~90 km, which re-emit, providing an LGS for wavefront sensing.
The latter use atmospheric Rayleigh scattering at distances of 10 to 35 km from the
telescope. Rayleigh system lasers must be gated (over a ~1 km range) or dynamically
refocused (over a ~10 km range) to produce a small enough LGS. The main
advantage of sodium LGS is that they are much higher above the turbulence.
A primary advantage of Rayleigh lasers is lower cost, since commercially available
lasers can be used (typically at ultraviolet or green wavelengths).

LGS have a number of physical and technical limitations which are not
experienced with NGS:

• LGS are not point sources. Some form of beam transport system is used to
  transport the laser beam from the laser to a launch telescope. The size of the LGS
  on the sodium layer is a function of the wavefront quality of the projected laser,
  diffraction due to the laser launch telescope diameter, and the atmospheric
  turbulence through which the laser passes. Typically LGS are 1 to 2 arcsec FWHM.
  In addition, since the sodium layer is ~10 km thick, or the Rayleigh range gating
  is over some distance range, the image elongates as you move off-axis from the
  laser launch telescope. For this reason, the laser launch telescope is typically
  located behind the secondary mirror of the astronomical telescope. The sodium
  LGS perspective elongation from a 10 km thick sodium layer is 1.1 arcsec when
  observed from 5 m off-axis.

• LGS are not at infinity. The light from a star passes through a cylinder whereas
  the LGS samples a cone, introducing a small displacement between the measured
  distortions and those experienced by the light from the science object; a
  displacement that grows with the altitude of the turbulence. In addition, the telescope
  will focus the LGS at a different focal plane than the NGS and this focus changes
  with telescope elevation angle, requiring the wavefront sensor to track the LGS
  focus.

• Sodium LGS brightness can vary by factors of two or more in a few hours.
  Mesospheric sodium is the product of meteor ablations and therefore has a seasonal
  variation with meteor showers.

• LGS can be attenuated by clouds. Rayleigh LGS can potentially be brightened if
  there are cirrus clouds within the gated range.

• Lasers introduce a number of safety concerns that must be addressed. Beyond
  personnel and equipment safety there is also the potential to illuminate aircraft or
  sensitive satellites, or to have the observation by another telescope contaminated
  by the laser light.¹ In the US the aircraft safety plans must be approved by the
  Federal Aviation Administration (Keck uses an aircraft transponder-based safety
  system; no system is required for ultraviolet lasers, which are not a risk to pilots)
  and the observation target list must be pre-approved by US Space Command,
  who provides any needed blackout windows for each target. On Mauna Kea a
  laser traffic control system predicts intersections between telescope pointings and
  can automatically shutter the laser based on a first on-target protocol.
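
The perspective elongation quoted in the first bullet above can be estimated from simple geometry. The sketch below assumes a laser launched vertically at zenith and a sodium layer extending from 90 to 100 km. The text gives only the approximate altitude and thickness, so those two limits are an assumption chosen for the example. A sub-aperture displaced sideways from the launch axis sees the bottom and top of the illuminated column at slightly different angles, and the difference is the apparent elongation.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def lgs_elongation_arcsec(offset_m, h_bottom_m, h_top_m):
    """Apparent angular length of the sodium column seen offset_m from the launch axis."""
    angle_bottom = math.atan2(offset_m, h_bottom_m)   # direction to the bottom of the layer
    angle_top = math.atan2(offset_m, h_top_m)         # direction to the top of the layer
    return (angle_bottom - angle_top) * ARCSEC_PER_RAD

elong = lgs_elongation_arcsec(offset_m=5.0, h_bottom_m=90e3, h_top_m=100e3)
print(f"elongation at 5 m off-axis: {elong:.2f} arcsec")
print(f"elongation at 0.5 m off-axis: {lgs_elongation_arcsec(0.5, 90e3, 100e3):.2f} arcsec")
```

With these assumed layer limits the 5 m case gives a little over 1.1 arcsec, comparable to the intrinsic 1 to 2 arcsec spot size, while the elongation scales almost linearly with the off-axis distance. This is the motivation for launching from behind the secondary mirror.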


2. Performance 


AO performance degrades as the NGS used for wavefront sensing gets fainter, as 
shown in Fig. 1 for the Keck II AO system and an on-axis NGS. The Strehl ratio^a
decreases due to poorer correction of both high-order errors and image motion; the 
uncorrected image motion broadens the FWHM of the core of the AO-corrected 
image. As the NGS gets fainter (~14th magnitude) the LGS AO system provides 
better correction, both in Strehl ratio and FWHM. The FWHM is reduced since 
all of the NGS light can be used to sense image motion. The LGS AO system also 
has the advantages that the LGS can be pointed directly at the science object 
and that the coherence angle for image motion is much larger than for higher 
order atmospheric turbulence. The LGS AO system therefore has better K-band 
(2.2 µm wavelength) performance than the NGS AO system even for a bright NGS
if the NGS is more than ~10 arcsec from the science object. 


[Figure 1 plot not reproduced: K-band Strehl ratio (vertical axis) versus R-band magnitude from 8 to 18 (horizontal axis), with fainter NGS toward the right.]


Fig. 1. Typical Keck AO system K-band Strehl ratio versus R magnitude for an on-axis NGS. 
The inset is a 1 x 1 arcsec 3D profile of an LGS AO image with a Strehl ratio of 0.36 and a FWHM 
of 0.050 arcsec. 


^a Strehl ratio, a commonly used metric for adaptive optics systems, is the ratio of the peak intensity
of a point spread function (i.e., image of a star) to that of the perfect diffraction-limited point 
spread function for that same optical system. 
Table 1. Sample Keck LGS AO wavefront error budget for a high sky coverage science case.

Wavefront error term                  nm rms
Atmospheric fitting                      122
Telescope fitting                         66
Science camera                           110
Wavefront bandwidth                      182
Wavefront measurements                   216
LGS focus error                           70
Focus anisoplanatism                     181
LGS high-order error                      80
Calibration errors                        30
Miscellaneous                             73
Total high-order wavefront error         402

Tip-tilt bandwidth                       247
Tip-tilt measurement                     214
Tip-tilt anisoplanatism                  115
Total tip-tilt wavefront error           346

Total wavefront error                    531


Note: The total rms wavefront error is the square 
root of the quadratic sum of the individual 
terms. A total error of 531 nm produces a Strehl 
ratio of 0.10 at 2.2 µm wavelength.


A simplified sample Keck II LGS AO system wavefront error budget for a
high sky coverage, and hence low performance, observation is provided in Table 1. 
The tip-tilt and high order errors affect the AO-corrected image differently. The 
high-order errors reduce the amount of energy in the diffraction-limited core of the 
AO-corrected image without changing the image size. The tip-tilt errors broaden this 
core without changing the amount of energy in the core. At K-band the diffraction- 
limited core has a FWHM of 0.045 arcsec. Table 1 high-order terms produce a Strehl 
ratio of 0.27 while the tip-tilt errors broaden the AO-corrected core to a FWHM of 
0.074 arcsec. 
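
The totals in Table 1 and the quoted Strehl ratios can be reproduced with a few lines of arithmetic. The sketch below adds the terms in quadrature, as the table note describes, and then converts wavefront error to Strehl ratio with the extended Maréchal approximation, S ≈ exp(-(2πσ/λ)²). The choice of that approximation is an assumption, since the text does not name the formula it uses, and the variable names are illustrative.

```python
import math

# Table 1 terms, in nm rms.
HIGH_ORDER_NM = [122, 66, 110, 182, 216, 70, 181, 80, 30, 73]
TIP_TILT_NM = [247, 214, 115]
K_BAND_NM = 2200.0  # 2.2 micron observing wavelength

def root_sum_square(terms_nm):
    """Combine independent rms error terms in quadrature."""
    return math.sqrt(sum(t * t for t in terms_nm))

def marechal_strehl(wfe_nm, wavelength_nm):
    """Extended Marechal approximation (assumed here, not named in the text)."""
    return math.exp(-(2.0 * math.pi * wfe_nm / wavelength_nm) ** 2)

high_order = root_sum_square(HIGH_ORDER_NM)
tip_tilt = root_sum_square(TIP_TILT_NM)
total = root_sum_square([high_order, tip_tilt])

print(f"high-order: {high_order:.0f} nm, Strehl {marechal_strehl(high_order, K_BAND_NM):.2f}")
print(f"tip-tilt:   {tip_tilt:.0f} nm")
print(f"total:      {total:.1f} nm, Strehl {marechal_strehl(total, K_BAND_NM):.2f}")
```

Under this approximation the high-order terms alone give a Strehl ratio close to the 0.27 quoted above, and the full budget gives a value close to 0.10; the computed total agrees with the tabulated 531 nm to within about 1 nm of rounding.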

Table 1 illustrates some of the performance issues and engineering trade-offs 
associated with LGS AO. For example: 


• The fitting errors can be reduced with more actuators on the deformable mirror.
This requires splitting the guide star light into more sub-apertures, thereby
increasing the wavefront measurement and bandwidth errors or requiring a
brighter LGS.

• The science camera and calibration errors can be reduced by optimizing the
deformable mirror shape to compensate for non-common path aberrations 
between the science instrument and wavefront sensor path, including aberrations 
in the science instrument. For example, the Keck systems use a fiber source at 
the input to the AO system and a phase retrieval algorithm to optimize the fiber 
image at one position on the science camera. This optimized deformable mirror 
shape is used to set the wavefront sensor centroid offsets so that the wavefront
sensor drives the deformable mirror to this shape in the absence of turbulence. 
Science instrument field-dependent aberrations, the source of the listed science 
instrument errors, cannot be addressed in this manner. 

• The wavefront bandwidth and measurement errors are directly traded against
each other, since integrating longer reduces the measurement error at the cost of
a larger bandwidth error. Some improvement can be made with better detectors
(lower noise, faster readout) but the big gains come from brighter LGS.

• The LGS focus error for a sodium LGS results from the changing height of the
sodium layer. An NGS is therefore needed to distinguish these changes from changes
in the telescope focus.

• Focus anisoplanatism is the error due to sampling a cone of atmosphere with
the LGS instead of the cylinder that the science object’s light passes through. A
solution to this error would be to perform tomography of the atmosphere using
multiple LGS.

• The LGS high-order error is primarily due to aberrations seen on the LGS wavefront
sensor due to the varying elongation of the LGS in each wavefront sensor
sub-aperture. This is addressed with a low bandwidth truth sensor looking at an
NGS to provide centroid offset corrections to the LGS wavefront sensor.

• The tip-tilt bandwidth, measurement and anisoplanatism errors are directly
traded against each other. Tip-tilt anisoplanatism is due to the somewhat different
atmospheric tip-tilt in the directions of the science object and tip-tilt NGS;
this error can be reduced by using a closer, fainter NGS at the expense of higher
measurement and bandwidth errors. Some improvement can be made with better
detectors (lower noise, faster readout). One approach being taken at Keck is to
implement a near-infrared tip-tilt sensor since stars are partially AO-corrected
by the LGS AO system and are generally brighter in the near-infrared.2
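
The trade between bandwidth and measurement errors described in the list above can be illustrated with a toy model. The scalings used here are textbook assumptions rather than anything stated in this chapter: the bandwidth error is taken to fall as (frame rate)^(-5/6), and the measurement error is taken to be photon-noise limited, so that it grows as the square root of the frame rate. Both are normalized to the Table 1 values (182 nm and 216 nm) at an arbitrary reference frame rate.

```python
import math

BANDWIDTH_NM_AT_REF = 182.0    # Table 1 wavefront bandwidth term
MEASUREMENT_NM_AT_REF = 216.0  # Table 1 wavefront measurement term

def combined_error_nm(rate_ratio):
    """Quadrature sum of both terms at rate_ratio = frame rate / reference rate."""
    # Assumed scaling: faster loop tracks the turbulence better.
    bandwidth = BANDWIDTH_NM_AT_REF * rate_ratio ** (-5.0 / 6.0)
    # Assumed scaling: fewer photons per frame means noisier centroids.
    measurement = MEASUREMENT_NM_AT_REF * math.sqrt(rate_ratio)
    return math.hypot(bandwidth, measurement)

# Coarse grid search from 0.25x to 4x the reference frame rate.
ratios = [0.25 + 0.005 * i for i in range(751)]
best_ratio = min(ratios, key=combined_error_nm)

print(f"reference rate: {combined_error_nm(1.0):.1f} nm")
print(f"best rate ratio: {best_ratio:.3f}, error {combined_error_nm(best_ratio):.1f} nm")
```

In this simplified picture the minimum sits close to the reference frame rate, which is what one would expect if the frame rate had already been chosen to balance the two terms; a brighter guide star would lower the measurement curve and shift the optimum to higher rates.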


3. A Sodium Wavelength Laser LGS AO System 


The Keck AO systems can be used as an illustrative example of the engineering 
behind an LGS AO system,3 specifically a sodium wavelength laser system. The
Keck AO systems are located in a thermally insulated enclosure on the left Nasmyth 
platform of each Keck telescope. Three near-infrared science instruments can be used 
with the AO systems: a camera (NIRC2), spectrograph (NIRSPEC) and integral 
field spectrograph (OSIRIS). 

A schematic of the Keck I AO bench is provided in Fig. 2. The light from the 
telescope to the NIRC2 or OSIRIS science instruments passes through the items 
numbered 1-7 in Fig. 2 (items 5 and 10 are removed for this science). Since the 
AO system is at the Nasmyth focus of an altitude-azimuth mount telescope, both 
the sky and the pupil rotate as the telescope tracks a science object. A three- 
mirror rotator (item 1) is used to keep either the field or the pupil fixed during an 
observation. The telescope focus is located just inside the rotator and the beam is
subsequently diverging into the AO system. A fold mirror (2) acts as the AO fast
tip-tilt mirror. Ideally this tip-tilt mirror would be at a pupil plane, but this was not
practical in the Keck system. The commercial Xinetics 349-actuator piezo-electric
deformable mirror (4) has an actuator spacing of 7 mm, which drove the physical
size of the optical system. The science path optics and their locations (3, 4 and 6)
were chosen to satisfy five requirements: the output f-number is identical to the
telescope’s f-number; the final exit pupil location is the same as the telescope’s, with
respect to the image plane; the deformable mirror is conjugate to the telescope’s
primary mirror; the 10.949 m primary mirror corresponds to 136.25 mm on the
deformable mirror; and good image quality is provided over a 120 arcsec diameter
field. A collimated beam between two identical off-axis parabolas (3 and 6) was used
to achieve these requirements.

Fig. 2. Schematic of the Keck AO bench. The light from the telescope enters at the top of the
schematic diagram. Some numbered items have been removed to simplify the schematic.
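
The pupil mapping requirement above fixes how the deformable mirror actuators project onto the primary mirror. The short calculation below uses only the numbers quoted in the text (10.949 m primary, 136.25 mm pupil on the deformable mirror, 7 mm actuator spacing); the variable names are illustrative.

```python
PRIMARY_DIAMETER_M = 10.949   # primary mirror diameter
PUPIL_ON_DM_MM = 136.25       # size of the primary's image on the deformable mirror
ACTUATOR_SPACING_MM = 7.0     # deformable mirror actuator pitch

# Ratio between a length on the primary and the same length on the deformable mirror.
demagnification = PRIMARY_DIAMETER_M * 1000.0 / PUPIL_ON_DM_MM

# Actuator pitch projected back onto the primary mirror, in metres.
actuator_pitch_on_primary_m = ACTUATOR_SPACING_MM * demagnification / 1000.0

# Number of actuator spacings that fit across the pupil image.
spacings_across_pupil = PUPIL_ON_DM_MM / ACTUATOR_SPACING_MM

print(f"demagnification: {demagnification:.1f}")
print(f"actuator pitch on primary: {actuator_pitch_on_primary_m:.4f} m")
print(f"actuator spacings across pupil: {spacings_across_pupil:.1f}")
```

Each actuator spacing therefore corresponds to a little over half a metre on the primary mirror, which sets the spatial scale of the fitting error terms in the wavefront error budget.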

The light to the science instrument (>0.95 µm) passes through an infrared-transmissive
dichroic beamsplitter (7). The dichroic reflects the visible light. The
visible light is subsequently split by either a beamsplitter or sodium-transmissive
dichroic positioned by a vertical translation stage (13); the latter is used for LGS
AO. The transmitted light goes to a pair of field selector mirrors (14) which are 
used to select a NGS or LGS anywhere in a 60 arcsec diameter field and to put this 
star down the optical axis of the high bandwidth wavefront sensor optics (15) to the 
wavefront sensor camera (16). The light reflected by (13) reflects off mirrors (17) 
and (18) to the 120 arcsec diameter field of the acquisition camera (21). Mirror (18) 
can be removed by a vertical translation stage to allow the visible light to be transmitted
to the low bandwidth wavefront sensor (19) and the visible tip-tilt sensor
(20). These sensors are co-aligned to use the same star, with the light split by a 
beamsplitter cube, and are moved around the 120 arcsec diameter field by a x,y,z- 
stage. A choice of dichroic beamsplitters (not shown) just before OSIRIS allows 
some of the light, over a 100 arcsec square field, to be split off to a near-infrared 
tip-tilt sensor (not shown). 

The visible tip-tilt sensor consists of four lenslets, each of which sends its light to
an avalanche photo detector. The near-infrared tip-tilt sensor, based on a 1024 × 1024
pixel detector, has recently been added to the system. Low noise is achieved by
multiple, non-destructive reads of small regions (e.g., 4 × 4 pixels using 0.05 arcsec
pixels) of this detector at the location of the star used for tip-tilt. The advantage of
the near-infrared is that the tip-tilt measurement can be done on the AO-corrected
image of a star, and the average star is brighter in the near-infrared. Differential
atmospheric refraction between the science and tip-tilt sensor wavelengths is handled
by offsets to the position of the x,y-stage on which the visible tip-tilt sensor is
mounted or centroid offsets to the near-infrared tip-tilt sensor.

Both wavefront sensors use an array of square lenslets to divide up the pupil. 
The deformable mirror (and hence telescope pupil) is re-imaged onto the lenslet 
array with the corners of each lenslet aligned to the deformable mirror actuators. 
The 7-mm DM actuator spacing corresponds to the 200 µm lenslet size for the
high bandwidth wavefront sensor. The image from each lenslet is re-imaged onto a
CCD. The high bandwidth wavefront sensor CCD is small format (80 × 80 pixels),
providing low read noise at high frame rates. The frame rate (~50 Hz to 2 kHz for
NGS AO and 500 Hz to 1 kHz for LGS AO) is selected to balance the measurement
and bandwidth errors. The larger format low bandwidth wavefront sensor CCD is
run at much slower rates (~0.1 Hz to 0.01 Hz) to observe an NGS during LGS AO
observations. The high bandwidth wavefront sensor is mounted on a focus stage
that tracks versus telescope elevation to stay conjugate to the sodium layer during
LGS AO operations; the focus range of 260 mm is driven by the difference in focus
between an object at infinity and the sodium altitude at zenith. The low bandwidth 
wavefront sensor provides focus corrections for sodium layer altitude variations and 
centroid offset corrections to account for sodium thickness variations. 
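
The size of the focus stage travel can be estimated from the thin-lens equation: a source at finite range comes to focus behind the focal plane for an object at infinity by f²/(range − f). The sketch below is a rough check, not the actual design calculation. It assumes a telescope focal length of 150 m and a sodium altitude of 90 km, neither of which is given in this section, and a simple plane-parallel geometry for the range at non-zero zenith angle.

```python
import math

def lgs_focus_shift_m(focal_length_m, sodium_altitude_m, zenith_angle_deg=0.0):
    """Focus offset of a sodium LGS relative to an object at infinity (thin lens)."""
    # Line-of-sight distance to the sodium layer grows away from zenith.
    range_m = sodium_altitude_m / math.cos(math.radians(zenith_angle_deg))
    return focal_length_m ** 2 / (range_m - focal_length_m)

# Assumed values: 150 m focal length, sodium layer at 90 km.
shift_zenith_m = lgs_focus_shift_m(150.0, 90e3)
shift_60deg_m = lgs_focus_shift_m(150.0, 90e3, zenith_angle_deg=60.0)

print(f"focus shift at zenith: {shift_zenith_m * 1000:.0f} mm")
print(f"focus shift at 60 deg zenith angle: {shift_60deg_m * 1000:.0f} mm")
```

With these assumed inputs the zenith offset comes out at roughly a quarter of a metre, the same order as the 260 mm range quoted above; the exact figure depends on the focal length and sodium altitude adopted. The offset shrinks as the telescope moves away from zenith, which is why the stage must track elevation.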

There has been considerable evolution in sodium-wavelength laser technologies
over the past 20 years. This evolution is well represented in the previous Keck II
laser and the current Keck I and Keck II lasers. The challenge sprang from the lack
of a commercial market for this wavelength laser. The Keck II dye laser was a
modified version of lasers developed for a Lawrence Livermore National Laboratory
isotope separation program. A total of six Q-switched, flashlamp-pumped Nd:YAG
lasers were used to pump three stages of dye amplification at 26 kHz: one each for
the dye master oscillator and pre-amplifier and four for the amplifier. The Keck I
laser is a mode-locked continuous wave, solid-state sum frequency generation laser 
developed by Lockheed Martin Coherent Technologies.4 The new Keck II laser, 
commissioned for science in 2016, is a continuous wave Raman-fiber amplifier laser
with second harmonic generation developed by the commercial companies TOPTICA
and MPBC,5 based on a technology developed and patented by ESO.6

All three Keck lasers provide ~20 W of output power but their sodium-atom
coupling efficiency is dramatically different. For maximum coupling efficiency lasers
should be continuous wave with a narrow line tuned to the peak of the sodium
doublet’s D2a line, circularly polarized, and ~10% of the laser power should be
tuned to the D2b line to repump the sodium atoms back to the F = 2 upper
ground state.7 The TOPTICA/MPBC laser fulfills all these criteria.

The size and power usage of these lasers have shrunk with each iteration, and
their location has improved. The pump lasers and master oscillator for the dye 
laser were located in a thermally insulated room on the dome floor. A single mode 
fiber transported the seed light from the dye master oscillator, and multi-mode 
fibers transported the pump light to the two stages of dye amplification located on 
the telescope’s elevation ring. The Keck I laser is mounted in a thermally isolated 
room on the telescope’s Nasmyth platform. The new Keck II laser includes a single 
electronics rack and heat exchanger mounted under the Nasmyth platform with the 
laser itself mounted on the telescope’s elevation ring. 

The laser light is projected from the Keck telescopes through 50 cm diameter 
launch telescopes located behind the telescope’s secondary mirror. Free space beam 
transport systems are used to transport the laser light to the launch telescope. 
Photonic crystal fibers have been used for beam transport at the VLT and Subaru 
but multiple nonlinear phenomena make this approach practical only at low powers. 

The free space transport system is considerably easier when the laser is on the 
telescope’s elevation ring, as in the Keck II case. The Keck II laser location and 
beam transport system is shown in Fig. 3. The laser enclosure contains the laser, 
beam steering optics and diagnostics. The secondary mirror module contains several 
beam fold mirrors (CM3 to CM5), an optics bench mounted to the launch telescope, 
and the launch telescope. The optics bench includes the optics needed to steer the 
beam on the sky and to expand the laser beam to match the required input to the 
afocal launch telescope. The laser light is completely enclosed until it exits that 
launch telescope. Two control loops compensate for telescope flexure to keep the 
laser aligned through the beam train. 


4. A Rayleigh Laser LGS AO System 


The Robo-AO system can be used as an illustrative example of the engineering 
behind a Rayleigh laser LGS AO system, specifically the differences with respect 
to a sodium wavelength laser system. The Robo-AO system described by Ref. 8 
was mounted on the robotic 1.5 m telescope at Palomar Observatory (the system
has subsequently been moved to Kitt Peak National Observatory). This system is
unique and highly efficient in that it is fully autonomous, from intelligent queuing 
of targets through data reduction. 
Fig. 3. (a) Laser beam transport system from the laser enclosure to the telescope secondary
mirror module. The laser light is directed with flat mirrors (CM1 to CM5) through beam tubes. 
(b) The laser beam exiting the launch telescope to the sky. 


The laser launch system uses a pulsed 355 nm wavelength laser, co-aligned with
the 1.5-m telescope, and focuses a 15-cm projection aperture to a beam waist at
a distance of 10 km. The constant distance means that the laser can be used to
measure atmospheric focus. The Shack–Hartmann wavefront sensor records light
over a 450-m range about this beam waist at a frame rate of 1.2 kHz. The measured
tip-tilt is used to control the laser pointing and the higher orders are used to control
a 12 × 12 actuator deformable mirror. The science camera is an electron-multiplied
CCD read out at 8.6 Hz, allowing tip-tilt to be addressed by shift-and-add of the
images using a star in the 43″ × 43″ science field.
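
The range gate implied by these numbers can be worked out from the light travel time. The sketch below is a back-of-the-envelope timing calculation, assuming the 450 m gate is centred on the 10 km beam waist and that only one laser pulse is in flight at a time; it does not describe the actual Robo-AO shutter electronics, and all names are illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0

def round_trip_time_s(range_m):
    """Time for a laser pulse to reach range_m and scatter back to the telescope."""
    return 2.0 * range_m / SPEED_OF_LIGHT_M_S

BEACON_RANGE_M = 10e3   # distance to the beam waist
GATE_DEPTH_M = 450.0    # range interval recorded by the wavefront sensor

gate_open_s = round_trip_time_s(BEACON_RANGE_M - GATE_DEPTH_M / 2.0)
gate_close_s = round_trip_time_s(BEACON_RANGE_M + GATE_DEPTH_M / 2.0)
gate_duration_s = gate_close_s - gate_open_s

# Upper limit on the pulse rate if each pulse must return before the next is launched.
max_pulse_rate_hz = 1.0 / round_trip_time_s(BEACON_RANGE_M)

print(f"gate opens {gate_open_s * 1e6:.1f} us after the pulse")
print(f"gate stays open for {gate_duration_s * 1e6:.2f} us")
print(f"maximum pulse rate: {max_pulse_rate_hz / 1e3:.1f} kHz")
```

The gate is open for only a few microseconds per pulse, so the wavefront sensor integrates the returns from many pulses within each 1.2 kHz frame.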


5. Science with Single Conjugate LGS AO Systems 


Figure 4 is a tabulation of all the refereed astronomical science papers published
using LGS AO systems through 2015. The first of these papers were published
beginning in 1995 based on observations made with the U.S. Air Force 1.5 m Starfire
telescope using their Rayleigh laser based LGS AO system.9 Routine LGS AO science
publication began in 2005 with the commissioning of the Keck II LGS AO
system. All but two of these systems have been SCAO systems; the exceptions
are the Gemini South multi-conjugate (MCAO) and SOAR ground layer (GLAO)
systems. The telescopes with single conjugate LGS AO systems still in science operation
include Gemini North, Keck I and II, Lick, Robo-AO and Subaru; all but
one of these (Robo-AO) use sodium wavelength lasers. The European Very Large
Telescope’s (VLT) SCAO system is being replaced with a four-laser system, primarily
for GLAO, to begin science operations in 2017.

Fig. 4. Histogram showing the distribution of the refereed science papers published per year
through 2016 using data from LGS AO systems.10 A total of 415 papers have been published. The
LGS papers have been 5% solar system, 49% galactic and 46% extragalactic.

Science with NGS AO was largely limited to solar system and galactic tar- 
gets with some observations of extra-galactic targets near bright stars. LGS AO 
has in particular opened up high angular resolution measurements of extra-galactic 
science targets; extra-galactic observations have made up 47% of the LGS AO sci- 
ence papers. LGS AO has also facilitated AO observations of dust obscured science 
targets like the center of our own galaxy. 
Large area surveys will dominate the forthcoming decades of astronomy, and their
success requires characterizing thousands of discoveries through additional obser- 
vations at higher spatial or spectral resolution, and at complementary cadences 
or periods. Only the full automation of adaptive optics systems will enable high- 
acuity, high-sensitivity follow-up observations of several tens of thousands of these 
objects per year, maximizing on-sky time. Automation will also enable rapid 
response to target-of-opportunity events within minutes, minimizing the time 
between discovery and characterization. 

In June 2012, we demonstrated the first fully automated operation of an 
astronomical adaptive optics system by observing 125 objects in succession with 
the Robo-AO system. Efficiency has increased ever since, with a typical night 
comprising 200-250 automated observations at the visible diffraction limit. By 
observing tens of thousands of targets in the largest-ever adaptive-optics surveys, 
Robo-AO has demonstrated the ability to address the follow-up needs of current 
and future large astronomical surveys. 


1. The Need for Efficient Adaptive Optics


In 1998, J. Hardy identified several future directions for adaptive optics (AO),
including a “Simplified operator interface with the adaptive optics, the goal being
to make the operations mostly automatic ... including control of the beacon laser.”1
It is now, during the current era of extremely large astronomical surveys, that there
is a clear need for automated AO imaging and spectroscopy to fully characterize
the large numbers of discoveries.2
Table 1. Example automated AO science.

Science topic                                        Number of targets and/or cadence
Transiting exoplanet hosts                           >10,000 targets
Wide exoplanets and brown dwarfs                     5000–10,000 targets
Large survey stellar multiplicity                    >10,000 targets
Transient characterization                           Rapid response, declining follow-up cadence
Astrometric microlensing                             Dozens of high-cadence events per year
Solar system small body nucleus characterization,    Few night response, ~10 Manx, Centaurs,
  exopause searches, surface mineralogy                and comets per year
Discover/monitor lensed quasars                      >25,000 + monitoring 3 nights/mo.
Monitoring planetary weather                         Snapshot 2-3 times per night
Monitoring jets, outflows and shocks                 Several times per year


For example, NASA’s Kepler mission identified more than 3000 stars with 
repeating brightness changes indicative of transiting exoplanets. Follow-up AO 
imaging of these objects is used to determine the sources contributing to the Kepler 
light curves in order to rule out astrophysical false positives, to accurately measure 
the radii of the detected planets with respect to their host star, to measure the 
effect of stellar multiplicity on exoplanet formation, and to determine the physical 
association of detected stellar companions.3–10 Transit missions such as NASA’s
Transiting Exoplanet Survey Satellite11 and ESA’s upcoming PLAnetary Transits
and Oscillations12 are expected to discover many more exoplanet systems than
Kepler.

General wide field surveys such as the Zwicky Transient Facility,13 Evryscope,14
Pan-STARRS15 and ESA’s Gaia16 are detecting hundreds of thousands of close
stellar systems, partially resolved in imaging data, visible from eclipses and other 
time-domain phenomena, or forming confusing asterisms in crowded fields. For many 
science goals, each survey requires high-angular-resolution follow-up to confirm system
memberships or remove confusing effects from the crowded fields.17 Only very
efficient AO can confirm and measure the properties of a significant fraction of these 
close stellar systems. 

Outside our galaxy, adaptive optics images are also crucial in disentangling 
transient events from their environments: determining whether they are hosted in 
a faint galaxy, are associated with precursor images, or if their light curves are 
contaminated by nearby sources.18–20

These science objectives can be loosely classified into three major categories: 
large population studies, rapid target characterization, and long-term monitoring. 
Table 1 highlights a sample of areas where automated AO is necessary to enable 
science that was previously thought to be impractical with existing facilities. 


2. General Philosophy for Automation 


Although the scientific community is currently well served by flexible stellar/laser
or boutique high-contrast AO systems for limited numbers of targets, their scarcity,
complexity, and competition for observing time on large apertures limit their suitability
for following up large numbers of targets.21,22 The key for automated AO
is to focus primarily on those scientific objectives that uniquely require high-time-efficiency
observing.

The overarching philosophy in the automation of adaptive optics is to prioritize 
reliability, predictability, and ease-of-use over more traditional AO metrics such 
as Strehl ratio, achievable contrast, or sky coverage. While the requirements on 
the traditional metrics should be well understood and meet the science need, they 
should not be the driving factors in system design because the primary metric is 
simply the number of targets that can be observed at high-angular-resolution. 


• Reliability: The entire observing system comprising telescope, AO, science
instruments, control software, and data reduction pipeline, needs to be very 
reliable in operation. All hardware components should be sufficiently past their 
beta-testing phase of development and be easily spared or serviceable if necessary 
(with off-the-shelf hardware the preferred selection). Software should be designed 
to be as modular as possible in order to increase reliability and delineate areas of 
complexity. It should be straightforward to identify and recover from error states 
during operations to minimize or eliminate the need for human intervention. 
During the commissioning phase of the instrument, it is necessary to exercise and 
stress the system on sky, identify failure modes, fix errors, and repeat the process 
until the robotic system is capable of independent operation. 

• Predictability: The operation of the system should be designed intentionally
such that the observing system follows logical paths. The output of the system
should be repeatable and deterministic given a set of input conditions and parameters
such as seeing, wind speed, zenith angle, etc. Effort should be placed upon
minimizing the effects of conditions where one has some control (e.g., air condi- 
tioning and effective dome venting to ensure thermal equalization of the telescope 
and mitigation of dome seeing). Diagnostics should be in place to measure external 
factors which serve as predictors for the quality of an observation, oftentimes as 
criteria for suitable observing conditions or determining the need for repetition 
of a particular observation. 

• Ease-of-use: The observing system should operate itself with no or minimal
human oversight, as opposed to a “push-button” AO system that can be driven 
by non-experts (e.g., at most major observatories where a dedicated telescope 
operator or support scientist operates the AO system). Any time an operator 
needs to make a decision, precious time is lost. Scientists should limit their inter- 
action with the system to adding observation requests and retrieving their data. 


Under these guiding priorities, it is likely prohibitively difficult to develop
an immensely flexible AO platform that can support a suite of different automated
modes for a range of seeing conditions and multiple target types. Instead, limiting
the modes of operation minimizes the number of steps and variables needed to
perform an observation, containing the development costs and schedule of an automated
system. This has the natural consequence of favoring queue-type
observing and simplifying requirements on data reduction and analysis pipelines.
Focusing the development on creating a reliable robotic system that can complete
the science mission builds a robust and capable system.


3. Automation Enabled by Technological Maturity 


While scientific requirements drive the need to automate AO systems, technolog- 
ical maturity limits their development. Adaptive optics systems are almost exclu- 
sively developed as prototype instruments that take advantage of a new architecture 
and/or component development. Projects are then required to beta-test a subset of 
new technology while primarily building on established methods and technology, and 
oftentimes only the largest aperture observatories have sufficient funding for this
development. To develop automation as the next technological leap, the component 
technologies need to be proven as reliable and not be the primary cost drivers. 

In the decades since the first scientific adaptive optics systems were put into 
use, many component technologies have continued to improve in functionality, reli- 
ability, and cost. A non-exhaustive list of examples of many relevant innovations for 
automated AO follows: 


• Optical phase correctors: The Visible Light Laser Guidestar Experiments
(ViLLaGEs) at Lick Observatory23 pioneered the use of microelectromechanical
systems (MEMS) deformable mirrors (DMs) in astronomical adaptive optics.
MEMS with hundreds of actuators and several wavelengths of phase correction
are now available as off-the-shelf catalog items at an order of magnitude lower
cost than traditional piezo-stack deformable mirrors. So long as the mirrors are
sealed against humidity, they have proven to be extremely reliable, have negligible
hysteresis for most applications and require no feedback on the commanded
position of the individual actuators.24

• Laser guide stars: Sodium excitation dye and sum frequency solid-state laser
guide stars are relatively expensive due to non-existent commercial applications
and require extensive maintenance schedules for reliable operation due to their
prototypical nature. While comparable in cost, new 589 nm lasers based on
Raman fiber amplifiers promise to improve upon both the photon return and
reliability and began use at large telescopes in 2016. In contrast, artificial
guide stars created by Rayleigh scattering laser light off of air molecules can
be driven by turn-key Q-switched lasers developed for the silicon wafer processing
industry^a (with typical mean times between failures exceeding 10,000 h).
Additionally, Rayleigh laser beacons at wavelengths below the threshold of human
vision (λ < 400 nm) can be granted a waiver from Federal Aviation Administration
control measures (e.g., human spotters, transponder detectors) because of
their inability to flash-blind pilots or produce damaging levels of radiation during
brief exposures for typical, <100 W, power levels.

^a The photon return per watt is maximized around λ ~ 450 nm with a ~25% drop off at the
doubled and tripled Nd:YAG wavelengths of λ = 532 nm and 355 nm.25 Pulse rates need to fall
between the minimum AO loop rate and the inverse of the beacon pulse round-trip time, e.g.,
~1 kHz to 15 kHz for a beacon at 10 km.

• Wavefront reconstructor controllers: Early generations of adaptive optics
systems relied on either analog or very specialized digital computing technologies
for wavefront reconstruction and control. As of a decade ago, personal computers
(PCs) have proven themselves capable of running modest actuator count infrared
AO systems on large telescopes,26,27 and more recently are able to drive exoplanet
AO systems.28 PCs support many common software languages that have
been used to code wavefront reconstructor controller routines, e.g., C, C++, G,
MATLAB and Python, lowering the barrier to new development or to making
modifications to existing code.

• Science cameras: A major complication with observing astrophysical objects
with AO is that their peak brightness will not be known until the actual observation.
Often one relies on seeing-limited or poorly sampled catalogs, which may
not be accurate, to plan observations, and variable atmospheric conditions result
in a constantly changing Strehl ratio. Long integrations with a conventional CCD
or infrared array detector may need to be repeated if there is insufficient radiant
flux in a single image (additional images lead to more accumulated read-noise)
or if the object saturates or exceeds the linear dynamic range of the detector.
With the advent of fast readout, sub-electron read-noise detectors in astronomy
(e.g., electron multiplying (EM) CCDs29,30 and infrared avalanche photodiode
arrays31), the effects of inaccurate or variable peak brightness are significantly
mitigated. Beyond additional data storage requirements, there is minimal noise
penalty for taking a long series of short exposures, each of which is able to span
the entirety of the dynamic range of the detector. An additional benefit of having
a series of short exposures is the ability to register each detector read to compensate
for any uncorrected stellar displacement, or to apply other post-processing
techniques such as speckle interferometry.

• Host telescopes: Full automation of the host telescope, as has been done with
other astronomical survey facilities,32,33 is crucial to maintaining efficient observations.
Activities previously handled by telescope operators and night assistants,
such as environmental safety, pointing of the telescope, and instrument calibrations,
are controlled and sequenced by a computer. While automated telescopes
today are almost exclusively modest in aperture, no physical limitation prohibits
the full automation of large telescope apertures.^b


bThe Large Synoptic Survey Telescope will soon be the largest automated astronomical telescope 
in existence. 
4. CAMERA 


The idea for an autonomous laser guide star AO system was first proposed by Ref. 34 
in 2007. The Compact Autonomous MEMS-based Rayleigh AO (CAMERA) system 
was intended to be a low-cost AO system mounted to a small robotic telescope 
to perform survey astronomy in queue-mode as well as monitor VOEvent feeds to 
characterize important transient objects. A dichroic would split the final AO output 
to visible and near infrared cameras: one of the two would be used as tip-tilt sensor 
driving a fast-steering mirror to correct stellar image displacement while the other 
camera would be used to capture science images. 

Inspiration for the project stemmed from several developments in AO engineer- 
ing: the development of PC reconstructors and economical wavefront sensors as part 
of the multiple guide star tomography demonstration at Palomar Observatory;7° the 
ongoing development of cost-effective MEMS DMs at Lick Observatory; the use of 
industrial lasers at UV wavelengths at Mt. Wilson®° and for multiple Rayleigh bea- 
cons at the MMT;°*° and the availability of the recently roboticized 1.5-m telescope 
at Palomar.” 

A proof-of-concept laboratory AO system was then developed at Caltech that 
used a Boston Micromachines Multi-DM, a Physik Instrumente fast steering mirror, 
a PC reconstructor and a Shack-Hartmann wavefront sensor that used a SciMeasure 
camera with an E2V CCD39 detector. It achieved a closed-loop update rate of 
100 Hz, corrected wavefront aberrations induced by a spinning disk of plastic, and 
could be controlled through a simple web-based interface. 


5. Robo-AO 


To transform the CAMERA concept and laboratory system into a scientifically 
competitive on-sky AO system, we initiated the Robo-AO project®® °° in 2009 as a 
collaboration between Caltech Optical Observatories and the Inter-University 
Centre for Astronomy and Astrophysics (IUCAA). Our first objective was to build and 
demonstrate a Robo-AO system for the Palomar 1.5-m telescope, then clone the 
system and install it on the IUCAA Girawali Observatory 2-m telescope. While we 
nominally followed the CAMERA blueprint, we made two necessary major 
deviations. We postponed the development of the infrared science camera due to its 
significant relative cost and potentially lower scientific impact compared to the 
visible science camera. We also replaced the CAMERA control software architecture of 
an all-encompassing monolithic program with a new modular concept consisting of 
individual subsystems that would be overseen by a master scheduler and watchdog 
processes. A detailed description of the first as-built Robo-AO system follows. 


5.1. Hardware 


Robo-AO comprises several main systems (Fig. 1): the UV laser projector; an 
instrument mounted at the Cassegrain focus of the telescope; and a set of electronics 
including the PC reconstructor and control computer. 

Fig. 1. The Robo-AO system on the automated 1.5-m telescope at Palomar Observatory. 

Fig. 2. The inside of the Robo-AO laser projector. 


The UV laser projector enclosure is 1.5 m × 0.4 m × 0.25 m, with mass ~70 kg, 
and attaches to the side of the 1.5-m telescope (Fig. 2). Inside are a commercial 
pulsed 12-W ultraviolet laser (35 ns pulses every 100 μs, λ = 355 nm); a redundant 
safety shutter; and an uplink tip-tilt mirror both to stabilize the apparent laser beam 
position on sky and to correct for up to 2′ of differential pointing errors. A positive 
lens on an adjustable focus stage expands the laser beam to fill a 15-cm output 
aperture lens that is optically conjugate to the tip-tilt mirror. The laser beam is 
coaligned with the bore-sight of the principal telescope with its waist focused to a 
10-km line-of-sight distance. 

The adaptive optics system and science cameras reside within a Cassegrain-mounted 
structure of approximate dimensions 1 m × 1 m × 0.2 m (Fig. 3). Light 
from the telescope secondary mirror enters the instrument and is intercepted by a 
fold mirror which directs a 2′-diameter field to a dual off-axis parabolic (OAP) 
mirror relay. The first fold mirror is on a linear motorized stage that can be moved out 
of the beam path, revealing internal calibration optics that simultaneously simulate 
the ultraviolet laser focus at 10 km and an optical/infrared incandescent source at 
infinity, matching the telescope focal ratio and exit pupil position. The first OAP 
images the telescope pupil onto a 12 × 12 actuator MEMS DM. 

Fig. 3. The inside of the Robo-AO Cassegrain adaptive optics system. 

After reflection off the DM, the UV laser light is picked off with a UV dichroic 
mirror and refocused to a 5″ field stop. The light is collimated with a reversed 
OAP to cancel out coma from the first OAP. The collimated light passes through 
a dual-crystal β-BaB2O4 Pockels cell between two crossed linear polarizing cubes. 
The Pockels cell is used as a high-speed shutter to limit the backscattered Rayleigh 
laser light to a range of 450 m around the 10-km beam waist. An 11 × 11 lenslet 
array is located at a pupil and the Shack-Hartmann pattern is demagnified onto 
an E2V UV-optimized CCD39 detector (80 × 80 pixels; 72% quantum efficiency 
at λ = 350 nm). The pixels are binned by a factor of 3 and the slope of each 
subaperture is calculated from 2 × 2 binned pixels. The AO control loop operates at the 
1.2 kHz frame rate of the detector, with an effective control bandwidth of 90-100 Hz. 
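The range gating performed by the Pockels cell can be checked with simple round-trip timing. The sketch below is only illustrative: it assumes that the 450 m range is the total gate depth centered on the 10-km waist, that 12 W is the average laser power, and the function names are hypothetical.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def round_trip_time(distance_m):
    """Time in seconds for light to reach a scatterer and return."""
    return 2.0 * distance_m / C


def gate_window(center_m, depth_m):
    """Shutter open/close times (s after pulse launch) for a range gate of
    total depth depth_m centered on center_m (assumed geometry)."""
    t_open = round_trip_time(center_m - depth_m / 2.0)
    t_close = round_trip_time(center_m + depth_m / 2.0)
    return t_open, t_close


pulse_period_s = 100e-6                   # one pulse every 100 microseconds
pulse_rate_hz = 1.0 / pulse_period_s      # about 10 kHz
pulse_energy_j = 12.0 / pulse_rate_hz     # about 1.2 mJ if 12 W is average power

t_open, t_close = gate_window(10_000.0, 450.0)
print(f"gate opens at {t_open * 1e6:.2f} us, closes at {t_close * 1e6:.2f} us")
print(f"gate duration {(t_close - t_open) * 1e6:.2f} us")
```

Under these assumptions the gate is open for about 3 μs and closes roughly 68 μs after launch, which is shorter than the 100 μs pulse period, so the return from one pulse is collected before the next pulse leaves the projector.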

The visible and infrared light passes through the UV dichroic and is 
refocused by another OAP. The light is then relayed by a second OAP relay 
which includes a tip-tilt corrector and an atmospheric dispersion corrector (ADC; 
400 nm < λ < 1.8 μm). The final relay element creates a telecentric F/41 beam that 
is split by a visible dichroic mirror at λ = 950 nm with the infrared light directed 
via a fold mirror to an external camera port. 

The visible light is captured by an EMCCD (E2V CCD201-20) camera with a 
44″ square field of view and 0.043″ pixel scale. Frames are continually read at a rate 
of 8.6 Hz during science observations, allowing image displacement, which cannot 
be measured using the laser system,*° to be removed in software based on the 
position of a m_V < 16 guide star within the field of view. The EM gain is set before 
observations based on the target’s magnitude to minimize read noise while leaving 
sufficient dynamic range if the targets are brighter than anticipated. Typically, an 
EM gain of 300 (read noise < 0.2 e⁻) is appropriate for targets m_V > 13, and is 
incrementally decreased to 25 (read noise = 1.9 e⁻) for targets as bright as m_V = 2. 


5.2. Software 


The Robo-AO software was designed from the outset as a modular system;*! each 
of the various hardware interfaces was built as an individual module that 
controls a single function of the total system. This modular design allows the 
individual subsystems to be stacked together into larger modules. For example, the 
WFS camera is controlled through a dedicated interface module, which is then com- 
bined with the modules for the DM, AO reconstructor, tip-tilt mirror control, and 
other modules into the adaptive optics control software. A master robotic sequencer 
is then used to manage the operations of the telescope, adaptive optics system, laser, 
filter wheels, and science camera, executing all operations that otherwise would 
have been performed manually, allowing immensely improved observing efficiency. 
All of the Robo-AO software is written in C++, and each module includes a small, 
standalone test program to ensure proper function of the module. 

The execution of an observation starts with a query to an intelligent queue 
scheduling program*? that selects a target. The robotic sequencer will then point 
the telescope, while simultaneously selecting the appropriate optical filter and con- 
figuring the science camera, laser and adaptive optics system. A laser acquisition 
process to compensate for differential pointing between the telescope and laser pro- 
jector optical axes, caused by changing gravity vectors, begins once the telescope 
has completed pointing at the new target. A search algorithm acquires the laser 
by moving the uplink steering mirror in an outward spiral pattern from center 
until 80% of the wavefront sensor subapertures have met a flux threshold of 75% 
of the typical laser return flux. Simultaneous with the laser acquisition process, 
the science camera is read out for 10-20 s with no adaptive optics compensation 
and with the deformable mirror fixed to obtain a contemporaneous estimate of the 
seeing conditions through the telescope. For targets brighter than m_V = 12, the 
telescope is offset to center the star on the detector; for fainter stars the robotic 
system will only recenter if the brightest star is within the center quarter frame, 
thus avoiding centering on a bright companion star near the edge of the frame 
and pushing the fainter science star out. Upon completion of laser acquisition, a 
new wavefront sensor background image is taken, the adaptive optics correction is 
started and an observation with the science camera begins. 
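The laser acquisition search described above can be sketched as follows. This is a minimal illustration, not the Robo-AO implementation: the square spiral, the step size, the search radius and the `measure_flux` callback are all assumptions, and only the 80% and 75% thresholds come from the text.

```python
def spiral_offsets(step, max_radius):
    """Yield (x, y) steering-mirror offsets on a square outward spiral
    starting at the center (hypothetical search pattern)."""
    n = int(max_radius / step)        # number of rings to search
    x = y = 0
    dx, dy = 1, 0
    seg_len, seg_done, turns = 1, 0, 0
    yield (0.0, 0.0)
    while True:
        x += dx
        y += dy
        if max(abs(x), abs(y)) > n:
            return                    # left the search area
        yield (x * step, y * step)
        seg_done += 1
        if seg_done == seg_len:       # end of a straight segment: turn left
            seg_done = 0
            dx, dy = -dy, dx
            turns += 1
            if turns % 2 == 0:        # segments lengthen every second turn
                seg_len += 1


def laser_acquired(subap_flux, typical_flux,
                   flux_fraction=0.75, subap_fraction=0.80):
    """True when at least 80% of subapertures reach 75% of the typical flux."""
    threshold = flux_fraction * typical_flux
    n_ok = sum(1 for f in subap_flux if f >= threshold)
    return n_ok >= subap_fraction * len(subap_flux)


def acquire(measure_flux, typical_flux, step, max_radius):
    """Step through the spiral until the laser is acquired.
    measure_flux(offset) is a hypothetical callback returning the list of
    per-subaperture fluxes at that mirror offset."""
    for offset in spiral_offsets(step, max_radius):
        if laser_acquired(measure_flux(offset), typical_flux):
            return offset
    return None                       # not found within the search radius
```

Returning `None` when the spiral is exhausted leaves the decision about what to do next (retry, skip the target) to the caller, which is where a sequencer would normally make it.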

During an observation, telemetry from the adaptive optics loop is used to 
maintain telescope focus and detect significant drops in laser return flux. Slow drifts in 
the focus mode of the deformable mirror are measured and offloaded to the 
secondary mirror to preserve the dynamic range of the deformable mirror. Focus on 
the deformable mirror is measured by projecting the commanded actuator values 
onto a model Zernike focus mode. A median of the last 30 focus values, measured 
at 1-s intervals, is calculated; if the magnitude of this value exceeds 220 nm 
peak-to-valley surface of focus on the deformable mirror, equivalent to a displacement 
of 20 μm of the Palomar 1.5-m telescope secondary mirror, then the secondary is 
commanded to change focus to null out this value. Focus corrections may not be 
applied more than once every 30 s and are restricted to less than 50 μm of total 
secondary motion to avoid runaway focus. The laser return flux is also measured 
at the same 1-s intervals, and if the laser return drops below 50 photoelectrons 
per sub-aperture on the wavefront sensor for more than 10% of the values used 
to calculate the median focus, e.g., due to low-altitude clouds or extremely poor 
seeing (greater than 2.5″), any focus correction is ignored due to the low certainty 
of the measurement. Additionally, if the return stays below 50 photoelectrons per 
sub-aperture for five consecutive seconds, the observation is immediately aborted, 
the target is marked as “attempted but not observed” in the queue, and a new 
target is selected for observation. 
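The focus offload and laser-return rules in this paragraph amount to a small state machine, sketched below. The numerical thresholds are taken from the text; the class name, the return values, the sign convention, the reading of the 50 μm limit as a cumulative cap, and the clearing of the history after a correction are assumptions made for illustration.

```python
from collections import deque
from statistics import median


class FocusOffloader:
    """Hypothetical sketch of the focus offload logic; not the Robo-AO code."""
    WINDOW = 30                  # number of 1-s samples in the median
    FOCUS_THRESHOLD_NM = 220.0   # DM focus (peak-to-valley) that triggers a move
    UM_PER_NM = 20.0 / 220.0     # 220 nm on the DM is equivalent to 20 um
    MIN_INTERVAL_S = 30.0        # at most one correction every 30 s
    MAX_TOTAL_UM = 50.0          # assumed cap on cumulative secondary motion
    MIN_RETURN_PE = 50.0         # photoelectrons per subaperture
    MAX_LOW_FRACTION = 0.10      # tolerated fraction of low-return samples
    ABORT_AFTER_S = 5            # consecutive low-return seconds before abort

    def __init__(self):
        self.focus = deque(maxlen=self.WINDOW)
        self.flux = deque(maxlen=self.WINDOW)
        self.low_streak = 0
        self.total_um = 0.0
        self.last_correction_t = None

    def update(self, t, dm_focus_nm, laser_return_pe):
        """Process one 1-s telemetry sample taken at time t (seconds).
        Returns ("abort", 0.0), ("hold", 0.0) or ("offload", move_um)."""
        self.focus.append(dm_focus_nm)
        self.flux.append(laser_return_pe)
        low = laser_return_pe < self.MIN_RETURN_PE
        self.low_streak = self.low_streak + 1 if low else 0
        if self.low_streak >= self.ABORT_AFTER_S:
            return ("abort", 0.0)
        if len(self.focus) < self.WINDOW:
            return ("hold", 0.0)          # not enough samples yet
        n_low = sum(1 for f in self.flux if f < self.MIN_RETURN_PE)
        if n_low > self.MAX_LOW_FRACTION * len(self.flux):
            return ("hold", 0.0)          # measurement too uncertain
        med = median(self.focus)
        if abs(med) <= self.FOCUS_THRESHOLD_NM:
            return ("hold", 0.0)
        if (self.last_correction_t is not None
                and t - self.last_correction_t < self.MIN_INTERVAL_S):
            return ("hold", 0.0)
        move_um = -med * self.UM_PER_NM   # sign convention is an assumption
        if abs(self.total_um + move_um) >= self.MAX_TOTAL_UM:
            return ("hold", 0.0)          # refuse moves that risk runaway focus
        self.total_um += move_um
        self.last_correction_t = t
        self.focus.clear()                # old samples predate the correction
        self.flux.clear()
        return ("offload", move_um)
```

Feeding the object a constant 330 nm focus term with a healthy laser return produces a single offload of about 30 μm after 30 samples, and a second request of the same size is refused because it would exceed the assumed 50 μm cap.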

The intelligent queue is able to pick from all targets in a directory structure 
organized by scientific program, with observation parameters defined within Extensible 
Markup Language (XML) files. Users are able to load targets directly, or 
through text file interpreters or a web management system. The queue uses an 
optimization routine based on scientific priority, slew time, telescope limits, prior 
observing attempts, and laser-satellite avoidance windows to determine the next 
target to observe. In coordination with US Strategic Command (USSC), we have 
implemented measures to avoid laser illumination of satellites. To facilitate rapid 
follow-up observations we have developed new de-confliction procedures which uti- 
lize the existing USSC protocols to open the majority of the overhead sky for 
possible observation without requiring preplanning. By requesting predictive avoid- 
ance authorization for individual fixed azimuth and elevation ranges, as opposed to 
individual sidereal targets, Robo-AO has the unique capability to undertake laser 
observations of the majority of overhead targets at any given time. 
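As a rough illustration of how such a queue might rank targets, the sketch below treats telescope limits and laser clearance as hard constraints and combines the remaining factors into a single score. The weights, field names and linear scoring rule are hypothetical and are not taken from the Robo-AO queue software.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    priority: float        # scientific priority, higher is better
    slew_s: float          # estimated slew time from the current pointing
    attempts: int          # prior observing attempts
    within_limits: bool    # inside the telescope pointing limits
    laser_clear: bool      # inside an authorized laser propagation window


def score(target, w_slew=0.01, w_attempt=0.5):
    """Hypothetical figure of merit: favor high priority, short slews and
    targets that have not already been attempted."""
    return target.priority - w_slew * target.slew_s - w_attempt * target.attempts


def next_target(targets):
    """Return the best eligible target, or None if nothing can be observed."""
    eligible = [t for t in targets if t.within_limits and t.laser_clear]
    return max(eligible, key=score, default=None)


queue = [
    Target("A", priority=5, slew_s=100, attempts=0, within_limits=True, laser_clear=False),
    Target("B", priority=3, slew_s=20, attempts=0, within_limits=True, laser_clear=True),
    Target("C", priority=3, slew_s=200, attempts=1, within_limits=True, laser_clear=True),
]
print(next_target(queue).name)
```

Here target A has the highest priority but is excluded because the laser is not cleared at its position, so the nearby target B is selected.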

A system monitor manages the information flow of the status of the individual 
subsystems that comprise the entire robotic system. It detects when one of the 
software subsystems has an error, crashes, or has other issues that might hinder 
the proper operation of the system. Errors are logged and the operation of the 
automated observations is stopped until the subsystem daemon can clear the issue. 
If the subsystem cannot correct the error, the automation system can take steps, 
up to and including restarting subsystems, in an attempt to continue operations. If 
it is unable to restart the system, it shuts everything down, leaving the system in a 
safe state. 


5.3. Data reduction and analysis 


Upon completion of an observation, the data are compressed and archived to a sepa- 
rate computer system where the data are immediately processed. A data reduction 
pipeline*® corrects each of the recorded frames for detector bias and flat-fielding 
effects, and automatically measures the location of the guide star in each frame. The 
region around the star is up-sampled by a factor of four using a cubic interpolation, 
and the resulting image is cross-correlated with a diffraction-limited point spread 
function for that wavelength. The frame is then shifted to align the position of 
greatest correlation to that of the other frames in the observation, and the stack of 
frames is coadded using the Drizzle algorithm to produce a final high-resolution 
output image sampled at twice the resolution of the input images. For science 
programs which require the detection and contrast ratio measurement of closely 
separated objects, an additional point-spread-function (PSF) subtraction and anal- 
ysis pipeline can be started upon completion of the data reduction pipeline. This 
pipeline distinguishes astrophysical objects from residual atmospheric and instru- 
mental wavefront errors and corresponding speckles in the image plane using a 
modified Locally Optimized Combination of Images** algorithm. The algorithm 
selects a combination of similar PSFs from the hundreds of other observations of 
similar targets during that night. The combination of PSFs is used to create a 
model PSF which is then subtracted from each image, and potential companions 
are flagged. 
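The registration and stacking step of the pipeline can be illustrated with a simplified shift-and-add routine. This sketch deliberately omits the four-times cubic up-sampling, works at whole-pixel precision, and uses a plain sum in place of the Drizzle algorithm; the function names are hypothetical, and the reference PSF is assumed to have the same array shape as the frames.

```python
import numpy as np


def correlation_peak(frame, psf):
    """Return (row, col) of the maximum of the circular cross-correlation
    of frame with psf, computed with the correlation theorem."""
    corr = np.fft.ifft2(np.fft.fft2(frame) * np.conj(np.fft.fft2(psf))).real
    return np.unravel_index(np.argmax(corr), corr.shape)


def shift_and_add(frames, psf):
    """Align every frame to the first one using its correlation peak with
    the reference PSF, then sum the aligned frames."""
    stack = np.zeros(frames[0].shape, dtype=float)
    ref = None
    for frame in frames:
        peak = correlation_peak(frame, psf)
        if ref is None:
            ref = peak                       # first frame defines the reference
        dy, dx = ref[0] - peak[0], ref[1] - peak[1]
        stack += np.roll(frame, (dy, dx), axis=(0, 1))
    return stack


# Synthetic check: a Gaussian "star" displaced by known whole-pixel offsets.
yy, xx = np.mgrid[0:32, 0:32]
psf = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 2.0 ** 2))
frames = [np.roll(psf, s, axis=(0, 1)) for s in [(0, 0), (3, -2), (-1, 4)]]
stack = shift_and_add(frames, psf)
```

Because the synthetic displacements are whole pixels and noise-free, the stacked image equals three times the first frame; real data would need the sub-pixel treatment described in the text.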

The rate at which final images are processed lags only slightly behind the data 
capture rate: a full night’s set of data is typically finished well before the next 
night of observing. Example cutouts of fully reduced Robo-AO images showing 
multiple stars appear in Fig. 4. Data are available as full-frame, cutout and 
PSF-subtracted images, with estimates of achievable contrast versus separation, and can 
be downloaded over scp or through the use of wget scripts. 


6. Automated AO Results 


On August 14, 2011, we closed the high-order AO loop on Robo-AO for the 
first time, achieving a clear diffraction-limited core in the fast frame rate visible 
camera images. Over the following year, we improved the operation of the system 
with a focus on the master robotic sequencer software. On June 19, 2012, 
Robo-AO demonstrated fully automated operation by observing 125 objects in succession 
with no human assistance. Initially, AO setup overhead times were on the order 
of 60 s (excluding telescope slew time). With further software optimization, this 
was reduced to ~40 s in 2014; and, with a change in the power switching module 
that controls the operation of the Pockels cell shutter, this is currently ~20 s. 
By 2015, Robo-AO achieved total overhead times of less than 40 s on the Palomar 
1.5 m for telescope slews on the order of a few degrees or less. Across all observations, 
the average overhead of the Robo-AO system at Palomar, including telescope slews 
and the setup of all instrument operations, was about 80 s. In comparison, large 
telescopes can take several minutes to configure only their AO system. This was 
a significant leap in observing efficiency, and has allowed Robo-AO to complete 
scientific surveys of thousands of stars at high resolution that were not achievable 
at large telescopes due to the lack of available telescope time. 

Fig. 4. Example Robo-AO images of Kepler candidate exoplanet host stars with stellar blends 
that contribute additional light to the measured light curves. 

We operated Robo-AO in its automated mode for ~180 nights at Palomar, 
spanning June 2012 to June 2015. On nights with no weather or environmental 
losses, Robo-AO typically completed 200-250 observations, each 90-120 s of total 
integration time (sufficient to get to the photon noise floor set by the uncorrected 
seeing halo in the first few arcseconds of separation). During its time at Palomar, 
the system completed ~19,000 observations (see Fig. 5), comprising several of the 
largest AO surveys, by number of observations, ever performed.*” These include known 
stars within 25 pc observable from Palomar and nearly all of Kepler’s candidate 
exoplanet host stars. For the latter, we discovered blended stars within 4″ for 
559 of 3857 Kepler host stars, yielding a nearby star fraction of 14.5 ± 0.6% (see 
Fig. 6). Approximately half of these nearby stars are within 2″, a separation range 
where only high-angular resolution surveys are able to accurately measure the 
properties of the companion stars. 

Fig. 5. High-angular-resolution visible-light AO imaging performed by Robo-AO. Each of the 
18,550 points on this graph is an observation performed by the robot and automatically processed 
by the pipeline into a final science-quality image. 

Fig. 6. Separations and magnitude differences of the detected companions in the full Robo-AO 
Kepler Planetary Candidate Survey.> ° 
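The nearby star fraction quoted above is consistent with a simple binomial estimate, shown in the short sketch below. The use of the normal-approximation binomial error is an assumption; the survey papers may have derived the uncertainty differently.

```python
import math


def fraction_with_error(k, n):
    """Fraction k/n and its binomial standard error sqrt(p * (1 - p) / n)."""
    p = k / n
    return p, math.sqrt(p * (1.0 - p) / n)


p, sigma = fraction_with_error(559, 3857)
print(f"{100 * p:.1f} +/- {100 * sigma:.1f} %")
```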
7. Future Directions 


In October 2015, we relocated the Robo-AO system to the Kitt Peak 2.1-m telescope 
for an extended deployment.*® The system was augmented with a near-infrared 
avalanche photodiode array camera, enabling simultaneous imaging with the visible 
camera. While visible and infrared tip-tilt correction were demonstrated previously, 
these new modes will be wrapped into the current robotic observing software. 

Future Robo-AO systems will be deployed at other modest-sized telescopes, 
including the University of Hawai‘i 2.2-m telescope at Maunakea, where the 
superior seeing will naturally enhance the achieved image quality of observations, 
and where we can take advantage of more recent component advancements, e.g., 
MEMS DMs with a greater number of actuators and EMCCD wavefront sensor 
detectors.4” In addition to imaging, autonomous AO can play a crucial role in 
enhancing the sensitivity of low spectral resolution instruments for rapidly charac- 
terizing transients and other time-domain phenomena.*® This will be particularly 
critical for the larger telescope apertures necessary to fully characterize the wealth 
of new discoveries in the era of the Large Synoptic Survey Telescope. 
In this chapter, the main ideas and concepts of ground layer adaptive optics 
(GLAO) are presented. The astronomical drivers for such systems are outlined, 
as well as some performance metrics associated with this type of AO system. Past and 
current systems (demonstrators and “production” instruments) are presented as 
well, showing salient results obtained with them. 


1. GLAO: General Principles 


The principle of ground layer adaptive optics (GLAO,! see Ref. 2 for a review) 
is based on the observation that at most astronomical observation sites, a large 
amount of atmospheric turbulence is concentrated near the ground. The ground 
layer of turbulence may contain as much as 70-80% of all the turbulence, depending 
on conditions, site, and also how high one considers this layer to extend. Note that 
some of this turbulence can be very local (even inside the dome), while some is truly 
atmospheric (boundary layer). Measuring and correcting only this turbulence close 
to the ground is the key to GLAO. 

Figure 1 shows the distribution of turbulence measured at low altitudes above 
the telescope (below 300 m, 600 m, 900 m, and 1.2 km).° We can see that at Paranal, 
for example, at least half of the turbulence is typically located below 600 m most of 
the time, sometimes much more. 

Since the turbulence is close to the ground, the isoplanatic angle associated 
with it is very large. This means that correcting this turbulence improves the astro- 
nomical image over a much larger field of view than compensating the full turbu- 
lence above the telescope. On the other hand, since only a fraction of turbulence 
is corrected, one cannot expect a perfect removal of the effects of the atmosphere; 
therefore the images will be only partially corrected. 
Fig. 1. Distributions of the fraction of turbulence below an altitude h (in meters), from Ref. 3. 


It may seem counter-intuitive to wish to correct only a part of the atmospheric 
turbulence. However, GLAO is particularly suited for some astronomical applica- 
tions: 


• GLAO offers a much wider corrected field of view compared to single conjugate 
adaptive optics*® (SCAO, the “usual” form of AO), for example. Fields of several 
arcminutes can be corrected, as opposed to a few tens of arcseconds with 
“conventional” AO. Compared to another wide field AO solution, multi-conjugate 
AO (MCAO), a GLAO system is simpler, as there is only one deformable mirror 
(DM), and the corrected FOV can be larger than what a “reasonably complex” 
MCAO system can currently provide. 

• The correction quality is very homogeneous over the whole science field of view and 
shows far fewer signs of anisoplanatism compared to SCAO. This may facilitate 
astronomical data processing, as routines used for seeing-limited images can still 
be used, without having to worry about diffraction effects. 


GLAO is sometimes called a seeing enhancer. Indeed, most GLAO systems 
do not provide diffraction-limited images, but concentrate the energy in the point 
spread function, which does not display the usual diffraction-limited characteristics 
of conventional AO. 

“See Chapters 13 and 14. 


2. Principles of GLAO 


Two main techniques have been used to measure only the turbulence near the 
ground: 


• Multiple guide stars (natural or laser), located far apart from each other on the 
sky. Averaging their measurements allows one to select the common aberrations 
between the guide sources, and therefore to filter out the high-altitude turbulence, 
only measuring the ground component. Other forms of signal processing to filter 
the lowest layers are also possible (e.g., tomography). 

• A low altitude single laser guide star (a Rayleigh guide star) can also filter out the 
high-altitude turbulence, which is located above the altitude of the laser guide 
star (LGS), and can be used for GLAO. 


Figure 2 summarizes two versions of a multi-wavefront-sensor (WFS) GLAO 
system. In the first case (on the left), natural guide stars (NGS) are used. Each of 
them is observed by a dedicated wavefront sensor. The measurements are sent to 
a real-time computer (or wavefront controller [WFC]), which calculates the 
commands to be sent to a deformable mirror. In the second approach, LGSs are used 
instead of NGSs, providing high-flux reference sources wherever in the sky one needs 
them. The figure shows how low-altitude layers are well sensed by the 
wide guide star constellation, with a lot of overlap between the WFS measurements. 
The high-altitude layers, on the other hand, provide a different measurement on each 
WFS, which then averages out in the wavefront reconstruction. Adding more guide 
stars provides better averaging, but 3-4 guide stars are adequate in most cases. 

[Figure 2 labels: Laser Guide; Reference Stars; High Altitude Layer; Ground Layer; 
Telescope; Ground conj. DM; Camera.] 

Fig. 2. Natural guide star GLAO on the left, laser guide star GLAO on the right (courtesy 
E. Marchetti, ESO). WFC represents the Wavefront Controller, WFS the wavefront sensor. See 
electronic edition for a color version of this figure. 
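The averaging argument can be illustrated with a toy Monte Carlo model in which every wavefront sensor sees the same ground-layer term plus its own high-altitude term. The model is an assumption made for illustration: it uses one scalar per measurement, unit-variance Gaussian statistics, and fully uncorrelated high-altitude contributions, which is the idealized limit of widely separated guide stars.

```python
import numpy as np


def residual_high_layer_variance(n_guide_stars, n_trials=200_000, seed=0):
    """Variance of the high-altitude signal left in the averaged measurement
    (toy model; unit variance per layer, independent high layers)."""
    rng = np.random.default_rng(seed)
    ground = rng.normal(size=n_trials)                 # common to every WFS
    high = rng.normal(size=(n_guide_stars, n_trials))  # different for each WFS
    measurements = ground + high                       # one row per WFS
    estimate = measurements.mean(axis=0)               # GLAO-style average
    return float(np.var(estimate - ground))


for n in (1, 2, 3, 4):
    print(n, round(residual_high_layer_variance(n), 3))
```

In this idealized limit the high-altitude contamination falls as 1/N with the number of guide stars, so each additional star brings a smaller improvement, in line with the remark that 3-4 guide stars are adequate in most cases.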

Note that because LGSs do not measure tip-tilt (TT), a dedicated TT (i.e., 
natural) star is needed. It can, however, be anywhere in the large field and can be 
fainter than in a natural guide star (NGS) system, as only TT is corrected. Assuming 
a large field GLAO, finding a TT star is less of a challenge than in smaller field AO 
concepts, and hence the sky coverage of GLAO is high (or even full). A variant of 
this scheme uses the telescope’s own TT sensor (if the telescope happens to have 
such a sensor to compensate for wind shake, sometimes called field stabilization). 

The optimal position of the guide stars in a multi-guide star GLAO system is 
related to the field of view one wants to correct. In an LGS system, for practical 
reasons, one usually wants to place the LGSs at the edge of the science field. The 
wider the field of view one wants to correct, the wider the asterism required. If 
the guide stars are too close to the field center, the performance peaks towards the 
guide stars and will not be as uniform as possible. 

Since the image quality that one wants is not the diffraction limit, constraints 
on the spatial frequencies measured and corrected by the AO system are relaxed. 
This means that one does not need as many sub-apertures on the WFS and DM 
actuators as one would for a full diffraction-limited system. In order to dimension 
these components, simulations are required to exactly define what order system one 
wants. 

Most GLAO systems and studies have used Shack-Hartmann wavefront sensors. 
This choice is mostly due to its linearity properties (as GLAO correction is low, the 
WFS should work well in quasi open-loop), but also to convenience and “habit”. 

One can wonder if LGSs are necessary with GLAO, or whether one can live 
with NGSs only, and still have good sky coverage. It turns out that here, too, the 
performance that one wants to achieve drives the choice of guide star. If one wants a 
large (several arcmin) FOV, NGSs seem to provide an adequate source of references. 
The wide field allows one to find several suitable stars, and the sky coverage is almost 
full, anywhere in the sky. On the other hand, for smaller fields LGSs seem to be 
a better choice, as the number of stars in such a field can be limited, constraining 
the sky coverage. However, using NGSs also has drawbacks (one needs to find them 
and direct their light to the WFS, which means moving parts, and they also make 
the performance variable from one pointed field to another, as the NGS position is 
not fixed). The drawback of lasers is mainly related to cost and complexity, as well 
as possible interactions with surrounding telescopes. 


3. Astronomical Motivations 


The astronomical motivations for GLAO are numerous, but most revolve around 
increasing observational efficiency by improving seeing. Two main effects are seen 
compared to seeing limited observations: 
• A (mild) decrease of the full width at half maximum (FWHM) of the image. 
Depending on the corrected FOV, the image size typically decreases by between 30% and a 
factor of 2. Here, the goal is an improvement in resolution. 

• Associated with the shrinking of the point spread function (PSF) is an increase 
in the signal-to-noise ratio (SNR) of the observations for a given integration time. 
This can be a non-negligible effect, since the integration time can be shortened 
by typically up to a factor of 2. For very long exposures, such as a MUSE deep 
field on the VLT (total exposure of about 80 h), GLAO is an enabling technology. 
Indeed, a total integration time of ~80 h is still in the realm of feasibility, while 
160 h is much more challenging. 
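The link between image size and integration time can be made explicit with a standard scaling argument, sketched below. It assumes a background-limited point source measured in an aperture matched to the PSF, for which SNR scales as sqrt(t)/FWHM, so the time needed to reach a given SNR scales as FWHM squared; the function names are hypothetical.

```python
import math


def time_ratio(fwhm_ratio):
    """Integration time with GLAO divided by the seeing-limited time needed
    to reach the same SNR (background-limited point source, t ~ FWHM^2)."""
    return fwhm_ratio ** 2


def fwhm_ratio_for_speedup(speedup):
    """FWHM ratio required to shorten the integration time by `speedup`."""
    return 1.0 / math.sqrt(speedup)


print(time_ratio(0.7))              # a 30% smaller FWHM roughly halves the time
print(fwhm_ratio_for_speedup(2.0))  # FWHM ratio needed for a factor of 2
```

Under this assumption a 30% decrease in FWHM shortens the integration by about a factor of 2, which is the difference between the ~80 h and 160 h exposures mentioned above.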


It should be noted that considerable effort is made to find sites for astronomical 
observatories with the best seeing. It seems then intuitive that a technology that 
improves the seeing should be desirable. This technology could however be seen more 
as a telescope system (like active optics for example) rather than an independent 
system (like AO). 

Clearly, GLAO is not aiming at the same niche as a conventional diffraction- 
limited AO system. Rather, it improves the seeing-limited capabilities of an instru- 
ment, by reducing the image’s FWHM (slightly better resolution) and concentrating 
the light (reducing the integration time to reach a similar SNR). It can be seen as 
being equivalent to virtually placing the telescope on a better site — instead of 
removing most of the atmospheric turbulence like a conventional AO does. 


4. On the Importance of Performance Metrics 


Several metrics have been used to characterize the performance of a GLAO system. 
Indeed, since GLAO provides only a moderate correction, Strehl ratio, usually used 
in AO, may not be very useful, since the diffraction limit is not achieved. Other 
parameters have been used instead: 


• FWHM of the PSF. This is perhaps the most intuitive metric, but is 
sometimes difficult to measure, especially since GLAO also modifies the shape of the 
PSF (and hence the quality of the fitting function may change between AO and 
non-AO). 

• Encircled (or ensquared) energy within a certain box (or disk) size (in arcseconds). 

• Size of the box (or disk) containing 50% of the energy. 

• Parameters of a fit of the PSF, for example a Moffat function. 

• Ratio of one of these parameters compared to a non-AO observation to express 
a gain compared to no AO. 
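For a PSF described by a circular Moffat profile, I(r) ∝ (1 + (r/α)²)^(-β) with β > 1, several of the metrics listed above have closed forms. The sketch below collects them; the parameter values in the example are arbitrary and the function names are hypothetical.

```python
import math


def moffat_fwhm(alpha, beta):
    """Full width at half maximum of a Moffat profile."""
    return 2.0 * alpha * math.sqrt(2.0 ** (1.0 / beta) - 1.0)


def moffat_encircled_energy(r, alpha, beta):
    """Fraction of the total energy inside radius r (requires beta > 1)."""
    return 1.0 - (1.0 + (r / alpha) ** 2) ** (1.0 - beta)


def moffat_ee_radius(fraction, alpha, beta):
    """Radius of the disk containing the given energy fraction."""
    return alpha * math.sqrt((1.0 - fraction) ** (-1.0 / (beta - 1.0)) - 1.0)


alpha, beta = 0.4, 2.5    # arbitrary example values (alpha in arcseconds)
fwhm = moffat_fwhm(alpha, beta)
r50 = moffat_ee_radius(0.5, alpha, beta)
print(f"FWHM = {fwhm:.3f} arcsec, 50% energy diameter = {2 * r50:.3f} arcsec")
```

Two profiles with the same FWHM but different β have different 50% energy radii, which is one reason the choice of metric matters when comparing a GLAO PSF with a seeing-limited one.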


Choosing the right metric to quantify the system’s performance is important, 
because it allows one to estimate what configuration (correction field of view, 
wavelength) best fulfills the requirements of the astronomical observations. For 
example, a spectroscopic instrument is more sensitive to the energy going into a pixel 
(or spaxel) than to the FWHM of the PSF. On the other hand, an imager may 
be best described by the size of the 50% energy disk rather than the FWHM of 
the PSF. 

Note that depending on the correction, the shape of the PSF may differ from 
the usual seeing-limited profile (or a diffraction-limited image). Usually, the GLAO 
PSF is more peaked than the seeing-limited one. 


5. Observational Efficiency Considerations 


Since one of the key benefits of GLAO is improving the seeing, and therefore obser- 
vational efficiency (amount of time it takes to achieve a given SNR), it is critical 
that a GLAO system be efficient. Indeed, if the GLAO system improves the seeing, 
but significantly degrades the transmission by adding many optical elements, or 
decreases observational efficiency by increasing overheads to acquire the objects 
and close all the AO loops, GLAO’s advantages may be severely reduced or may 
even disappear. Therefore, several points must be dealt with carefully: 


• The optical train of the GLAO system must be as efficient as possible. One
  solution to this is to use an adaptive secondary mirror. This simplifies the optical
  design (no need for optical relays with several optical surfaces degrading
  transmission) and can allow light to enter the science instrument directly without
  additional optics. Note that for very large fields of view, the conjugation height of
  the secondary mirror (constrained by the optical design of the telescope) becomes
  important, since it is the distance between its conjugation height and the ground
  layer of turbulence that dictates how well the GLAO is able to correct turbulence.
  The technological complexity of an adaptive secondary mirror also needs to be
  taken into account, as it may influence observational efficiency.

• The set-up time of each GLAO observation must not be a significant overhead
  compared to non-AO cases. This means that setting up the (multiple) wavefront
  sensors, acquiring (multiple) guide stars (either laser or natural), and starting all
  AO loops should not take much longer than acquiring a science target in non-AO
  mode. This can prove to be a significant challenge considering the extra
  complexity brought by a GLAO system, and makes a high level of task automation a
  must.

• The reliability of the AO system hardware and software must be high, as the
  GLAO gain is fully achieved only if the AO system is (nearly) as reliable as a
  seeing-limited one.


6. GLAO: Performance and Limitations 


The performance of GLAO has been simulated both analytically⁴ and numerically
(e.g., Refs. 5-7). Both approaches agree very well,⁸ which is understandable,
as GLAO performance depends mainly on the amount of ground-layer (GL)
turbulence. We summarize here the main tendencies found in those studies.

The wider the corrected field of view (which is set by the positions of the guide
stars), the thinner the slab of corrected atmosphere. For example, in GRAAL, the GLAO
system for the near-infrared HAWK-I instrument on the VLT, an 8’ FOV is
corrected, yielding a correction thickness of only about 300 m. On the other hand,
the GALACSI WFM (the AO system for MUSE), also on the VLT, only corrects
a 1’ field in the visible, and this system compensates roughly the first kilometer
of turbulence. This is quite a difference, which can grow even larger for ultra-wide
field GLAO systems like IMAKA (see below).

Note also that with increased field of view, the correction quality decreases, as 
a thinner and thinner slab of turbulence close to the ground is corrected. 

An example of simulated performance is shown in Fig. 3, for the GALACSI
wide field mode on the VLT. We can see that in the several cases simulated
(different seeing, each associated with a different $C_n^2$ profileᵇ), a gain of a factor
of ~2 in the adopted metric (ensquared energy in a 0.2” pixel at 750 nm) is
obtained.

Fig. 3. Summary of the WFM performance for three seeing conditions. Each point represents
a direction in the corrected scientific field of view. From top left to bottom right: 1.1”, 0.85”
and 0.6”.

ᵇ $C_n^2$ is the square of the spatial correlation function and is related to the index of refraction
structure constant; see Section 3 of Chapter 13.

These plots show that the gain is uniform across the scientific FOV.
Performance (in terms of gain) is only slightly dependent on the seeing value
in the specified range (0.6” to 1.1”).

Since GLAO is so dependent on the structure of the $C_n^2$ profile, and in particular
on turbulence near the ground, it is critical to understand the ground layer structure
at the site of the GLAO system. Several instruments can be used for such site
testing campaigns, depending on what thickness one wants to investigate: generalized
SCIDAR,⁹ SLODAR,¹⁰ and MASS-DIMM¹¹ all provide useful information on this
aspect. Since one of the goals of GLAO is to increase the observational efficiency of
the whole telescope, it is important to know the statistical behavior of the ground
layer turbulence. Indeed, one wants to know for what fraction of the total observing
time GLAO can deliver a significant improvement of the telescope’s image quality.
This requires rather long site survey campaigns: the measurements must cover a
long timeline to be statistically significant.

In the case of AO demonstrators, the observation campaign is usually short,
which does not really allow the user to estimate how much gain a GLAO system
provides over a statistically significant number of nights. One must therefore rely
heavily on simulations to explore a large number of possible observing conditions
to make sure the system behaves as specified most of the time.

Another hurdle in understanding the $C_n^2$ structure is local effects. Indeed, most
site surveys are done with auxiliary (small) telescopes. Therefore, they do not
necessarily measure the same turbulence as the one seen by the main (astronomical)
telescope. Since the ground layer can be so strong, a few meters of difference in
height can make a significant difference in the measured amount of turbulence. This
is why some site survey telescopes are mounted on towers that match the
height of the astronomical telescope, which can be 10 m or more above the ground.
Local effects, like increased turbulence due to turbulent flow behind
a large telescope, can also affect ground layer turbulence measurements and provide
a biased view of how much a GLAO system will effectively improve the telescope’s
image quality.

A further possible source of bias between the site survey results and the actual
turbulence seen by a GLAO system is the dome of the astronomical telescope.
It is unclear as of today whether ground layer turbulence “enters” the
dome, or if the dome protects the telescope from it. This has a significant impact
on the GLAO performance: if the turbulence is filtered by the dome, the
gain of GLAO compared to the site-survey measurements can be reduced.

Note that GLAO, like any AO system, can also correct slow telescope aberrations,
which can be a significant contributor to the gain of GLAO compared to
seeing-limited (non-AO) operations.


7. GLAO: Wavefront Reconstructors


The simplest way to correct for ground layer, as explained previously, is just to 
average the measurements from multiple WFSs observing different stars. The com- 
mon part of turbulence (low altitude component, where the WFS measurements are 
nearly identical or at least very correlated) between the multiple stars will be cor- 
rected, whereas the non-common parts (higher up) will be averaged out. Theoretical 
work has been carried out to investigate whether other reconstruction algorithms 
would bring a gain in GLAO (e.g., see Ref. 12 for a tomographic approach, Ref. 13 
for a numerically efficient method). Tomography seems to bring a moderate
performance gain in the studied case, at a cost in corrected PSF homogeneity. The larger
the corrected field of view, the smaller the gain. A reduction in computational
complexity, on the other hand, is possible, for a performance similar to averaging.

Note that for minimal Real Time Computer complexity, it is possible to average
the slopes of each WFS instead of entering all the slopes into a larger reconstruction
matrix. Let $N_\mathrm{wfs}$ be the number of wavefront sensors (typically 3 or 4),
$N_\mathrm{slopes}$ the number of slopes from each WFS, and $N_\mathrm{act}$ the number of
commands to be computed. The saving in computing power can be considerable, as instead
of needing to compute a $(N_\mathrm{wfs} \times N_\mathrm{slopes}) \times N_\mathrm{act}$
command matrix, one can use an $N_\mathrm{slopes} \times N_\mathrm{act}$ one. Note that in
this scheme, all WFSs need to be identical, as the
interaction matrix is not large enough to differentiate between them. If the WFSs
have different alignments (for example), those cannot be taken into account if one
cannot write them as differential reference slopes (one can apply offsets to each WFS
in slope space, but cannot have different interaction matrices for each WFS). This
may constrain the opto-mechanical design of the GLAO system by requiring more
stiffness from the WFSs, for example. This shows how in an AO system, different
choices can lead to very different components, and underlines the need for a trade-off
analysis at the AO system level.
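The equivalence between the two schemes can be checked numerically. The sketch below is only an illustration under stated assumptions: the sizes, the random interaction matrix and the per-WFS reference slopes are all made up, and the single-WFS command matrix is taken to be a pseudo-inverse of that interaction matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_wfs, n_slopes, n_act = 4, 60, 30          # illustrative sizes only

# Single-WFS command matrix (n_act x n_slopes), here a pseudo-inverse
# of a random stand-in for the common interaction matrix.
interaction = rng.standard_normal((n_slopes, n_act))
cmd_single = np.linalg.pinv(interaction)

slopes = rng.standard_normal((n_wfs, n_slopes))              # one row per WFS
ref_slopes = 0.01 * rng.standard_normal((n_wfs, n_slopes))   # per-WFS offsets

# Scheme 1: subtract reference slopes, average over WFSs, small matrix.
commands_avg = cmd_single @ (slopes - ref_slopes).mean(axis=0)

# Scheme 2: stack all slopes and use the large command matrix, which is
# the small one replicated n_wfs times and scaled by 1/n_wfs.
cmd_stacked = np.hstack([cmd_single / n_wfs] * n_wfs)
commands_stacked = cmd_stacked @ (slopes - ref_slopes).ravel()

multiplies_small = cmd_single.size          # n_act * n_slopes
multiplies_large = cmd_stacked.size         # n_act * n_wfs * n_slopes
print(np.allclose(commands_avg, commands_stacked),
      multiplies_large // multiplies_small)
```

The averaging scheme therefore saves a factor of $N_\mathrm{wfs}$ in matrix-vector multiplications, at the price of forcing every block of the large matrix to be the same.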


8. Existing and Planned Systems, Results on Sky 


8.1. Demonstrators 


The first on-sky demonstration of GLAO correction was made with the MAD
demonstrator,¹⁴,¹⁵ using 3 NGSs in variable 1’ to 2’ constellations, in the near
infrared. This showed for the first time that averaging wavefronts from different
directions indeed improves the image quality provided by an AO system.
Unfortunately, an NGS-based system with a field of view of only 2’ and WFS detectors
with non-negligible read-out noise did not allow for a very large sky coverage, reducing
the astronomical science that could be done. This demonstrator underlined the
need for LGSs if a high number of science targets are to be observed with such a
small field of view (for GLAO). This system also compared two wavefront sensing
approaches, one based on Shack-Hartmann sensors, and another based on pyramid sensors.

Fig. 4. MAD demonstrator Strehl ratio (K-band) in SCAO mode (a) and GLAO mode (b). From
Ref. 14. See electronic edition for a color version of this figure.

Figure 4 shows a salient result comparing the SCAO and GLAO performance 
in the K-band. The SCAO system delivers a corrected field of view of ~20”, with 
a high Strehl towards the guide star (35%, the red area towards the right edge of 
the image), whereas the GLAO provides a much larger field of about 1’, but with a 
lower Strehl (about 20%). 

Further demonstrators at other sites, like CANARY¹⁶ on La Palma and
RAVEN¹⁷,¹⁸ on Mauna Kea, have confirmed that GLAO does bring performance
very close to the predicted one, and that the usefulness of GLAO is not limited to a
particular site, but seems to be a general property of any astronomical observatory.

Figure 5 shows the impact of the $C_n^2$ profile on the performance of a GLAO
system, as seen on the RAVEN demonstrator.¹⁷ We can see that if the ground
layer is very strong (top row), the GLAO correction does almost as well as a
multi-object AO (MOAO) system (which corrects the full turbulence in a given direction,
using tomography). When the turbulence is more distributed in altitude, the GLAO
system does not do as well, although it still improves the image quite a bit compared
to no AO.

Fig. 5. Impact of $C_n^2$ profile on the performance of a GLAO system, as seen on the RAVEN
demonstrator. From Ref. 17. See electronic edition for a color version of this figure.

The first GLAO system based on multiple laser guide stars was demonstrated
in Ref. 19 on the MMT, using Rayleigh lasers. The PSF’s FWHM was reduced by
roughly a factor of 2, from 0.7” to 0.33”, also in the near-infrared. The corrected
field of view was about 2’. Figures 6 and 7 show the improvement in image quality
observed with GLAO on the Multiple Mirror Telescope (MMT), both in terms of
FWHM in open and closed loop, and as a comparison of the corrected and
uncorrected PSFs. We can see that the gain brought by GLAO is obvious,
prompting the development of the ARGOS Rayleigh laser-based AO system on the Large
Binocular Telescope (LBT).²⁰

Fig. 6. On-sky GLAO performance at MMT.²¹

It is possible to build a GLAO system correcting an extremely large field of
view without an adaptive secondary. This is demonstrated by the IMAKA system.??
Indeed, IMAKA corrects a field of view of one third of a degree, with a post-focal system.
Although the image quality improvement over such a large field is not necessarily
very large, it is still deemed useful, especially if there is a large amount of turbulence
close to the ground (or even in the dome, as may be the case in older telescopes).

Fig. 7. On-sky GLAO performance at MMT.??


8.2. Astronomical systems 


Although the “mainstream” GLAO systems rely on multiple guide stars to differen- 
tiate the turbulence close to ground (to be corrected) from the upper altitude (not 
to be corrected) by averaging their wavefront measurements, another approach is 
possible, as demonstrated by the SAM system on the SOAR telescope.²⁴ The SAM
system corrects a field of view of 3’ in the visible. The idea is to use a Rayleigh 
LGS (in this case a UV laser, which does not have the eye-safety issues of sodium 
LGSs and is also much cheaper than a sodium counterpart), which produces a guide 
star at an altitude of 10-15km (instead of the ~90 km of the “conventional” sodium 
LGSs). Because of the large cone effect (i.e., focal anisoplanatism), the high-altitude 
layers are very poorly measured or not sampled at all. Therefore, mostly the layers 
close to ground are measured and corrected. The simplicity of a single guide star is 
very attractive, and therefore this approach can be a fruitful one, as demonstrated 
here. In theory, a single guide star does not provide as good PSF homogeneity over 
the field as multiple guide stars, but this effect can be small (see Fig. 8). 

The ARGOS system at the LBT²⁰ uses Rayleigh guide stars, provided by pulsed
green lasers. Each of the LBT’s 8-m “eyes” is equipped with three LGSs, so that
both foci can be fed simultaneously with GLAO. The laser systems drive the
adaptive secondaries on both eyes. The system aims at improving the FWHM of
the corrected image by a factor of 3-4 in the infrared over a corrected field of view of 4’
(see Fig. 9).

ESO’s Adaptive Optics Facility (AOF?°) is an upgrade to the VLT, comprising
four sodium laser guide stars, a deformable secondary mirror with 1170 actuators,
and three AO systems, of which two are GLAO. The first GLAO system, GRAAL,
offers a very large, 8’ corrected field of view for the near-infrared HAWK-I imager.
The second system is the GALACSI wide-field mode (WFM), which offers a 1’ field
of view in the visible. These systems are now fully operational, and produce
astronomical results, demonstrating the maturity of GLAO.

Fig. 8. Performance of the SAM single LGS GLAO system, at two different times. Note how
much the performance changes, a sign that the amount of turbulence in the ground layer is not
always constant. From Ref. 24.

Fig. 9. Image of the Large Binocular Telescope’s laser system, ARGOS, used for GLAO. Image
courtesy of the ARGOS consortium.


Figures 10 and 11 (which are not final, since they are not yet statistically
complete) show that GLAO does exactly what it was designed to do: statistically
improve the image quality compared to no AO. Depending on the corrected field of view,
this improvement ranges from almost a factor of 2 (GALACSI WFM, 1’ corrected field
in the visible) to about 20-30% (GRAAL, 8’ FOV, in the IR). These results agree
very well with numerical simulations.

Fig. 10. Preliminary GRAAL performance summary, with and without AO.?° See electronic
edition for a color version of this figure.

Fig. 11. Preliminary GALACSI WFM performance summary, with and without AO.?° See
electronic edition for a color version of this figure.


9. Future of GLAO 


Only longer-term studies will show how useful GLAO will be in observatories. It
seems that at least some instruments will greatly benefit from GLAO and will
use it in almost all observations instead of observing seeing-limited, which is a
testament to its usefulness. The next generation of ground-based telescopes, like the
Thirty Meter Telescope, the Giant Magellan Telescope and the Extremely Large
Telescope, are all considering using GLAO, in one form or another, demonstrating
the large potential of this form of AO.

Some other future applications can also be envisioned for GLAO. For example,
it can be a very efficient first stage corrector for an MOAO system. One could
imagine that a telescope’s “AO” sensors are used for a coarse GLAO correction.
After this, the beam would be fed to the MOAO system, which would have its own
deformable mirrors. However, these mirrors usually have a limited stroke.
Using a pre-correction by GLAO on a DM included in the telescope
could prove to be very useful.


Chapter 17


Multi-Object Adaptive Optics 


Donald Gavel 


Center for Adaptive Optics, University of California Observatories, 
UC Santa Cruz, Santa Cruz, CA 95060, USA 


This chapter covers a methodology that widens the field of regard for simultaneous
adaptive optics (AO) correction in both imaging and spectroscopy. Rather than
directly trying to correct a whole volume of turbulence (as would be done by a
multi-conjugate AO (MCAO) system), we correct integrated turbulence individually
along paths in a set of directions. These directions can be toward the science
targets of interest, in which case the “multi-object” AO (MOAO) is simply a
replication of single-conjugate AO (SCAO) systems, and we are simply multiplexing
the telescope time. This kind of system could be used, for example, in separately
correcting many galaxies in a galaxy cluster that is covered by the telescope
field, but which are separated from each other by more than the isoplanatic angle.
The MOAO concept opens additional possibilities as well: for example, correcting
the multiple tip/tilt stars that are a necessary part of an MCAO system, which
allows using dimmer natural tip/tilt stars. We discuss the MOAO architecture,
the design requirements as driven by the turbulence structure of the atmosphere,
and then describe a number of ongoing studies and experiments with MOAO
system concepts.


1. Introduction


Multi-object adaptive optics (MOAO) is an adaptive optics architecture that pro- 
vides separate AO-corrected beams for each of several astronomical science objects 
over a wide field. There are two important applications where MOAO is of interest. 
The first is to service a multi-object spectrograph, where the observations consist of 
taking simultaneous spectra of several objects over a wide field. The AO-corrected 
beams are optimal for their respective directions in space (positions on the field). 
The corrected images are fed to slits in a slit-mask, spectrograph fibers, or an 
integral field unit. The second interesting application for the MOAO concept is in
support of a multiple laser guide star (LGS) AO system, where MOAO units correct
the beams of multiple tip/tilt stars. Just as a single-LGS AO system requires a
natural star to measure the tip/tilt mode unsensed by the laser, multi-LGS systems
require multiple natural tip/tilt stars to sense the modes left unsensed by the LGS
constellation. MOAO units apply AO corrections to the tip/tilt stars, which allows
the use of dimmer stars for a given level of tip/tilt sensing performance, which in
turn greatly improves the sky coverage of the overall AO system.

MOAO is an advanced concept whose implementation is presently in its infancy.
A “multi-button” AO system, called FALCON,¹ was first proposed for use
on the VLT in support of multi-object infrared astronomy. This later
evolved into the CANARY demonstrator project, which is a pathfinder for a
potential European Extremely Large Telescope (E-ELT) instrument called EAGLE.²
A multiple AO-fed spectrograph, called IRMOS, was considered in a design study
for second-light instrumentation for the Thirty Meter Telescope (TMT).³,⁴ The
Keck telescope did a design study for the Keck Next Generation Adaptive Optics
(KNGAO),⁵ which used MOAO units to correct tip/tilt stars. As of this publication,
only the demonstrator, or “pathfinder”, systems have been tested on sky: CANARY
at the William Herschel Telescope,² RAVEN at the Subaru Telescope,⁶ and a single
MOAO unit called VILLAGES at Lick Observatory.⁷

Figure 1 compares the system architectures of multi-conjugate adaptive optics 
(MCAO) and MOAO systems. The active wavefront correction in MOAO differs sig- 
nificantly from other AO systems, in that it requires open-loop control, which means 
that the correction applied to MOAO deformable mirrors is not obtained from sens- 
ing guide star light that has reflected off of those mirrors. Instead the correction is 
derived from a system-wide multiple laser guide star tomographic wavefront sensor. 
Each MOAO unit’s correction is an extrapolation via the 3D tomographic recon- 
struction of the turbulence, tuned for the particular field position of each MOAO 
unit. Since there is no closed-loop feedback of the corrected wavefront information, 
the deformable mirrors have a requirement for very high “go-to” accuracy — that 
is, they must respond in a predictable and known manner to the voltage commands 
given to them. In addition, the tomographic wavefront sensing system must accu- 
rately measure the wavefront to the dynamic range of overall turbulence, as opposed 
to being simply null-seeking as in closed-loop control. 


2. MOAO Controller Theory 


A wide field AO system must sense and correct for the entire volume of turbulent
air through which science light may propagate. If we consider the column defined
by a projection of the telescope aperture out to space, then the volume of interest
is this column swept over the entire science field of view, as shown schematically
in Fig. 2. Within this volume we desire to know the index of refraction variation
from nominal, $n(x, y, z)$, where $n = 1$ is the nominal. Here, $x$ and $y$ are defined
as the axes parallel to the telescope entrance aperture, and $z$ is the orthogonal,
light-propagation, axis. Since the telescope is not necessarily pointed straight up,
the $z$-axis is not necessarily vertical.

Fig. 1. MCAO and MOAO architectures compared. (a) MCAO has deformable mirrors in series
and passes all the beams from the science field and the guidestars. (b) MOAO has one deformable
mirror per science beam and one wavefront sensor per guide star, all operating in parallel.

To simplify the discussion, define the $x = 0$, $y = 0$ line to be the straight line
from the center of the telescope aperture out to the center of the
field of view. If the field radius is $\Theta$ and the telescope aperture has diameter $D$,
then the extent of the volume is $\sqrt{x^2 + y^2} < \Theta z + D/2$ and $0 < z < L$, where $L$ is
the distance along the line of sight that reaches negligible atmosphere.
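As a small numerical illustration of this geometry, the sketch below evaluates the diameter of the volume cross-section at a given distance and tests whether a point lies inside the volume. The aperture diameter (8 m), field radius (2.5’) and path length (20 km) are assumed values chosen only for the example, and the function names are hypothetical.

```python
import numpy as np

ARCMIN = np.pi / (180.0 * 60.0)   # one arcminute in radians

def metapupil_diameter(z, aperture_diameter, field_radius):
    """Diameter of the volume cross-section at distance z: D + 2*Theta*z."""
    return aperture_diameter + 2.0 * field_radius * z

def in_volume(x, y, z, aperture_diameter, field_radius, path_length):
    """True where sqrt(x^2 + y^2) < Theta*z + D/2 and 0 < z < L."""
    z = np.asarray(z, dtype=float)
    inside_column = np.hypot(x, y) < field_radius * z + aperture_diameter / 2.0
    return inside_column & (z > 0.0) & (z < path_length)

D = 8.0                 # telescope diameter in meters (assumed)
THETA = 2.5 * ARCMIN    # field radius in radians (assumed)
L = 20e3                # path length in meters (assumed)

d_10km = metapupil_diameter(10e3, D, THETA)
low = in_volume(5.0, 0.0, 1e3, D, THETA, L)     # 5 m off-axis at 1 km
high = in_volume(5.0, 0.0, 10e3, D, THETA, L)   # 5 m off-axis at 10 km
print(f"{d_10km:.2f} m", bool(low), bool(high))
```

With these numbers the cross-section at 10 km is almost three times the aperture diameter, which is why the number of volume samples grows quickly with field size.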


2.1. Multi-guide star tomography and MOAO projection 


Mathematically, MOAO wavefront control can be broken into two distinct stages.⁸,⁹
The first stage is the estimate of the index of refraction variation within the volume
(sampled at discrete points) given the wavefront sensors’ data. The second stage
involves calculating what would be the wavefront as seen by each MOAO science
path. This correction is then applied to the corresponding deformable mirror (DM).

Let $\mathbf{x}$ represent the vector (rasterized and stacked into a vector) of all the index
points in the volume. These points are typically sampled vertically at a number
of altitude layers, perhaps corresponding to the strong turbulence altitudes at the
observatory site, and horizontally on the order of twice every $r_0(z)$, where $r_0(z)$ is
the Fried coherence length appropriate for the turbulent layer at altitude $z$. These
are rules of thumb for roughly where the sampling ought to lie; however, for an actual
system the chosen sampling would be driven by system error budget considerations,
in a trade-off with the marginal cost of wavefront sensor samples, DM actuators,
and computing.

Fig. 2. The tomography volume for MOAO is the telescope column swept over the field
addressable by MOAO units.

Let y be the vector of aggregate wavefront sensor measurements (again ras- 
terized and stacked into a vector) (see Fig. 2). Remember that there are several 
wavefront sensors needed in order to probe the volume along several different direc- 
tions. The process of determining the volume from a number of projections through 
it is a volume computed tomography problem, where 2D projection data are used 
to reconstruct the 3D volume. Each wavefront sensor looks at one guide star in the 
guide star constellation and provides optical-path difference (OPD) measurements 
integrated along the $z$ dimension and sampled on the orthogonal $x$-$y$ dimensions
over an aperture sized region:

$$\mathrm{OPD}(x, y; \theta_i) = \int_0^L n(x + \theta_{x,i} z,\; y + \theta_{y,i} z,\; z)\, dz; \quad i = 1, 2, \ldots, N_{gs}. \tag{1}$$


Here, $\theta_i = (\theta_{x,i}, \theta_{y,i})$ represents the field angle to the $i$th guide star, $n$ is the
index of refraction at the given position and angle, and $N_{gs}$ is the number of guide stars
in the constellation. We stack the OPDs into the vector

$$\mathbf{y} = \left(\mathrm{OPD}(x, y; \theta_1),\ \mathrm{OPD}(x, y; \theta_2),\ \ldots,\ \mathrm{OPD}(x, y; \theta_{N_{gs}})\right)^T. \tag{2}$$

For now we assume that the wavefront sensors can directly measure these OPDs, or
at least the OPD relative to a pupil average OPD. Although this neglects the details
of the type of wavefront sensor, e.g., a Hartmann sensor measures local slopes rather
than phase directly, it is not such a bad assumption since we can always bundle the
physical sensor with its phase reconstruction algorithm and call the whole unit a
sensor package whose output is OPD. This simplifies the discussion that follows,
where we will focus on the tomography and MOAO aspects of the system.

Note that the wavefront sensor measurements are linear in the refractive index 
variations, so the measurement process can be written as a matrix equation relating 
unknown index samples to measured wavefront samples 


$$\mathbf{y} = A\mathbf{x} + \mathbf{e}. \tag{3}$$


Here, $\mathbf{e}$ represents the statistical and sampling errors of the wavefront sensor
measurements. The matrix $A$ has dimensions (number of wavefront sensor subapertures)
× (number of volume elements) and the matrix elements $A_{ji}$ define the paths of
guide star rays through the volume to particular points in the pupils of each
wavefront sensor. The elements, conceptually, are just zeros and ones, i.e., $A_{ji} = 1$ if
a guide star ray traverses volume element $i$ on its way to its assigned wavefront
sensor sub-aperture $j$ and $A_{ji} = 0$ otherwise. With discrete sampling there may
be interpolation and scaling involved, but the values are never negative.
The matrix is sparse, meaning it is mostly zeros. Summing $A$ over its rows (giving one
total per volume element) shows whether any volume elements are not traversed by
any ray, since obviously we want to set up the guide star geometry so that the
appropriate atmosphere volume is probed. We call $A$ a forward-projection operator.

The simple least-squares solution for $\mathbf{x}$ given $\mathbf{y}$ is

$$\hat{\mathbf{x}} = A^T (A A^T)^{-1} \mathbf{y}. \tag{4}$$


The matrix $A^T$ functions to send rays starting from a point on the wavefront sensor
pupil back through the volume, leaving a trail as it goes. So, it can be considered
a back-projection operator (mapping wavefront sensor measurements back to the
volume). The factor $(AA^T)^{-1}$ is a pre-compensation matrix, which accounts and
compensates for the fact that the overlapping guide star beams will provide
redundant data in portions of the volume.
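A toy example may help make the forward- and back-projection operators concrete. The sketch below is a one-dimensional illustration under strong assumptions that are not part of the text: two layers (ground and one upper layer), two guide stars whose rays are offset by exactly zero and one cell in the upper layer, six sub-apertures, and noise-free measurements. With this geometry $AA^T$ is invertible, so Eq. (4) can be applied literally.

```python
import numpy as np

n_sub = 6                     # sub-apertures across the pupil (1-D toy)
shifts = [0, 1]               # upper-layer cell offset for each guide star
n_upper = n_sub + max(shifts)
n_vox = n_sub + n_upper       # ground-layer cells first, then upper layer

# Forward-projection operator: one row per (guide star, sub-aperture),
# with a 1 in every volume element the corresponding ray crosses.
A = np.zeros((len(shifts) * n_sub, n_vox))
for g, shift in enumerate(shifts):
    for j in range(n_sub):
        row = g * n_sub + j
        A[row, j] = 1.0                     # ground-layer cell
        A[row, n_sub + j + shift] = 1.0     # upper-layer cell

# Summing over rows gives, per volume element, the number of rays crossing it.
coverage = A.sum(axis=0)

rng = np.random.default_rng(1)
x_true = rng.standard_normal(n_vox)         # made-up index variations
y = A @ x_true                              # noise-free measurements

# Eq. (4): back-project the pre-compensated measurements.
x_hat = A.T @ np.linalg.solve(A @ A.T, y)

print(coverage.min() > 0, np.allclose(A @ x_hat, y))
```

Because there are more volume elements (13) than measurements (12), the estimate reproduces the measurements exactly but is not unique; Eq. (4) selects the solution of minimum norm, which in general differs from `x_true`.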

This first step produces an estimate of the volume of turbulence $\hat{\mathbf{x}}$ given guide
star measurements $\mathbf{y}$, which is common to both MOAO and MCAO problems. For
MOAO the volume estimate is used to create the DM control signals that correct
for integrated OPD along a specific direction, $\theta_k$, on the sky, where $k$ runs from 1 to
$N_{dm}$, the number of MOAO channels. The solution is simply a projection through
the volume estimate along the science target direction vector

$$a_k(x, y) = \int_0^L \hat{n}(x + \theta_{x,k} z,\; y + \theta_{y,k} z,\; z)\, dz; \quad k = 1, 2, \ldots, N_{dm}. \tag{5}$$

In linear algebra terms (assuming a discretization of the volume into samples)

$$\mathbf{a}_k = A_{\theta_k} \hat{\mathbf{x}}. \tag{6}$$

Here, $A_{\theta_k}$ is a forward projection through the volume in the direction of the science
target assigned to MOAO unit $k$, and $\mathbf{a}_k$ is the resulting command to be applied to
deformable mirror $k$.
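The two stages can be chained in a few lines. The following sketch reuses a one-dimensional, two-layer toy geometry (an assumption made purely for illustration): guide stars are offset by 0 and 1 cell in the upper layer, and the science target lies halfway between them, so its projector interpolates linearly between neighboring upper-layer cells. The helper `projector` is hypothetical.

```python
import numpy as np

def projector(shift, n_sub=6, n_upper=7):
    """Forward projection along a direction offset by `shift` upper-layer cells.

    Non-integer shifts are handled by linear interpolation between cells.
    """
    P = np.zeros((n_sub, n_sub + n_upper))
    lo = int(np.floor(shift))
    frac = shift - lo
    for j in range(n_sub):
        P[j, j] = 1.0                            # ground-layer cell
        P[j, n_sub + j + lo] += 1.0 - frac       # nearest upper-layer cell
        if frac > 0.0:
            P[j, n_sub + j + lo + 1] += frac     # its neighbor
    return P

# Stage 1: tomography from two guide stars (shifts 0 and 1), as in Eq. (4).
A = np.vstack([projector(0.0), projector(1.0)])
rng = np.random.default_rng(3)
x_true = rng.standard_normal(A.shape[1])
y = A @ x_true
x_hat = A.T @ np.linalg.solve(A @ A.T, y)

# Stage 2: projection toward the science target, as in Eq. (6).
A_science = projector(0.5)
a_science = A_science @ x_hat

# Sanity check: projecting toward a guide star returns that star's data.
a_guide = projector(0.0) @ x_hat
print(np.allclose(a_guide, y[:6]), a_science.shape)
```

In a real system the same structure holds, but the projectors are two-dimensional and the estimate is usually regularized with prior statistics rather than computed by plain least squares.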

In the following sections, we discuss how to incorporate the Kolmogorov
atmospheric spectrum and the $C_n^2$ profile of turbulence as a priori information for the
tomographic estimate, and also how to account for the measurement noise in a
statistically optimal manner; then we cover the practical issue of how to calibrate
the MOAO system to accuracies needed for open-loop operation.


2.2. Accounting for the seeing statistics 


Many large telescope observatory sites have now deployed on-site seeing monitors
that provide information about the atmospheric seeing conditions in real-time for
the instruments. The AO system can take advantage of measurements of $r_0$ and
$C_n^2$ to adjust a priori statistical parameters in the wavefront reconstructor and
thereby create better correction. The wavefront sensor measurement noise can also
be accounted for in the reconstructor. The second order statistics of the refractive
index variations are characterized by the structure function¹⁰

$$D_n(r, z) = \left\langle \left[ n(x, y, z) - n(x', y', z) \right]^2 \right\rangle = k^{-2}\, 6.88\, (r/r_0(z))^{5/3}, \tag{7}$$

where $r = \sqrt{(x - x')^2 + (y - y')^2}$ and $r_0(z)$ is the Fried seeing parameter for the
altitude layer at distance $z$. The turbulence at different $z$ are assumed to be
statistically independent since the distance between turbulent layers is very large relative
to aperture+field scales. Given the $C_n^2(z)$ profile, the $r_0$ for a layer is

$$r_0(z) = \left[ 0.423\, k^2\, C_n^2(z)\, \Delta z \right]^{-3/5}. \tag{8}$$

Note that the $k = 2\pi/\lambda$ factor cancels out when Eq. (8) is substituted into Eq. (7);
it is introduced here because $r_0$ was historically defined as a wavelength-dependent
parameter. Refractive index does
not have appreciable wavelength dependence since the atmosphere is essentially
non-dispersive over the range of wavelengths used in astronomical AO systems.
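The cancellation of $k$ is easy to verify numerically. The sketch below evaluates the layer $r_0$ with the standard $-3/5$ exponent and then the structure function of Eq. (7) at two wavelengths; the integrated turbulence strength $C_n^2 \Delta z = 3 \times 10^{-13}\ \mathrm{m}^{1/3}$ and the two wavelengths are assumed values used only for illustration.

```python
import numpy as np

def fried_parameter(cn2_dz, wavelength):
    """Layer r0 in meters from the integrated strength Cn2*dz (m^(1/3))."""
    k = 2.0 * np.pi / wavelength
    return (0.423 * k**2 * cn2_dz) ** (-3.0 / 5.0)

def index_structure_function(r, cn2_dz, wavelength):
    """D_n(r) = k^-2 * 6.88 * (r / r0)^(5/3) for one layer."""
    k = 2.0 * np.pi / wavelength
    r0 = fried_parameter(cn2_dz, wavelength)
    return k**-2 * 6.88 * (r / r0) ** (5.0 / 3.0)

CN2_DZ = 3e-13                              # assumed layer strength, m^(1/3)
separations = np.array([0.1, 0.5, 2.0])     # meters

r0_visible = fried_parameter(CN2_DZ, 0.5e-6)
r0_kband = fried_parameter(CN2_DZ, 2.2e-6)

d_visible = index_structure_function(separations, CN2_DZ, 0.5e-6)
d_kband = index_structure_function(separations, CN2_DZ, 2.2e-6)

# r0 grows as wavelength^(6/5), but D_n is the same at both wavelengths.
print(r0_kband / r0_visible, np.allclose(d_visible, d_kband))
```

This is the sense in which the wavelength enters only through the historical definition of $r_0$ and not through the underlying refractive-index statistics.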

To continue we will need the covariance matrix $S = \langle \mathbf{x}\mathbf{x}^T \rangle$. The covariance is
related to the structure function by

$$D_n(r, z) = 2 S_n(0, z) - 2 S_n(r, z), \tag{9}$$

which we solve for $S_n(r, z)$ given $D_n(r, z)$ and $S_n(0, z)$. $D_n(r, z)$ is given by Eq. (7).
$S_n(0, z)$ is the variance of the index variations at height $z$. This will depend on the
extent of the area under consideration.

The index variation within a layer is defined with respect to the average of the
finite part of this layer contained within the volume. We can safely ignore the average
“piston term” because an average phase shift in an imaging system has no effect on
the image. Piston-removed variance on a circular region of size $D_m = 2\Theta z + D$ is
approximated by¹¹

$$S_n(0, z) = 1.03\, (D_m/r_0(z))^{5/3}. \tag{10}$$


Exact values for the entries in the covariance matrix will depend on details of the 
particular site seeing conditions, which can be measured by site seeing monitors, 
or possibly online using the method described in Section 2.3. For large aperture 
telescopes and large MOAO fields, the outer scale of turbulence is likely to be 
significant. An introduction to site modeling can be found in Chapter 3 of Ref. 10. 

With the refractive-index covariance defined for points over the entire volume
(letting the correlation between points in different layers be zero) we form the
a priori covariance matrix $S = \langle x x^T \rangle$. We also let the covariance matrix for the
wavefront measurement noise be $N$. Folding these second-order statistics into the
estimator, we have the minimum variance estimator. Combined with the forward
projection to the science detector, the estimated index of refraction vector $\hat{x}$ and
mirror deformation vector $a$ are given by


$$\hat{x} = S A^T \left(N + A S A^T\right)^{-1} y,$$
$$a = A_\phi \hat{x}. \qquad (11)$$

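
A minimal NumPy sketch of the minimum variance estimator of Eq. (11) is given below, assuming small random matrices in place of a real system model. The matrix `A_sci` is a hypothetical name for the projection from the estimated volume onto the science-path mirror commands, and all dimensions and noise levels are illustrative.

```python
import numpy as np

def minimum_variance_estimate(A, S, N, y):
    """x_hat = S A^T (N + A S A^T)^(-1) y, the first line of Eq. (11)."""
    C_yy = A @ S @ A.T + N
    # Solve the linear system instead of forming the explicit inverse.
    return S @ A.T @ np.linalg.solve(C_yy, y)

rng = np.random.default_rng(0)
n_vox, n_meas, n_act = 12, 8, 5               # toy dimensions
A = rng.standard_normal((n_meas, n_vox))      # volume -> WFS measurements
A_sci = rng.standard_normal((n_act, n_vox))   # volume -> science-path commands
S = np.eye(n_vox)                             # prior covariance of x
N = 1e-6 * np.eye(n_meas)                     # measurement-noise covariance

x_true = rng.standard_normal(n_vox)
y = A @ x_true                                # noise-free measurement vector
x_hat = minimum_variance_estimate(A, S, N, y)
a_hat = A_sci @ x_hat                         # second line of Eq. (11)
print("data residual norm:", np.linalg.norm(A @ x_hat - y))
```

With fewer measurements than unknowns the estimate cannot recover `x_true` exactly, but at low noise it reproduces the measured data, which is the behavior one expects from this estimator.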
2.3. Calibration: The Learn and Apply methodology 


Reference 12 developed the Learn and Apply algorithm to address the problem of 
calibration of the MOAO system and also to have the AO system itself do some 
of the atmosphere statistics characterization. They recognized that all forms of the 
MOAO reconstructor, including Eq. (4), Eq. (6) and Eq. (11) above, factor as 


$$a = C_{ay}\, C_{yy}^{-1}\, y, \qquad (12)$$


where $C_{yy} = \langle y y^T \rangle$ and $C_{ay} = \langle a y^T \rangle$ are the auto-correlation of the data and the
cross-correlation of the actuator commands (that would reproduce the data $y$) to
the data. From Eq. (3), we can see that $C_{yy} = A S A^T + N$ and, assuming $a = A_\phi x$,
the cross-correlation is $C_{ay} = A_\phi S A^T$, making Eq. (12) equivalent to Eq. (11). If
we assume no prior knowledge about the turbulence statistics and have high
signal-to-noise measurements ($S = I$; $N = 0$), then we would have $C_{yy} = A A^T$ and
$C_{ay} = A_\phi A^T$, reproducing the least-squares formula given by Eq. (4) and Eq. (6).
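
The equivalence between the empirical form of Eq. (12) and the model-based form of Eq. (11) can be checked with simulated telemetry. The sketch below is a toy experiment, not the calibration procedure of Ref. 12: the matrices are random, `A_sci` is a hypothetical name for the science-path projection, and the frame count and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vox, n_meas, n_act, n_frames = 6, 10, 4, 200_000
A = rng.standard_normal((n_meas, n_vox))      # volume -> WFS measurements
A_sci = rng.standard_normal((n_act, n_vox))   # volume -> science-path commands
sigma_noise = 0.1

# Simulated telemetry with S = I and N = sigma_noise^2 I.
x = rng.standard_normal((n_vox, n_frames))
y = A @ x + sigma_noise * rng.standard_normal((n_meas, n_frames))
a = A_sci @ x            # commands that would have corrected each frame

# "Learn": empirical correlations. "Apply": R = C_ay C_yy^(-1), Eq. (12).
C_yy = y @ y.T / n_frames
C_ay = a @ y.T / n_frames
R_learned = C_ay @ np.linalg.inv(C_yy)

# Model-based reconstructor from Eq. (11), for comparison.
R_model = A_sci @ A.T @ np.linalg.inv(A @ A.T + sigma_noise**2 * np.eye(n_meas))
print("max |R_learned - R_model|:", np.abs(R_learned - R_model).max())
```

The two reconstructors agree up to the statistical error of the sample correlations, which shrinks as more frames are averaged.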

The advantage of the Learn and Apply methodology is that the controller
matrices can be calibrated using data collected on-sky. Auto-correlating the
wavefront sensor measurements is straightforward: $C_{yy} = \langle y y^T \rangle$.
Constructing $C_{ay} = \langle a y^T \rangle$ from on-sky data requires extra wavefront
sensors, one in each of the MOAO science arms, which are used only during
the calibration step to collect the $a$ correlation data. The data are collected while
running each science arm in single-conjugate closed loop on a bright star. So, at the
cost of the additional wavefront sensors, the whole process of generating a controller
is made empirical, making it robust to minor system misalignments, placements of
the wavefront sensors, wavefront sensor responsivity, DM nonlinearities and so on.
To speed up the process of generating the matrices, the method employs a parametric
model rather than requiring a statistical identification of every matrix entry. The
system fits a small number of parameters, including $r_0$, $C_n^2(z)$, noise, and alignment
and linearity parameters.

The Learn and Apply technique was demonstrated successfully on sky in 2011
with the CANARY experiment on the William Herschel Telescope.² The later
RAVEN experiment⁶ also tested Learn and Apply on sky.


2.4. Real-time controller 


Although presented in terms of compact matrix equations, the MOAO reconstructor
presents a considerable challenge for computing in real time. The sample rate is on
the order of 1 kHz and the matrices involved are huge. For example, for a 30-m class
telescope the number of wavefront sensor data points is on the order of 100,000 (six
guide stars with 20 cm sub-apertures) and the number of volume elements (“voxels”)
to estimate is on the order of 300,000. Sparse matrix and Fourier techniques can
help speed the operations, and recent advances in graphics processing unit (GPU)
technology show promise for direct matrix multiplication.
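
To give a feel for the numbers quoted above, the following back-of-the-envelope sketch counts the arithmetic and storage for one dense matrix-vector multiply per frame. The single-precision storage is an assumption, and a real controller would likely exploit sparsity or Fourier methods rather than a dense product.

```python
n_measurements = 100_000     # WFS data points (order of magnitude from the text)
n_voxels = 300_000           # volume elements to estimate
frame_rate_hz = 1_000        # sample rate of about 1 kHz
bytes_per_value = 4          # assumption: single-precision matrix entries

# A dense reconstructor has one entry per (voxel, measurement) pair.
matrix_entries = n_voxels * n_measurements
macs_per_second = matrix_entries * frame_rate_hz   # multiply-accumulates per second
matrix_bytes = matrix_entries * bytes_per_value

print(f"matrix entries      : {matrix_entries:.1e}")
print(f"multiply-adds per s : {macs_per_second:.1e}")
print(f"matrix storage      : {matrix_bytes / 1e9:.0f} GB")
```

Under these assumptions the dense product needs on the order of $3 \times 10^{13}$ multiply-accumulates per second and about 120 GB for the matrix alone, which is why structured methods are attractive.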


3. Deformable Mirror Requirements 


MOAO systems are unique in that they require the wavefront correction on the
science paths to perform in open loop. This requires a very predictable and
repeatable “go-to” DM. Earlier-technology DMs based on piezo-electric transducers
suffered from considerable hysteresis and had temperature-sensitive response
characteristics, although the latest developments with a stacked-actuator approach show
promise of attaining 5–10 nm repeatability. The more recent technology of
microelectromechanical systems (MEMS) deformable mirrors uses electrostatic attraction,
which is very repeatable and temperature insensitive, so long as the reference
voltages in the power supply are kept accurate and stable. A MEMS device was used
in the VILLAGES demonstrator experiment and achieved open-loop control
performance on par with closed-loop.⁷

All deformable mirrors have actuator-to-actuator cross-talk, where a given
actuator’s response function depends on the setting of its neighbors. In closed loop this
is not a major issue since the cross-coupling is smoothed out in the feedback loop.
It must be very accurately modeled in open loop, however. Reference 13 developed
a technique that is suitable for MEMS devices. The actuators in a MEMS device
are nonlinear but decoupled, while the shape of the deformable mirror surface is
cross-coupled, but in a linear manner through the thin-plate equation. After proper
calibration in the lab, the DM controller has a matrix that converts desired mirror
shape to mirror forces, then uses a look-up table to determine the actuator voltages
that produce those forces. The process was shown to be accurate to 30 nm rms. The
repeatability of the MEMS device (to produce the same wavefront from the same
voltage settings) was shown to be better than 1 nm. Both of these terms enter into
the MOAO error budget.
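
The two-step structure described above (a linear, cross-coupled plate step followed by a nonlinear, per-actuator lookup) can be sketched as follows. This is a toy illustration only, not the calibrated model of Ref. 13: the stiffness matrix is a 1-D second-difference operator standing in for the thin-plate operator, and the quadratic force-voltage table is a made-up electrostatic law.

```python
import numpy as np

def shape_to_voltage(shape, plate_matrix, force_table, volt_table):
    """Open-loop DM model: linear plate step, then per-actuator lookup."""
    forces = plate_matrix @ shape                       # linear, cross-coupled
    return np.interp(forces, force_table, volt_table)   # nonlinear, decoupled

# Hypothetical calibration data for a row of five actuators.
n_act = 5
plate_matrix = 2.0 * np.eye(n_act) - np.eye(n_act, k=1) - np.eye(n_act, k=-1)
volt_table = np.linspace(0.0, 200.0, 41)                # volts
force_table = (volt_table / 200.0) ** 2                 # toy law: force ~ V^2

shape = np.array([0.1, 0.2, 0.3, 0.2, 0.1])             # desired surface, arbitrary units
volts = shape_to_voltage(shape, plate_matrix, force_table, volt_table)
print("actuator voltages:", volts)
```

Separating the problem this way keeps the matrix step linear, so all of the nonlinearity is confined to a one-dimensional table per actuator that can be measured in the lab.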


4. Wavefront Sensor Requirements 


The wavefront sensors must also be calibrated to high accuracy over the dynamic
range of atmospheric turbulence, since they sense the guide star wavefronts in open
loop. In closed loop, one can take advantage of “null-seeking” to require only that
the wavefront sensor provide the correct sign of correction and a small range of
linear operation around zero. Examples of null-seeking sensors include a quad cell
of pixels assigned to each dot in a Hartmann sensor, or the unmodulated pyramid
wavefront sensor. For large dynamic range use, these types of sensors require a
complex non-linear calibration (taking into account the static zero-point wavefront as
well), and must not exhibit saturation, so that they can operate with high accuracy
over the full range of atmospheric aberration. For example, in closed-loop operation
a null-seeking wavefront sensor need only operate linearly over approximately the
closed-loop wavefront error range, about 100–200 nm. The full atmospheric
aberrations can be on the order of 2 microns, a factor of 10 higher. A reasonably linear
Hartmann sensor design with high dynamic range is one with a 4 × 4 array of
pixels with about 2 arcsec-per-pixel plate scale, operating with a center-of-gravity
algorithm. This allows about 1.5 arcsec of spot motion (typical of open seeing) of
a 2 arcsec full-width-half-max laser guide star spot with only a small amount of
discretization ripple. The choice of wavefront sensor sampling and algorithm
should be made with the aid of analysis and simulation, given the site’s seeing
characteristics.
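
The center-of-gravity algorithm mentioned above can be sketched on a synthetic spot as follows. The example assumes a noiseless Gaussian spot of 2 arcsec FWHM integrated over a 4 × 4 array of 2 arcsec pixels; the spot offset is a hypothetical value and the helper names are not from the original text.

```python
from math import erf, sqrt
import numpy as np

def pixel_fluxes(center, sigma, edges):
    """Fraction of a 1-D Gaussian falling between consecutive pixel edges."""
    cdf = np.array([0.5 * (1.0 + erf((e - center) / (sigma * sqrt(2.0))))
                    for e in edges])
    return np.diff(cdf)

def center_of_gravity(img, plate_scale):
    """Spot position (x, y) in arcsec relative to the subaperture centre."""
    img = np.asarray(img, dtype=float)
    ny, nx = img.shape
    # Pixel-centre coordinates, zero at the middle of the array.
    x = (np.arange(nx) - (nx - 1) / 2.0) * plate_scale
    y = (np.arange(ny) - (ny - 1) / 2.0) * plate_scale
    total = img.sum()
    return float(img.sum(axis=0) @ x / total), float(img.sum(axis=1) @ y / total)

plate_scale = 2.0                              # arcsec per pixel
sigma = 2.0 / 2.3548                           # 2 arcsec FWHM -> Gaussian sigma
edges = (np.arange(5) - 2.0) * plate_scale     # pixel edges: -4, -2, 0, 2, 4 arcsec
true_x, true_y = 0.7, -0.4                     # hypothetical spot offset [arcsec]

# Separable Gaussian: rows index y, columns index x.
spot = np.outer(pixel_fluxes(true_y, sigma, edges),
                pixel_fluxes(true_x, sigma, edges))
est_x, est_y = center_of_gravity(spot, plate_scale)
print("estimated offset [arcsec]:", est_x, est_y)
```

The small difference between the estimated and true offsets in this noiseless case is the discretization ripple referred to in the text; photon and read noise would add to it in practice.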


5. Packaging the MOAO Units 


The packaging of MOAO units presents a formidable challenge, given that
several science targets must be selected simultaneously from an area at the telescope
focal plane, and the beams steered into their respective spectrographs. To give an
example, the field of interest might be on the order of 10 arcminutes diameter (a
significant fraction of a large telescope’s field of view) at an f/15 focus (typical
for Cassegrain and Nasmyth foci). The plate scale for a D = 10 m telescope is
0.727 mm/arcsec and the entire field covers a disk approximately 44 cm in diameter
(these numbers are 3× larger for a D = 30 m telescope). Figure 3 shows how 20
pickoff arms can be packaged around such a tight region. Since the field is arbitrary
and galaxies or other science targets will be situated at random locations in this
field, the MOAO units must have articulating pickoff arms that can move about
this area without colliding or, in the final position, occluding parts of the field of
interest for the other units.

Fig. 3. Pickoff arms for the 20-object IRMOS MOAO concept for TMT⁴ (used with permission
of the authors).

Fig. 4. Articulated pickoff arm concept for IRMOS⁴ (used with permission of the authors).
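
The plate-scale figures quoted for the f/15 focus follow from the focal length times the angle in radians. The short check below reproduces them; the function name is hypothetical and the only inputs are the aperture, focal ratio, and field size given in the text.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def plate_scale_mm_per_arcsec(diameter_m, f_number):
    """Focal-plane scale: focal length [mm] times one arcsecond in radians."""
    focal_length_mm = diameter_m * f_number * 1e3
    return focal_length_mm * ARCSEC_TO_RAD

scale_10m = plate_scale_mm_per_arcsec(10.0, 15.0)    # D = 10 m at f/15
field_cm = scale_10m * (10 * 60) / 10.0              # 10 arcmin field, in cm
scale_30m = plate_scale_mm_per_arcsec(30.0, 15.0)    # D = 30 m at f/15

print(f"10-m plate scale: {scale_10m:.3f} mm/arcsec")
print(f"10-arcmin field : {field_cm:.1f} cm")
print(f"30-m plate scale: {scale_30m:.3f} mm/arcsec")
```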

Each pickoff requires a field that includes at least the seeing disk in order to
collect most of the object’s light, even though downstream the AO-corrected beam
will be much smaller than this. Thus the arms cannot be positioned any closer
to each other than the seeing disk diameter. The practical means of articulation
for systems proposed so far fall into two categories: a two-axis articulating arm or
a movable button carrying a fiber. There are several possible geometries for the
articulating arm to achieve two axes of motion, but the ground rule is that, since
the optical path needs to traverse along the arm sections, the total optical path from
the end of the pickoff to the entrance of the spectrograph must remain unchanged
as the arm is articulated to different focal plane positions, so that proper in-focus
images are made. The button concept would involve placing the buttons at assigned
positions and possibly holding them in place with magnets. Each button’s fiber then
provides the light transport to the spectrograph. Figures 4 and 5 show an example
pickoff arm design.

Fig. 5. Optical design for an MOAO channel mounted within an articulating pickoff arm, showing
pupil relay to the deformable mirror (DM) followed by focusing optics, all packaged along the arm
(taken from the IRMOS design)⁴ (used with permission of the authors). The element labeled ADC
is an atmospheric dispersion corrector.


6. System Concepts 


6.1. FALCON 


Fiber-spectrograph with adaptive-optics on large-fields to correct at optical and
near-infrared (FALCON) is a MOAO-fed spectrograph concept explored as a
second-generation instrument for the VLT.¹ It proposed to use independent AO units, each
sensing its own natural guide star, as dim as V = 16, to provide low to moderate
AO correction. The target science was z = 1–3 galaxies at a spectral resolution
of 2000–6000 for studies of early star and galaxy formation.


6.2. CANARY 


CANARY is a pathfinder MOAO system mounted at the 4.2-m William Herschel
Telescope that employs tomography on a wide field (~50 arcsec) using natural
guide star asterisms and a single MOAO unit. A successful on-sky demonstration
occurred in 2011.² CANARY makes use of three natural guide stars in an asterism;
four such natural asterisms were selected for the on-sky experiments. The
experiments compared MOAO to ground-layer adaptive optics (GLAO)ᵃ and to
conventional closed-loop adaptive optics with a single guide star, i.e., single-conjugate
adaptive optics (SCAO).ᵇ Infrared image measurements over several nights confirmed that
MOAO generally produced higher Strehl ratios than GLAO. SCAO produced the
highest Strehl ratio in these experiments, as expected. The CANARY system also
demonstrated open-loop control using a piezo-stack deformable mirror and the use
of the Learn and Apply algorithm.

CANARY functions as a pathfinder instrument for the proposed EAGLE instru- 
ment for the E-ELT. 


6.3. EAGLE 


EAGLE is a multi-object integral field spectrograph concept considered for use
on the future 42-m E-ELT. The scientific applications overlap the earlier ideas for
FALCON: characterizing early star formation and galaxy evolution. EAGLE will
bring MOAO to a grand scale, with up to 20 MOAO units feeding 20 integral
field spectrographs (IFUs)ᶜ in the near-infrared bands 0.8–2.45 µm on a patrol field
of 38.5 arcmin² (a circle of 7 arcmin diameter). The tomography wavefront sensor
system will consist of six laser guide stars, combined with five natural guide stars for
detecting the low-order blind modes that the laser guide stars cannot measure. The
system was originally conceived to be located at the E-ELT Gravity Invariant Focal
Station,¹⁴ but it has since been placed on one of the Nasmyth platforms
on a rotation bearing.¹⁵


6.4. RAVEN 


RAVEN is a 2-channel MOAO demonstrator built by the University of Victoria
and installed on the Subaru 8-m telescope. It is a pathfinder for a future TMT
or Subaru MOAO multi-object spectrograph, and can also do some limited science
observations using the 8-m’s Infrared Camera and Spectrograph (IRCS). RAVEN
uses the existing Subaru laser guide star plus up to three natural stars as the
tomographic constellation. The experiments compared GLAO, MOAO, and conventional
closed-loop AO wavefront controllers. For MOAO both model-based (Eq. (11))
and Learn and Apply (Eq. (12)) control methods were tested on-sky, with Learn
and Apply showing slightly better performance. The simulations and in-lab
predictions of enclosed energy matched well with on-sky data, indicating that with
careful alignment and calibration the performance of a model-based controller can
approach that of Learn and Apply, and that the empirical on-sky calibration
provides added value.

ᵃ See Chapter 16.
ᵇ See Chapters 13 and 14.
ᶜ See Chapter 14 of Volume 3.

The two science arms each use a membrane deformable mirror with an 11 × 11
array of magnetically deflected actuators. Since this type of deformable mirror does
not have the “go-to” repeatability needed to produce a given mirror shape
on demand in open-loop operation, the system uses internal light sources
and assigned wavefront sensors in each science channel to probe and control the
mirror shapes during operation. This way, the DM, light source, and wavefront
sensor operate together effectively as a go-to DM system. The extra wavefront
sensors double as the calibration sensors needed for the Learn and Apply algorithm,
so some of the extra complexity has a second use.


6.5. IRMOS 


IRMOS³ is a multi-object integral field spectrograph concept for a second-light
instrument on the TMT. IRMOS is configured for up to 20 selectable 2 × 2 arcsec
fields addressable on a 5 arcmin field of regard, to be fed to infrared integral field
spectrograph units sensitive in the 0.8–2.5 µm bands. The system is designed to
cover a number of science use cases, with key drivers being properties of very
high-redshift (z > 5) galaxies and properties of galaxies during peak star formation.
The concept study considered guide star constellations of between six and eight
laser guide stars and the use of MEMS deformable mirrors in each of the science
arms.


6.6. KNGAO 


KNGAO⁵ is a multi-laser tomography AO system proposed for the 10-m Keck II
telescope. The system is designed to provide enhanced Strehl ratios over a 40 arcsec
field using four LGS, with the tomography functioning to remove the single-LGS
cone anisoplanatism, a dominant error budget term in the LGS AO system now
on the telescope. Science drivers include galaxy assembly, star formation history,
black holes in active galactic nuclei, and precision astrometry for stars near the
black hole at the center of the Milky Way Galaxy. The system uses an MOAO
approach for the one science channel at the center, and three roving MOAO units
over a 2-arcmin field for natural tip/tilt stars, with the aim of sharpening the
tip/tilt stars so that dimmer ones can be used, thus enhancing sky coverage. The
original KNGAO concept has the tip/tilt units each assigned a “roving” laser guide
star (three additional laser guide stars) as a wavefront reference, making them
independent of a global tomography reconstructor, although they still operate in open loop.
In the event that the extra laser guide stars add too much complexity and cost, the
tip/tilt units will use the tomographic wavefront projection for control. Each of the
channels, central science and three tip/tilt units, uses MEMS deformable mirrors.


7. Conclusions


We have provided an introduction to MOAO systems, an architecture for laser- 
tomography-based AO that is advantageous for high resolution multi-object spec- 
trographs and for correcting tip/tilt stars in any multi-laser guide star system. The 
multi-object architecture multiplexes the use of telescope time for parallel data 
collection. The MOAO control algorithm has similarity to MCAO in that multi- 
guide star tomography forms the first stage, but differs in how the DM controls 
are calculated. MOAO has the added challenge of open-loop operation, putting 
special demands on the DMs and wavefront sensors to operate linearly over the 
entire range of atmospheric turbulence. MOAO is an advanced concept that has 
not yet been deployed on a workhorse science instrument package; however, there 
have been a number of successful demonstrations of pathfinder instruments that 
show the feasibility of the technologies involved and achieve promised performance 
on-sky. Concepts for MOAO systems on both existing large and future “extremely 
large” telescopes are now under serious consideration by the major observatories as 
next-generation instruments. 


References 


1. F. Hammer, F. Sayéde, E. Gendron et al., The FALCON concept: Multi-object
spectroscopy combined with MCAO in near-IR, in Scientific Drivers for ESO Future
VLT/VLTI Instrumentation: Proceedings of the ESO Workshop Held in Garching,
Germany, 11-15 June 2001 (2002), pp. 139-148.

2. E. Gendron, F. Vidal, M. Brangier et al., Astron. Astrophys. 529, 2-5 (2011).

3. D. Gavel, B. Bauman, R. Dekany, M. Britton, and D. Andersen, Proc. SPIE 6272,
62720R (2006).

4. S. Eikenberry, D. Andersen, R. Guzman et al., Proc. SPIE 6269, 62695W (2006).

5. P. Wizinowich, S. Adkins, R. Dekany et al., Proc. SPIE 7736, 77360K (2010).

6. O. Lardière, D. Andersen, C. Blain et al., Proc. SPIE 9148, 91481G (2014).

7. D. Gavel, S. Severson, B. Bauman et al., Proc. SPIE 6888, 688804 (2008).

8. D. Gavel, Proc. SPIE 5490, 1356-1373 (2004).

9. D. Gavel, Stability of closed-loop tomography algorithms for adaptive optics, in OSA
Topical Meeting on Adaptive Optics: Analysis and Methods/Computational Optical
Sensing and Imaging/Information Photonics/Signal Recovery and Synthesis (2005),
p. AThA5.

10. J. W. Hardy, Adaptive Optics for Astronomical Telescopes (Oxford University Press,
New York, 1998).

11. R. J. Noll, J. Opt. Soc. Am. 66(3), 207-211 (1976).

12. F. Vidal, E. Gendron, and G. Rousset, J. Opt. Soc. Am. A 27, A253-A264 (2010).

13. K. M. Morzinski, K. B. W. Harpsøe, D. T. Gavel, and S. M. Ammons, Proc.
SPIE 6467, 64670G (2007).

14. S. Morris and J.-G. Cuby, The Messenger 140, 22-23 (2010).

15. S. Morris, J.-G. Cuby, M. Dubbeldam et al., Proc. SPIE 8446, 84461J (2012).




Chapter 18 


Extreme Adaptive Optics 


Markus Kasper 


European Southern Observatory, Adaptive Optics Systems, 
Karl-Schwarzschild-Str.2, 85748 Garching, Germany 


mkasper@eso.org


Extreme adaptive optics (XAO) is scientifically driven by the need to achieve 
high-contrast imaging and spectroscopic capabilities to enhance the detection 
and characterization of extra-solar planetary systems. The following sections will 
introduce the science case and corresponding requirements for the XAO system. 
There are two main objectives, (i) a Strehl ratio near one to maximize the exo- 
planet’s signal, and (ii) maximum suppression of the stellar residual flux at the 
angular separation of interest to minimize detection noise. Both objectives are 
simultaneously achieved by a superb correction of residual wavefront errors. This 
requires a good knowledge of the wavefront error budget, which will be discussed 
in great detail. The article will conclude with a case study for the detection of 
the potentially habitable exoplanet Proxima b with current 8-m class telescopes. 


1. Science Requirements 


One of the most rapidly developing fields of modern astrophysics is the study of 
extrasolar planets and planetary systems. The key goals of the field include under- 
standing the architectures of exoplanetary systems; the formation and evolution 
of planetary systems; and the composition and structure of exoplanetary atmo- 
spheres. With over 3000 planets identified (mostly through indirect methods by 
NASA’s Kepler mission) we have developed a robust statistical understanding of 
the inner planetary systems, i.e., planets with periods smaller than a few years and 
orbital separations smaller than a few AU, and have thereby made considerable 
progress toward the first goal. However, the architectures of the outer planetary
systems remain essentially unexplored. Without a good understanding of
planetary architectures, progress toward understanding the formation of
planetary systems has been limited. Furthermore, over 99% of the planets discovered
have been found indirectly, leaving us with limited data to study and understand
the properties of the exoplanet atmospheres.

So far, all directly imaged giant exoplanets, HR8799bcde,¹,² β Pic,³ HD95086b,⁴
HD106906b,⁵ 51 Eri b⁶ and HIP65426b,⁷ orbit A- or F-type stars. These stars have
about one and a half to two times the mass of the Sun, and some are embedded
in rather massive debris disks. Interestingly, radial velocity data also indicate a
maximum for the giant planet occurrence rate of about 10–30% for stars in this
mass regime.⁸ The giant planet occurrence rate rises steeply with stellar metallicity
and drops rapidly for stellar masses above two solar masses. Data from
the Kepler satellite show that the situation is quite different for low-mass,
sub-Neptunian planets, which are more abundant around low-mass M-type stars⁹ and
much more abundant than giant planets in general.¹⁰

So why is the number of directly imaged planets still rather low? The main 
reason is that such observations are extremely challenging, because the intensity 
contrast between the stellar light reflected off a planetary surface and the star 
itself is less than one part in a million at a few tens of milliarcseconds (mas) of 
angular separation and gets even smaller for larger distances. Figures 1 and 2 show 
the approximate NIR contrast and apparent magnitude, respectively, estimated for 
nearby exoplanets, most of which were discovered by the RV technique. Most of 
these planets are ice- or gas giant planets with masses and sizes significantly higher 
than those of Earth. 


Fig. 1. Approximate NIR intensity contrast to the host star for nearby exoplanets detected in
radial velocity surveys. The marker size is proportional to the logarithm of planet mass.

Fig. 2. Approximate J-band apparent magnitude histogram for nearby exoplanets detected in
radial velocity surveys.


Potentially habitable planets with sizes, masses and temperatures similar to 
those of Earth are even harder to observe. Figure 3 shows the approximate contrast 
estimated for a simulated population around the nearest stars. Again, the most 
favorable contrasts are observed at the smallest angular separations. 

Such high-contrast observations are not yet possible with current telescopes and 
astronomical instrumentation. However, planet evolutionary models predict that 
giant planets glow in the near-infrared when they are young (age up to a few hundred 
Myr) and still warm from converting gravitational energy into heat.¹¹⁻¹³ They can
thus be observed at more moderate contrast levels (around 10~°), independent 
of their distance to the parent star, and all the directly imaged planets mentioned 
above fall into this category. Figure 4 shows the brightness of young giant planets as 
a function of mass and age. For example, giant planets around the A6 star β Pictoris,
with an age of around 20 Myr and an absolute NIR magnitude of about 2, would
appear at J-band contrast ratios of about $10^{-4}$ (10 magnitudes) for a ten $M_J$
planet and about $1.6 \times 10^{-7}$ (17 magnitudes) for a one $M_J$ planet. Contrast ratios
at longer L-band wavelengths would be about two magnitudes more favorable than
in J-band for such planets.
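
The conversion between a magnitude difference and a contrast ratio used in this example is $10^{-0.4\,\Delta m}$. The short sketch below reproduces the quoted J-band numbers and applies the stated two-magnitude L-band gain to the one Jupiter-mass case; the function name is hypothetical.

```python
def contrast_from_delta_mag(delta_mag):
    """Flux ratio planet/star for a magnitude difference delta_mag."""
    return 10.0 ** (-0.4 * delta_mag)

contrast_10mj_j = contrast_from_delta_mag(10.0)       # ten Jupiter masses, J band
contrast_1mj_j = contrast_from_delta_mag(17.0)        # one Jupiter mass, J band
contrast_1mj_l = contrast_from_delta_mag(17.0 - 2.0)  # two magnitudes better in L

print(f"10 M_J, J band: {contrast_10mj_j:.1e}")
print(f" 1 M_J, J band: {contrast_1mj_j:.1e}")
print(f" 1 M_J, L band: {contrast_1mj_l:.1e}")
```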

On the other hand, planet formation models predict that most planets form
within roughly 10 AU from the star. According to the core accretion model,¹⁴ the
first step is to build up a core from solid material in the disk. Once this core reaches
a mass of roughly 10 Earth masses, it can very efficiently accrete gas and quickly
gain in mass to become a giant planet. The main challenge is to build up the core
in time, before most of the gas has either been accreted onto the star or pushed
out of the system by radiation pressure. Observations indicate that this happens
in less than 10 million years, such that the giant planet cores must emerge after a
few million years at most. The preferred location to start this process is beyond the
ice-line, where temperatures are low enough (below about 200 K) for water vapor
to freeze and dramatically increase the available mass of solids. The ice-line for
solar-type stars is located at around one to a few AU.

Fig. 3. I-band flux ratio between Earth analogue planets and parent stars within 10 pc (from the
Hipparcos catalogue). The size of the symbols indicates the planet’s apparent brightness, and the
colors approximately match the color of the stars.

Fig. 4. Absolute J-band (a) and L-band (b) flux of exoplanets with masses between one and ten
$M_J$, derived from the models of Ref. 12.

The young stars closest to the Sun are members of young moving groups such
as the β Pictoris moving group at typically a few tens of parsecs distance. Hence,
the projected separation of the ice-line for these stars is within about one tenth
of an arcsecond, which can hardly be resolved with current very large telescopes.
There are, however, some processes that alter the orbits of planets during or shortly
after formation. The most prominent one is migration in the protoplanetary disk,¹⁵
where gravitational interaction between planet and disk transfers angular momentum
between the two. Unfortunately, these processes almost always lead to a loss of
angular momentum of the planet, moving it closer to the star. While this is a nice
explanation for the initially unexpected existence of hot Jupiters,¹⁶ it does not
provide us with a population of giant planets already observable by direct imaging.
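
The angular scale of the ice-line follows from the small-angle relation that 1 AU at 1 pc subtends 1 arcsec. The sketch below compares it with the diffraction limit $\lambda/D$; the specific values (3 AU, 30 pc, J band at 1.25 µm, an 8-m aperture) are illustrative assumptions, not numbers from the text.

```python
import math

RAD_TO_MAS = 180.0 * 3600.0 * 1e3 / math.pi

def projected_separation_mas(a_au, distance_pc):
    """Angular separation in mas of an orbit of a_au seen from distance_pc."""
    return 1e3 * a_au / distance_pc

def diffraction_limit_mas(wavelength_m, diameter_m):
    """Angular scale lambda/D in mas."""
    return wavelength_m / diameter_m * RAD_TO_MAS

ice_line_mas = projected_separation_mas(3.0, 30.0)     # hypothetical: 3 AU at 30 pc
lambda_over_d = diffraction_limit_mas(1.25e-6, 8.0)    # hypothetical: J band, 8-m telescope

print(f"ice-line separation: {ice_line_mas:.0f} mas")
print(f"lambda/D           : {lambda_over_d:.1f} mas")
print(f"ratio              : {ice_line_mas / lambda_over_d:.1f} lambda/D")
```

Under these assumptions the ice-line sits only about three resolution elements from the star, which illustrates why such separations are so demanding for 8-m class telescopes.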

The most promising models to create a giant planet population at large
separations are planet–planet scattering¹⁷ or in situ formation by spontaneous
gravitational collapse of some part of the disk.¹⁸ Figure 5 shows that planet formation
models, which include planet–planet interaction, produce only a small population
of planets at large separations, some of which are high-mass.

In agreement with these models, current direct imaging surveys sensitive to
young giant planets at large separations already indicate a low occurrence rate of
a few percent at most. The International Deep Planet Survey concluded that only
about 1–2% of the stars harbor at least one giant planet with a mass between 0.5 $M_J$
and 14 $M_J$ at a separation between 20 AU and 300 AU.¹⁹

In order to directly image and characterize a larger number of exoplanets, 
telescope and instrument have to provide a high enough contrast and sufficient 
sensitivity as outlined in Figs. 1 and 2. In a nutshell, contrast levels around 107° 
at 100mas and 107° at 20 mas are needed for the observation of nearby planetary 
systems with a limiting J-band magnitude fainter than 26. This would enable not
only the detection of giant planets in orbits of a few AU, but even the observation of
potentially habitable planets around very nearby M stars, as shown in Fig. 3. These
low-mass stars are particularly interesting, because around 80% of all stars belong
to this group, and a considerable number of them are within 5 pc from the Sun. It
is therefore most likely that the first habitable exoplanet will be found around a
nearby M star.

Fig. 5. Mass versus semi-major axis for a synthetic planet population. Credit: Ref. 17, reproduced
with permission, ©ESO.

High-contrast imaging at small angular separation is usually achieved through
eXtreme Adaptive Optics (XAO, see http://cfao.ucolick.org/research/exao.php)
and long-exposure coronagraphic imaging, combined with very careful control and
characterization of the residual speckle pattern. The main objective of the XAO system
is the delivery of a very high Strehl ratio on-axis. Maximizing the energy in the core 
of the PSF also maximizes the signal of the exoplanet. At the same time, this min- 
imizes the stellar light scattered into the PSF wings, which is the dominant source 
of photon noise in the optical to near-infrared spectral domain. XAO is therefore 
the first and essential step for achieving high-contrast imaging performance with 
ground-based telescope instrumentation. 


2. Strehl Ratio and Contrast 


XAO is scientifically driven by the need to achieve high-contrast imaging and
spectroscopic capabilities to enhance the detection and characterization of extra-solar
planetary systems, although many other science cases obviously benefit as well
from a superbly corrected high-quality PSF down to optical wavelengths. This
performance results from a greatly reduced overall residual on-axis wavefront error
(WFE) in an otherwise fairly standard single-conjugate AO (SCAO) system.
Primarily, this is achieved by increasing the sampling of the wavefront sensor (WFS)
and deformable mirror (DM), and by increasing the correction temporal bandwidth
through higher framerates and smaller delays. Therefore, one could say that XAO
is basically SCAO on steroids.²⁰

The correction performance of an AO system is often quantified by the Strehl
ratio, which is given by $S = I(0)/I_{\mathrm{Airy}}(0)$, where the observed peak intensity of a real point
source image, including phase aberrations, is $I(0)$, and the theoretical maximum
peak intensity of the Airy pattern is $I_{\mathrm{Airy}}(0)$.

For small wavefront errors $\sigma_\phi$, the Strehl ratio is given by $S \approx (1 - \sigma_\phi^2)$. This
expression is often referred to as the Maréchal approximation, although Ref. 21
presented it in a slightly different form. It holds for small aberrations and high
Strehl ratios only, i.e., for $\sigma_\phi < 0.45$ radian rms, corresponding to $S = 80\%$. More
commonly used is the expression $S = \exp(-\sigma_\phi^2)$, which was never derived by
Maréchal but is also often called the Maréchal approximation or extended Maréchal
approximation. It is more accurate for a wider range of phase errors, and even holds
exactly in the common case of Gaussian phase residuals.²² The extended Maréchal
approximation is also a very good approximation for many other distribution laws.
So, the smaller the residual WFE, the higher the Strehl ratio, i.e., the more flux
from the residual halo is transferred into the Airy pattern and can be removed by
a coronagraph.ᵃ The total residual halo intensity is then just $1 - S = \sigma_\phi^2$, whose
minimization is the prime objective of high-contrast imaging.
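
The two Strehl approximations discussed above are easy to compare numerically. The sketch below evaluates both at the quoted limit of 0.45 rad rms and for a hypothetical residual of 100 nm rms observed at 1650 nm; the function names and the example wavefront error are illustrative assumptions.

```python
import math

def strehl_marechal(sigma_rad):
    """Small-error form: S ~ 1 - sigma^2."""
    return 1.0 - sigma_rad**2

def strehl_extended(sigma_rad):
    """Extended Marechal approximation: S = exp(-sigma^2)."""
    return math.exp(-sigma_rad**2)

def wfe_nm_to_rad(wfe_nm, wavelength_nm):
    """Convert an rms wavefront error in nm to phase in radians."""
    return 2.0 * math.pi * wfe_nm / wavelength_nm

sigma_limit = 0.45
print(f"sigma = 0.45 rad: 1 - s^2 = {strehl_marechal(sigma_limit):.3f}, "
      f"exp(-s^2) = {strehl_extended(sigma_limit):.3f}")

sigma_h = wfe_nm_to_rad(100.0, 1650.0)        # hypothetical 100 nm rms at 1650 nm
print(f"100 nm at 1650 nm: sigma = {sigma_h:.3f} rad, "
      f"S = {strehl_extended(sigma_h):.3f}, halo = {1 - strehl_extended(sigma_h):.3f}")
```

Both expressions give a Strehl ratio close to 80% at 0.45 rad and diverge as the wavefront error grows, with the exponential form remaining positive for any error.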

The coronagraphic PSF contrast is another important performance metric for 
an XAO system. It is a measure for residual halo variations in a given region of the 
image, normalized to I(0). One often assumes that the residual halo is uniformly 
distributed over all azimuth angles, and calculates spatial statistics over annuli 
with a fixed width, varying only the radial distance. This results in the common 
plots showing contrast as a function of angular separation. While the Strehl ratio 
is a measure for the total residual halo intensity and is therefore a single number, 
the contrast curve provides the local background noise at the location of the faint 
companions around bright stars. 

In the following, we will investigate how the contrast can be derived from residual
wavefront errors. Just like any integrable function, a light-wave across the aperture
can be represented by a sum of sine and cosine waves. So without loss
of generality, we will derive the residual image intensity produced by an aperture
light-wave

$$E_a(x) = A(x)\exp(i\phi(x)), \tag{1}$$

which consists of small amplitude and phase sinusoidal aberrations, $A(x) = 1 + a\sin(2\pi x f + \alpha)$
and $\phi(x) = b\sin(2\pi x f + \beta)$, with the spatial frequency $f$, transmission
amplitude $0 < a < 1$, and the phase amplitude $b = 2\pi h/\lambda$. Here $h$ is the
wavefront error in m, and $\lambda$ is the observing wavelength.

Absorbing the amplitude into the exponential and assuming small aberration 
amplitudes, we can use the small aberration approximation 


$$\exp(i\phi) \approx 1 + i\phi. \tag{2}$$


If we assume typical XAO residual phase error amplitudes of just a few nanometers
per sinusoidal mode, the corresponding phase errors are of the order of a few
times $10^{-2}$ rad at imaging wavelengths in the red optical and near-infrared. For
such phase residuals, the small aberration approximation leads to errors that are
below $10^{-3}$ rad. Considering that contrast intensity is proportional to phase error
squared, the resulting errors are smaller than 1% and therefore insignificant.
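
The size of the approximation error can be checked directly. The sketch below assumes an illustrative ripple amplitude of 3 nm at a wavelength of 800 nm; both numbers are examples chosen here, not values from the text.

```python
import numpy as np

h = 3e-9             # assumed ripple amplitude in m (illustrative)
wavelength = 800e-9  # assumed imaging wavelength in m (illustrative)
phi = 2 * np.pi * h / wavelength          # phase amplitude in rad

exact = np.exp(1j * phi)
approx = 1 + 1j * phi                     # small aberration approximation, Eq. (2)
err = abs(exact - approx)                 # roughly phi**2 / 2

# Contrast scales with the phase error squared, so compare squared quantities.
relative_contrast_error = (err / phi) ** 2

print(f"phi = {phi:.4f} rad, approximation error = {err:.2e} rad")
print(f"squared relative error = {relative_contrast_error:.2e}")
```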

With Eq. (2) and $a, b \ll 1$, we can now write Eq. (1) as


$$E_a(x) = 1 + a\sin(2\pi x f + \alpha) + ib\sin(2\pi x f + \beta). \tag{3}$$


*See Part 4 of Volume 3 of this Handbook.

We use the far-field or Fraunhofer approximation, which is essentially the Fourier
transform of $E_a(x)$ calculated over the telescope aperture, to propagate this wavefront
from the aperture to the image plane. For the central plane wave, i.e., the first
term on the right-hand side of Eq. (3), this yields the well-known Airy pattern of
the aperture


$$E_{i,0}(\theta) = \iint_{\mathrm{aper}} 1 \times \exp\!\left(i\,\frac{2\pi}{\lambda}(\theta_x x + \theta_y y)\right) dx\,dy, \tag{4}$$

with $\theta = \lambda f$ denoting the angular coordinate on the sky.
Knowing that the Fourier transform of the sine term in Eq. (3) is a pair of 
impulses at frequencies —f and f, the light-wave in the image plane is 


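$$E_i(\theta) = E_{i,0}(\theta) + 0.5\left[a\exp(-i(\pi/2 - \alpha)) + b\exp(i\beta)\right]E_{i,0}(\theta - f\lambda)
+ 0.5\left[a\exp(i(\pi/2 - \alpha)) + b\exp(i(\pi - \beta))\right]E_{i,0}(\theta + f\lambda). \tag{5}$$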


There is a lot to learn from the careful inspection of this result. First, a light- 
wave showing a phase and amplitude sinusoidal ripple is represented by the coherent 
sum of three plane light-waves; the central wave, and two off-axis waves, scaled 
by the amplitude of the ripples, and coming in at positive and negative angles 
proportional to the spatial frequency of the ripple. The phases of the two off-axis 
waves depend on the phases of the sinusoids, and on whether they are produced by 
phase or amplitude ripples. 
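
The three-wave picture can be verified numerically. The following sketch is a simplification under stated assumptions: it uses a periodic one-dimensional pupil sampled on an FFT grid (so the "Airy pattern" collapses to a single pixel), a pure phase ripple, and an illustrative ripple of 1 nm rms at 600 nm. The relative speckle intensity should then approach $b^2/4$, which follows from the factor $0.5\,b$ in Eq. (5).

```python
import numpy as np

n = 4096                 # samples across a periodic 1D pupil (assumption)
cycles = 50              # ripple periods across the pupil, i.e. f = cycles / D
wavelength = 600e-9      # m
h_rms = 1e-9             # 1 nm rms sinusoidal wavefront ripple
h = np.sqrt(2) * h_rms   # amplitude of a sinusoid with that rms
b = 2 * np.pi * h / wavelength   # phase amplitude in rad

x = np.arange(n) / n
pupil = np.exp(1j * b * np.sin(2 * np.pi * cycles * x + 0.3))

# Far-field (Fraunhofer) intensity is the squared modulus of the Fourier transform.
image = np.abs(np.fft.fft(pupil) / n) ** 2
speckle_plus = image[cycles] / image[0]     # speckle at +f, relative to the core
speckle_minus = image[-cycles] / image[0]   # speckle at -f
predicted = b ** 2 / 4

print(f"b = {b:.4f} rad, predicted b^2/4 = {predicted:.3e}")
print(f"speckles: {speckle_plus:.3e} (+f), {speckle_minus:.3e} (-f)")
```

Both speckles come out equally bright for a pure phase ripple; the asymmetry discussed in the text only appears once amplitude ripples or the pinning terms are included.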

The observed image intensity is the absolute square of $E_i(\theta)$. Hence the observed
image intensity will consist of three point source images, one in the center and one 
on either side of it at a separation given by the spatial frequency of the ripple, and 
cross-terms between the three plane waves. Given that aberration amplitudes are 
small, the main terms in the absolute square of Eq. (5) that determine the image 
intensity are: 


• The central image resembling the Airy pattern.

• The two speckles produced by the amplitude and the phase sinusoids. The intensity
of a pure phase speckle relative to the central image is $b^2/4 = (\pi h/\lambda)^2$, i.e.,
a sine ripple with an amplitude of 1 nm rms produces a pair of speckles with a
brightness of $5 \times 10^{-5}$ at $\lambda = 600$ nm. The phases of these speckles on either side
of the central image are $\pi/2 - \alpha$ and $-(\pi/2 - \alpha)$ for the amplitude sinusoid, and
$\beta$ and $\pi - \beta$ for the phase sinusoid. This has the important consequence that
phase and amplitude speckles that interfere destructively on one side of the PSF
(e.g., $\pi - \beta = \pi/2 + \alpha$) interfere constructively on the other side ($\beta = \pi/2 - \alpha$).
A compensation of amplitude speckles with phase is only possible over half of the
field of view.

• The pinning term, i.e., the interference of the central image with the speckles.


Again, we see that a phase sinusoid with $\beta = \pi$ would interfere destructively
with the central wave (phase is zero) on one side of the image, while it would
interfere constructively on the other side. So, the pinned speckle term is anti-symmetric.
Amplitude sinusoids with $\alpha = \pi/2$, however, would have the same
effect on both sides of the image.

Fig. 6. SPHERE IRDIS H-band on-sky coronagraphic image with a superposition of two diagonally
oriented sinusoidal ripples applied to the DM and orthogonal to each other. Each of the
ripples produces two speckles symmetric to the image center. The speckles are elongated in the
radial direction when observed through a broadband filter, because the position is a function of
wavelength ($\theta = \lambda f$). The central corrected area of about 1.5″ diameter in H-band is also well
visible.


A deformable mirror has a finite number of actuators $n_1$ across the telescope
aperture with diameter $D$. It can therefore only reproduce spatial frequencies up to
the Nyquist limit of $f = n_1/(2D)$ and correct for aberrations up to the correction
radius of $\theta_{DM} = \lambda f$ when observing at wavelength $\lambda$, as illustrated in Fig. 6.
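
As a worked example of the Nyquist limit and the correction radius, the sketch below uses 32 actuators across an 8-m aperture at 760 nm. These numbers are taken from the Proxima b case study later in the chapter; the function name is a placeholder.

```python
import math

def correction_radius(n_act, diameter_m, wavelength_m):
    """Return the DM Nyquist frequency (1/m) and the correction radius (rad)."""
    f_nyquist = n_act / (2.0 * diameter_m)
    return f_nyquist, wavelength_m * f_nyquist

f_nyq, theta_dm = correction_radius(32, 8.0, 0.76e-6)
theta_dm_arcsec = math.degrees(theta_dm) * 3600.0

print(f"Nyquist frequency: {f_nyq:.2f} cycles/m")
print(f"correction radius: {theta_dm:.3e} rad = {theta_dm_arcsec:.3f} arcsec")
```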

Speckle nulling methods identify speckles through focal plane wavefront
sensing²³ and introduce phase or amplitude aberrations, e.g., with a deformable
mirror²⁴,²⁵ or an auxiliary optical component like an apodized phase plate,²⁶ to
remove light from one or both sides of an image.

The perfect coronagraph is a theoretical concept, which completely removes 
the central wave and therefore also the pinning terms.²⁷ It leaves only the speckles
produced by phase and amplitude aberrations. Real coronagraphs nowadays already 
provide a nearly ideal performance, as described in Part 4 of Volume 3 of this 
Handbook. Therefore, the perfect coronagraph is a valid and useful assumption 
for the analysis of XAO performance, because the power spectrum of AO residual 
aberrations then corresponds to the AO residual PSF halo. 


3. The XAO Error Budget 


Individual terms of the AO error budget have different characteristic spatial frequency
distributions. In order to minimize halo intensity at the scientifically
interesting angular separations, a good understanding and control of these terms is
essential. The following presentation is largely inspired by Refs. 28 and 29, which
provide a thorough description and derivation of the error terms relevant for XAO.


3.1. Propagation and Talbot effect 


Atmospheric turbulence is vertically distributed and described by the $C_n^2(z)$ profile.
Phase aberrations are introduced into the light-wave accordingly, as it propagates
through the atmosphere to the telescope aperture. The propagation distances are
between zero (ground-layer turbulence) and about 10-20 km for the highest turbulent
layers. Such distances fall into the Fresnel or near-field propagation regime,
which is described essentially by the convolution


$$E(\mathbf{x}, z) = E(\mathbf{x}, 0) \otimes \exp\!\left(i\pi|\mathbf{x}|^2/(z\lambda)\right). \tag{6}$$


Let $\tilde{E}$ denote the Fourier transform of $E$. Considering the convolution theorem
and the fact that the Gaussian function is its own Fourier transform, Eq. (6) is
equivalent to multiplying each component with spatial frequency $\mathbf{f}$ by a quadratic
phase factor $d\phi = \pi\lambda z|\mathbf{f}|^2$:


$$\tilde{E}(\mathbf{f}, z) = \tilde{E}(\mathbf{f}, 0)\exp(i\,d\phi). \tag{7}$$


We consider again a light-wave with small sinusoidal phase aberration


$$E(x) = 1 + ib\sin(2\pi x f_0 + \beta)
= 1 + \frac{b}{2}\left[\exp(i(2\pi x f_0 + \beta)) - \exp(-i(2\pi x f_0 + \beta))\right]. \tag{8}$$


We see again that a light-wave with a small sinusoidal phase aberration of spatial
frequency $f_0$ is the same thing as a coherent sum of three plane light-waves. Propagating
each of them using Eq. (7) and considering the Fourier transform of a plane
light-wave,


$$\int e^{i2\pi x f_0}\,e^{-i2\pi x f}\,dx = \delta(f - f_0), \tag{9}$$

and realizing that $\exp(i\,d\phi)|_{f=0} = 1$, the propagated light-wave is


$$E(x, z) = 1 + \exp(i\,d\phi)\,ib\sin(2\pi x f_0 + \beta)
= 1 - \sin(d\phi)\,b\sin(2\pi x f_0 + \beta) + i\cos(d\phi)\,b\sin(2\pi x f_0 + \beta). \tag{10}$$


Propagation along $z$ is then given by a constant plane light-wave and real and
imaginary sinusoidal terms. A small sinusoidal phase aberration propagates into a
mixture of sinusoidal amplitude and phase aberrations of the same spatial frequency.
The relative contributions of phase and amplitude oscillate with $d\phi$. For example,
a phase sinusoid is completely converted into an amplitude sinusoid when $d\phi = \pi/2$
and back but with opposite phase when $d\phi = \pi$.

>See Chapter 13.

[Figure 7 panel labels: Propagation; Pupil OPD aberrations]

Fig. 7. This illustration shows how the phase aberrations shown in the lower right panel are
converted into intensity fluctuations as the beam propagates upwards, assuming a wavelength
bandpass of 0.4–0.8 μm. The intensity distribution at some heights is shown in 2D in the middle
panels and continuously in the horizontal cut to the left. Adapted from Ref. 30.

Propagation of a more complex phase aberration typical for a light-wave cor- 
rected by AO is shown in Fig. 7. High spatial frequency intensity fluctuations appear 
first because $d\phi \propto f^2$. Typical phase errors with spatial frequencies of $f \sim 2$/m
produce the largest intensity fluctuations at propagation distances of a few hundred 
kilometers. Very small spatial frequencies are only efficiently converted into ampli- 
tude errors at much larger propagation distances. The figure also nicely illustrates 
the development of scintillation for a plane light-wave that travels through the 
turbulent atmosphere. High-altitude turbulent layers introduce phase aberrations 
that are partially converted into amplitude aberrations while propagating to the 
ground. This causes a non-uniform amplitude distribution and a “twinkling” of 
stars observed through a small aperture such as the human eye. 

Equation (10) also shows that a light-wave of a certain spatial frequency is
exactly reproduced when $d\phi = 2\pi$, i.e., after a propagation distance of

$$z_T = \frac{2}{\lambda f^2}. \tag{11}$$

This interesting consequence of near-field diffraction was discovered by Henry Fox
Talbot in 1836 when he observed the reproduction of images of an optical grating
with a given period $d = 1/f$ at several distances separated by the Talbot length $z_T$.
The Talbot effect was later derived mathematically by Lord Rayleigh in 1881, and
is comprehensively presented in the literature, e.g., Ref. 31.
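
The conversion of phase into amplitude and the Talbot revival can be reproduced with a few lines of Fourier optics. The sketch below is a simplified one-dimensional illustration, assuming a periodic 8-m window, a wavelength of 0.6 μm, and a ripple of 2 cycles/m; it applies the quadratic phase factor $d\phi = \pi\lambda z f^2$ with the sign convention written in Eq. (7). All names are placeholders.

```python
import numpy as np

def propagate(field, dx, wavelength, z):
    """Fresnel propagation in Fourier space: multiply by exp(i*pi*lambda*z*f^2)."""
    f = np.fft.fftfreq(field.size, d=dx)
    dphi = np.pi * wavelength * z * f ** 2
    return np.fft.ifft(np.fft.fft(field) * np.exp(1j * dphi))

wavelength = 0.6e-6      # m (assumed)
f0 = 2.0                 # ripple spatial frequency in cycles/m
b = 0.01                 # small phase amplitude in rad
n, width = 1024, 8.0     # periodic window holding exactly 16 ripple cycles
dx = width / n
x = np.arange(n) * dx

ripple = np.sin(2 * np.pi * f0 * x + 0.4)
E0 = 1 + 1j * b * ripple                    # small-aberration wave, Eq. (8)

z_quarter = 1.0 / (2.0 * wavelength * f0 ** 2)   # distance where dphi = pi/2
z_talbot = 2.0 / (wavelength * f0 ** 2)          # Talbot length, dphi = 2*pi

E_quarter = propagate(E0, dx, wavelength, z_quarter)
E_talbot = propagate(E0, dx, wavelength, z_talbot)

print(f"pure amplitude ripple after {z_quarter / 1e3:.0f} km")
print(f"Talbot length: {z_talbot / 1e3:.0f} km")
```

At `z_quarter` the imaginary (phase) part of the ripple has vanished and the wave carries a pure amplitude ripple, which for this spatial frequency happens after roughly 200 km, in line with the "few hundred kilometers" quoted in the text.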


3.2. Derivation of error budget terms 


The phase power spectrum introduced by a Von Karman turbulent layer with outer
scale $L_0$ is


$$W_\phi(f) = 0.023\,r_0^{-5/3}\left(f^2 + L_0^{-2}\right)^{-11/6}. \tag{12}$$


According to Eq. (10), the fraction of atmospheric turbulence that ends up as a
phase variance in the telescope aperture after propagation through the atmosphere
is²⁸

$$X(f, \lambda_i) = \frac{\int C_n^2(z)\cos^2(\pi z f^2 \lambda_i)\,dz}{\int C_n^2(z)\,dz}, \tag{13}$$

and the fraction that is converted into amplitude variance is

$$Y(f, \lambda_i) = \frac{\int C_n^2(z)\sin^2(\pi z f^2 \lambda_i)\,dz}{\int C_n^2(z)\,dz}, \tag{14}$$

with $\lambda_i$ denoting the imaging wavelength. In the small aberration approximation
(Eq. (2)), an aberration power spectrum is converted into a PSF contrast by multiplication
with the inverse of the telescope aperture area, i.e., with $4/(\pi D^2)$ in
the case of a circular aperture with diameter $D$.³² Also the phase average over the
telescope aperture, the piston mode, is neither seen by the AO WFS nor does it have
an effect on the image. Therefore, it should be filtered out as well by the application
of a piston filter,?° which is $F_p(f) = 1 - (2J_1(\pi D f)/(\pi D f))^2$ for a circular aperture.
In practice, the piston filter has little effect at angular separations larger than the
telescope’s diffraction limit and can be neglected.
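The contrast produced by phase aberrations is then

$$C_0(\theta) = \frac{4}{\pi D^2}\left(\frac{\lambda_0}{\lambda_i}\right)^2 X(\theta/\lambda_i, \lambda_i)\,W_\phi(\theta/\lambda_i), \tag{15}$$

and the contrast produced by amplitude aberrations, i.e., scintillation, is

$$C_1(\theta) = \frac{4}{\pi D^2}\left(\frac{\lambda_0}{\lambda_i}\right)^2 Y(\theta/\lambda_i, \lambda_i)\,W_\phi(\theta/\lambda_i). \tag{16}$$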

These represent the cases of uncorrected (open-loop) phase and amplitude errors, 
and this conversion from the power spectrum to contrast is only valid for small 
aberrations.

The temporal delay error can be estimated using the Taylor, or frozen flow,
hypothesis and windspeed $v_w$. Then, a wavefront after a delay $\Delta t$ is the same
wavefront shifted in the direction of the wind by $-v_w\Delta t$, and the resulting power
spectrum of the temporal delay error is²⁸

$$C_2(\theta) = (2\pi v_w \Delta t f)^2\,C_0(\theta). \tag{17}$$
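
To get a feeling for the magnitudes involved, the propagation fractions of Eqs. (13) and (14) and the temporal delay factor of Eq. (17) can be evaluated at a separation of 40 mas. The sketch below rests on several assumptions that are not in the text: the integrals over $C_n^2(z)$ are replaced by a sum over two hypothetical layers (30 m and 10 km, weighted 85% and 15%), and the delay is taken as two frames of a 4 kHz loop, i.e., 0.5 ms. Wind speed and wavelength follow Table 1.

```python
import numpy as np

def propagation_fractions(f, wavelength, heights, cn2_weights):
    """Discrete-layer version of Eqs. (13)-(14): phase (X) and amplitude (Y) fractions."""
    w = np.asarray(cn2_weights, dtype=float)
    w = w / w.sum()
    dphi = np.pi * np.asarray(heights, dtype=float) * f ** 2 * wavelength
    return float(np.sum(w * np.cos(dphi) ** 2)), float(np.sum(w * np.sin(dphi) ** 2))

def temporal_delay_factor(f, wind_speed, delay):
    """Ratio C_2 / C_0 from Eq. (17)."""
    return (2 * np.pi * wind_speed * delay * f) ** 2

MAS = np.deg2rad(1.0 / 3.6e6)        # one milli-arcsecond in rad
wavelength_i = 0.76e-6               # imaging wavelength in m
f_40mas = 40 * MAS / wavelength_i    # spatial frequency seen at 40 mas, f = theta / lambda

# Hypothetical two-layer profile standing in for the full C_n^2(z) integral.
X, Y = propagation_fractions(f_40mas, wavelength_i, [30.0, 10e3], [0.85, 0.15])

delay = 2 / 4000.0                   # assumed: 2 frames at a 4 kHz loop rate
factor = temporal_delay_factor(f_40mas, 10.0, delay)

print(f"f at 40 mas: {f_40mas:.3f} cycles/m")
print(f"X = {X:.7f}, Y = {Y:.2e}")
print(f"C_2 / C_0 = {factor:.2e}")
```

At such small spatial frequencies almost nothing is converted into scintillation, whereas the temporal delay term retains a fraction of a few times $10^{-5}$ of the uncorrected halo.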


The photon noise error can be calculated from the assumption of a white noise
power spectrum at the input, whose level is proportional to the number of photons
available per sampling time filtered by the WFS reconstructor. In general, the
photon noise error can be written as²⁸


C(8) = (2) (4) (SL (18) 
a) \ Qn /Tiphot 
For a sampling time $t_s$, guide star flux $F$, and circular telescope aperture area
$\pi D^2/4$, $N_{phot} = t_s F \pi D^2/4$. The WFS sensitivity $\beta_p(f)$ is a function of spatial
frequency. It is different for different WFS and has been calculated for the most
common ones.?° The non-modulated Pyramid WFS, for example, has a noise sensitivity
of $\beta_p = \sqrt{2}$ that is independent of $f$.

A difference in observing wavelength between the WFS and the science camera
results in chromatic errors. Firstly, Fresnel propagation (Eq. (6)) is chromatic, i.e.,
different fractions of phase error are converted into amplitude at different wavelengths.
This introduces the following error:?°

$$C_4(\theta) = C_0(\theta)\,\frac{dX(\theta/\lambda_i, \lambda_i, \lambda)}{X(\theta/\lambda_i, \lambda_i)}, \tag{19}$$

with

$$dX(\theta/\lambda_i, \lambda_i, \lambda) = \frac{\int C_n^2(z)\left[\cos^2(\pi z f^2 \lambda_i) - \cos^2(\pi z f^2 \lambda)\right]dz}{\int C_n^2(z)\,dz}. \tag{20}$$


Secondly, the refractive index of air is chromatic. For dry air at $p_0 = 1$ atmosphere
pressure and $T_0 = 288.15$ K temperature, it is given by?

$$n_0(\lambda) - 1 = 64.328 \times 10^{-6} + \frac{29498.1 \times 10^{-6}}{146 - \lambda^{-2}} + \frac{255.4 \times 10^{-6}}{41 - \lambda^{-2}}, \tag{21}$$

with $\lambda$ in μm. For other temperatures and pressures, it is

$$n - 1 = (n_0 - 1)\,\frac{p\,T_0}{p_0\,T}. \tag{22}$$


The refractive index chromaticity introduces the error**

$$C_5(\theta, \lambda_i, \lambda) = C_0(\theta)\left(\frac{n(\lambda_i) - n(\lambda)}{n(\lambda) - 1}\right)^2. \tag{23}$$

Finally, light-rays of different wavelength suffer from differential atmospheric
refraction. Therefore, they travel on slightly different paths through the atmosphere,
and encounter a slightly different turbulence. This error is called chromatic anisoplanatism
and is given by?


C6 (0, i, 4) = 2Co (6) / C?(2)(1 — cos®(2nz0/Ayan)de. (24) 


The differential atmospheric refraction angle is given by®*

$$\alpha(\lambda_i, \lambda) = \left(n(\lambda_i) - n(\lambda)\right)\tan(\zeta), \tag{25}$$

with the zenith angle $\zeta$.
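
The dispersion formula and the resulting differential refraction are easy to evaluate. The sketch below assumes that Eq. (21) is the standard dry-air dispersion formula with $\lambda$ in μm, scales it with pressure and temperature as in Eq. (22), and takes the differential refraction as the index difference times $\tan\zeta$. The function names and the choice of standard conditions are illustrative; the wavelengths and zenith distance follow Table 1.

```python
import math

def n0_minus_1(wavelength_um):
    """Refractivity of dry air at 1 atm and 288.15 K (wavelength in micrometres)."""
    s2 = wavelength_um ** -2
    return 1e-6 * (64.328 + 29498.1 / (146.0 - s2) + 255.4 / (41.0 - s2))

def n_minus_1(wavelength_um, p_over_p0=1.0, temperature_k=288.15):
    """Scale the refractivity to another pressure and temperature, Eq. (22)."""
    return n0_minus_1(wavelength_um) * p_over_p0 * 288.15 / temperature_k

def differential_refraction(wl_science_um, wl_wfs_um, zenith_deg, **conditions):
    """Differential refraction angle in rad between two wavelengths."""
    dn = n_minus_1(wl_science_um, **conditions) - n_minus_1(wl_wfs_um, **conditions)
    return dn * math.tan(math.radians(zenith_deg))

refr_760 = n0_minus_1(0.76)
refr_900 = n0_minus_1(0.90)
angle = differential_refraction(0.76, 0.90, 45.0)   # standard conditions assumed
angle_mas = math.degrees(angle) * 3.6e6

print(f"n - 1 at 760 nm: {refr_760:.4e}, at 900 nm: {refr_900:.4e}")
print(f"differential refraction at 45 deg: {angle_mas:.0f} mas")
```

At an observatory altitude the lower pressure reduces the angle proportionally, which can be explored through the `p_over_p0` argument.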
It is further assumed that the DM can perfectly fit phase aberrations up to its
correction radius $\theta_{DM}$, such that the fitting error consists of


$$C_7(\theta) = H(\theta - \theta_{DM})\,C_0(\theta), \tag{26}$$


with $H$ being the Heaviside step function, being equal to zero at $\theta < \theta_{DM}$ and equal
to one otherwise.

There are other AO error terms, which are usually considered but are of little 
relevance to XAO for high-contrast imaging. If needed, the aliasing error can be 
removed by a properly designed spatial filter,?° and readout noise is close to zero 
for state-of-the-art detectors. Also, angular or focal anisoplanatism are not present, 
because the AO guide star is the astronomical object itself. The analytical model, 
however, does not capture precisely wavefront reconstruction and dynamical com- 
pensation. The formulas above are quite simplistic and the analysis should always 
be verified by comprehensive numerical simulations. There are also additional error 
terms for XAO systems aiming at the correction of phase and amplitude with multiple
DMs, which are described in the literature.²⁸

Here, we stick to a “classical” XAO system with one DM in the pupil plane for 
the correction of phase aberrations. For such a system, the achievable contrast is 
given by 


$$C = C_1 + C_2 + C_3 + C_4 + C_5 + C_6 + C_7. \tag{27}$$


In the following, we apply the analytical model to study a typical XAO
science case and derive first-order contrast estimates for it.


3.3. Case-study: Observing Proxima b with an 8-m telescope 


With a distance of just 1.3 pc (4.25 light years), Proxima Centauri, the late M-star
companion to the Alpha Centauri binary, is our closest neighbor. It is also orbited
by a planet in the habitable zone at 0.05 AU, which is slightly more massive than
Earth assuming that the orbit is not close to face-on.°° This planet, Proxima b, is an
obvious candidate to look for biosignatures, i.e., the presence of spectral signatures
for which a biotic origin is the most likely explanation. The most widely acknowledged
biosignature is molecular oxygen, with a prominent absorption band, the O₂
A-band, between wavelengths of 759 nm and 771 nm.

It has been proposed that an instrument combining XAO, high-contrast imaging,
and very high resolution spectroscopy could detect oxygen in Proxima b’s
atmosphere using readily available 8-m class telescopes.2’ At the wavelength of
the O₂ A-band, the angular separation between the star and the planet of up to
37 mas would correspond to $1.9\lambda/D$ and it would therefore be spatially resolved.
XAO high-contrast imaging would effectively filter out stellar photons, which are
the dominant source of noise in such an experiment. Suppressing the light of the
star by a factor of about 5000 would allow us to detect oxygen using about 60 nights
of telescope time. We will now evaluate whether such a suppression factor is feasible
and what kind of AO system would be needed to achieve it.

We assume observing conditions representative for a median atmosphere on
Cerro Paranal in Chile, home of ESO’s Very Large Telescope (VLT) observatory.
The science imaging wavelength to observe the O₂ A-band is 760 nm, and the
wavefront sensor camera observes at a central wavelength of 900 nm with 200 nm
bandwidth. This WFS waveband is optimum, because it is still covered by modern
CCD detectors that have minimal readout noise, and it minimizes the chromatic
differences to the science wavelength (Eqs. (19), (23) and (25)). It also minimizes
the photon noise error (Eq. (18)) by representing the best compromise between a
short WFS wavelength for a small diffraction limit and a long enough wavelength for
the red M-type Proxima Cen to emit significant flux. With its apparent magnitude
of I = 7.4 and an assumed overall transmission of 20% to the WFS, including
detector quantum efficiency and taking into account the effective 50% reduction
for L3-CCDs, we expect a flux of about $10^6$ photoelectrons/s/m² on the WFS. The
fixed parameters used for the following analysis are provided in Table 1.


Table 1. XAO simulation parameters for the case of Proxima Cen.

Parameter                           Value
Telescope diameter D (m)            8
λ_i (μm)                            0.76
λ_WFS (μm)                          0.9
Flux (e⁻/s/m²)                      10⁶
Zenith distance (deg)               45
Wind speed (m/s)                    10
r_0 at 0.5 μm (m)                   0.133
L_0 (m)                             25
Number of turbulent layers          35
Layer minimum height (m)            30
Layer maximum height (km)           26.5
Fraction of turbulence 0–5 km       85%
Fraction of turbulence 0–13.5 km    95%

The analytical error budget model in Eq. (27) provides the XAO residual halo
contrast as a function of radial distance under the assumption of small aberrations 
(Eq. (2)). We assume that noise at a given distance is dominated by halo photon 
noise, and that the planet signal is proportional to the Strehl ratio. We then choose 
the signal to noise ratio at a distance of 40 mas, the maximum separation between 
Proxima Cen and Proxima b, as a figure of merit for the different considered XAO 
concepts. 

Figure 8 illustrates the different error terms and the overall residual halo for 
an optimized non-modulated pyramid wavefront sensor (PWS) AO system with 
32 actuators across the 8-m aperture, running at a loop frequency of 4 kHz with 
2 frames delay. The error budget is dominated by temporal delay and photon noise. 
The relative contribution of these two errors at 40 mas is well balanced. Increasing 
the loop frequency would decrease the temporal error, but increase the photon 
noise, leading to an overall degradation of contrast. Decreasing it would have a 
similarly adverse effect. Chromatic errors are minor contributors for the chosen 
science and wavefront sensing wavelengths. Only the refractive index chromaticity 
(Eq. (23)) starts to dominate at very small distances below the diffraction limit 
of $\lambda_i/D$. This error is therefore of little interest here, but becomes important to
consider for extremely large telescopes with extremely high spatial resolution. The 
figure also shows the large contrast gain brought by the non-modulated PWS with
its flat noise propagation over the Shack-Hartmann WFS (SHS), which suffers from
strongly degraded sensitivity for low spatial frequency aberrations.?* °° In order to
balance temporal delay and photon noise, the SHS is run at a mere 1 kHz loop
frequency. The residual halo contrast at 40 mas provided by the SHS is about
20 times worse than that of the PWS.

Fig. 8. XAO residual halo and error budget from the analytical model for the case of Proxima
Cen.

Table 2. Performance of the different XAO concepts for the case of Proxima Cen.

WFS type                  SHS          PWS          PWS          PWS          PWS
d_act (m)                 0.25         0.25         0.25         0.4          0.2
f_XAO (kHz)               1            1            4            4            4
Strehl ratio at 760 nm    0.56         0.6          0.62         0.44         0.67
Contrast at 40 mas        3.2 × 10⁻³   8.9 × 10⁻⁴   1.5 × 10⁻⁴   1.5 × 10⁻⁴   1.5 × 10⁻⁴
SNR at 40 mas (a.u.)      1            2            5.1          3.7          5.6

Performance estimates for SHS and PWS for different actuator spacings ($d_{act}$)
and loop frequencies ($f_{XAO}$) are summarized in Table 2. When both sensors are
running at 1 kHz, the slightly better Strehl ratio and better contrast of the PWS
double the achievable SNR at 40 mas. A major gain of a factor 5.1 is achieved
by increasing the PWS $f_{XAO}$ to an optimum 4 kHz. Changing $d_{act}$ for a 4 kHz
PWS only affects the Strehl ratio through the fitting error but not the contrast at
40 mas. The SNR therefore does not critically depend on the number of actuators for
a reasonably small $d_{act}$, and practical considerations like feasibility and affordability
of the real-time computer can drive the design. In conclusion, an XAO system with
32 × 32 actuators running a PWS at 4 kHz appears to be a good choice to attempt
the detection of an O₂ atmosphere on Proxima b.
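
The SNR row of Table 2 can be reproduced from the figure of merit stated earlier: if the noise is halo photon noise and the planet signal scales with the Strehl ratio, then SNR is proportional to $S/\sqrt{C}$. The sketch below makes one assumption beyond the table: the contrast of $1.5 \times 10^{-4}$ is applied to all three 4 kHz PWS columns, following the statement that $d_{act}$ does not change the contrast at 40 mas.

```python
import numpy as np

# Columns: SHS 1 kHz, PWS 1 kHz, PWS 4 kHz (0.25 m), PWS 4 kHz (0.4 m), PWS 4 kHz (0.2 m)
strehl = np.array([0.56, 0.60, 0.62, 0.44, 0.67])
contrast = np.array([3.2e-3, 8.9e-4, 1.5e-4, 1.5e-4, 1.5e-4])

# Signal ~ Strehl, noise ~ sqrt(halo intensity); normalize to the SHS case.
snr = strehl / np.sqrt(contrast)
snr_relative = snr / snr[0]

for value in snr_relative:
    print(f"{value:.2f}")
```

The values agree with the tabulated SNR to within the rounding of the Strehl and contrast entries.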

The XAO error budget is now well understood. XAO residual wavefront errors 
produce a stellar intensity halo in coronagraphic images, which is the main source of 
photon noise in the NIR and ultimately limits the sensitivity of high-contrast imag- 
ing. In many cases, the AO temporal delay dominates the error budget. Running 
the AO system faster reduces the error, but will increase read-noise even for modern 
state-of-the-art detectors. Another approach to reduce temporal delay would be to 
predict the wavefront from past data. Such predictive controllers have been proposed 
in the literature.?° 4? With the greatly increased processing power and bandwidth
of modern computers, predictive control has been brought back into the focus of
XAO engineering.⁴³,⁴⁴ Computer modeling promises a substantial gain from
the new methods, and makes the detection and atmospheric characterization of Earth
analogue planets by extremely large ground-based telescopes appear to be within
reach very soon.

Solar adaptive optics (AO) systems are deployed at many major solar telescope
facilities. The operational conventional AO systems in combination with post 
facto reconstruction techniques enable the ground-based observer to overcome 
the adverse effects of atmospheric seeing and achieve the diffraction limit of the 
solar telescope. The majority of solar observations are currently performed at 
visible and near-infrared wavelengths. As a consequence, solar AO systems have 
to provide a high-order of correction. This requires large systems, even though 
solar telescopes are of relatively small aperture compared to current night-time 
telescopes. Since its development, solar AO has enabled new ground-breaking 
scientific results. Solar AO is considered an enabling technology for new large 
aperture solar telescopes, such as the 4-m aperture Daniel K. Inouye Solar Tele- 
scope (DKIST) currently under construction on Maui, HI. These telescopes will 
obtain observations of the highly structured and dynamic solar atmosphere with 
the spatial resolution required to quantitatively explore the fundamental physi- 
cal processes. In this chapter, we discuss specifics of solar AO, in particular the 
wavefront sensing. We also introduce existing and future solar conventional and 
multi-conjugate AO (MCAO) systems. 


Adaptive Optics in Solar Observations

T. Rimmele et al.


The solar atmosphere is highly structured by the magnetic field that permeates
the solar plasma. Magneto-hydro-dynamic (MHD) models of solar features, such as
sunspots and granulation, reveal highly dynamic fine structure at scales of a few
tens of milli-arcseconds (a few tens of kilometers on the solar surface). Accurate
quantitative measurements of physical parameters describing these features, such
as temperature, velocity, strength and direction of magnetic field are needed to
address the many mysteries that still exist in terms of understanding solar activity,
including flares and coronal mass ejections or the impact of solar magnetism on the
Earth’s climate. Solar AO is an essential tool for obtaining the required observations
at large aperture solar telescopes.

†The National Solar Observatory is operated by the Association of Universities for Research in
of the astronomical community.

The principle and basic design of a solar AO system* is similar to other AO
systems used for night-time astronomy or medical and industrial applications. One
of the main challenges of solar AO compared to those other systems is that the Sun
does not provide a point source for wavefront sensing. Scientific utility demands¹
that solar AO systems are able to lock onto readily accessible solar structure, such
as solar granulation, sunspots and pores or even structure seen near the limb of the
solar disk, such as prominences. In addition, the target structure evolves on time
scales of minutes. A wavefront sensor capable of locking onto extended, low contrast, 
evolving sources is required. Although several concepts for such a wavefront sensor 
have been discussed, the only concept that has been successfully demonstrated so 
far is the correlating Shack—Hartmann wavefront sensor.” 

Other challenges include the poor and rapidly varying daytime seeing and the 
fact that solar astronomers mostly observe at visible wavelengths (as low as 380 nm). 
Due to heating of the ground by direct sunlight, the near-ground turbulence layer is 
much stronger during the day. Good to excellent seeing conditions are characterized 
by Fried parameters*® of order 10 cm (A = 500 nm). Such conditions can be found 
at excellent mountain or lake sites and at a typical telescope height of 20-40 m 
above ground. Due to the pronounced ground-layer turbulence the daytime Fried 
parameter often fluctuates significantly on short time scales (seconds). Imaging, 
spectroscopy and, in particular, spectro-polarimetry of the extended solar struc- 
tures in many cases lead to a requirement for achieving high Strehl observations 
(ideally >60%). In combination these boundary conditions drive solar AO systems 
requirements to high bandwidth and high spatial order of correction. For the largest 
solar telescopes such as the 4-m DKIST thermal control of tip-tilt and deformable 
mirrors becomes necessary. 

Building a solar AO system that robustly handles these challenges and provides 
high Strehl performance is already a formidable task. Multi-conjugate AO (MCAO) 
aims to extend the corrected field of view beyond the isoplanatic patch. Multiple 
deformable mirrors are deployed at conjugates of, e.g., strong turbulence layers in 
the atmosphere. The wavefront sensing problem becomes even more challenging 
since wavefront information has to be obtained in many directions on the sky. 
Development of solar MCAO has made significant progress in recent years, and 
operational systems are on the horizon.


*See Chapter 14.

2. Wavefront Sensors for Solar AO


2.1. Correlating Shack—Hartmann wavefront sensor 


A Shack-Hartmann sensor divides an image of the telescope aperture into sub- 
apertures using a 2D array of lenslets to sample the local wavefront slope.” In the 
solar AO case the lenslets form multiple images of a portion of the extended object, 
for example, solar granulation (Fig. 1). The relative displacements of these images 
are used to estimate the wavefront slopes in the sub-apertures. The image displace- 
ments are identified by digital image correlation. Cross-correlations between all 
sub-aperture images and one arbitrarily selected sub-aperture image, which serves 
as the reference, are computed. The (interpolated) position of the maximum in each 
correlation represents the relative displacements. The 2D cross-correlations closely 
resemble point sources, i.e., the solar wavefront sensing problem has been reduced 
to the wavefront sensing problem for point sources. 

Of order 20 x 20 pixels of a large format CCD or CMOS detector are used 
to sample each sub-aperture image. The field of view of a sub-aperture image is 
typically of the order of 10 x 10 arcseconds. The wavelength range for the wavefront 
sensors is often limited to a narrow bandpass 10–100 nm wide to enhance the image
contrast of the solar granulation. The dynamic range of the wavefront sensor is 
maximal if the reference is the central sub-aperture. 


Fig. 1. Correlating Shack-Hartmann wavefront sensor. The end-to-end simulator Blur? was used 
to simulate the Dunn Solar Telescope solar AO system AO76. The array of sub-aperture images 
imaged by the lenslet array onto the wavefront sensor camera are displayed on the left, while the 
cross-correlation functions are shown on the right. The granulation image is 10 x 10 arcsec with a 
Hanning window function* applied to avoid edge effects. A Fried parameter of 5 cm and realistic 
photon noise was used for this simulation. The sub-aperture size is 7 cm. The chosen reference 
image is displayed in the lower left corner. Local wavefront gradients are derived by computing 
the locations of the cross-correlation maxima.


>See Chapter 9.

2.1.1. Image correlation techniques


There are several ways to correlate two digital images $I$ and $R$. The classical,
unnormalized cross-correlation can be computed using

$$C_{XC}(i, j, k, t) = \sum_{x=1}^{n_x}\sum_{y=1}^{n_y} I(x, y, k, t)\cdot R(x + i, y + j), \tag{1}$$

with $i, j = -L, \ldots, 0, \ldots, +L$,

where $I(x, y, k, t)$ is the momentary image in the $k$th sub-aperture, and $R(x, y) =
I(x, y, r, t_0)$ is the image previously captured in the reference sub-aperture $r$ at time
$t_0$. The expected image shift is small and $L = 3$, $L = 5$, or $L = 7$ is sufficient, in
particular once the control loop is closed. The sums in Eq. (1) are calculated over
all $n_x$ columns and all $n_y$ rows in the sub-aperture image $I$. The reference image $R$
must be at least of size $(n_x + L - 1) \times (n_y + L - 1)$ pixels. In order to save pixels in the
detector, however, the optical images in the sub-apertures (including the reference
sub-aperture) are usually only $n_x \times n_y$ pixels large, and $R$ is enlarged accordingly
with a circular shift. Alternatively, the cross-correlations can be computed using


discrete Fourier transforms:® 


Corr (i, j,k, t) =F" {F{w(z, y) - I(x, y, k, t)} : conj(F{w(2, y) : R(x, y)})}- 
(2) 


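Here, both I and R are $n_x \times n_y$ pixels large, and w is an $n_x \times n_y$ window function, 
e.g., a Tukey window,* which can be computed as 

$$w(n) = \begin{cases}
\frac{1}{2}\left[1 + \cos\left(\pi\left(\frac{2n}{\alpha (N-1)} - 1\right)\right)\right], & 0 \le n < \frac{\alpha (N-1)}{2}, \\
1, & \frac{\alpha (N-1)}{2} \le n \le (N-1)\left(1 - \frac{\alpha}{2}\right), \\
\frac{1}{2}\left[1 + \cos\left(\pi\left(\frac{2n}{\alpha (N-1)} - \frac{2}{\alpha} + 1\right)\right)\right], & (N-1)\left(1 - \frac{\alpha}{2}\right) < n \le (N-1),
\end{cases} \tag{3}$$

where $N = n_x$ or $N = n_y$. The 2D window is computed from 

$$w(x, y) = w(x) \cdot w(y). \tag{4}$$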
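As an illustration of Eqs. (2) to (4), the following minimal NumPy sketch builds a separable Tukey window and evaluates the Fourier-domain cross-correlation. The function names, the random zero-mean test image, and the 20 x 20 pixel size are assumptions made for this example; they are not taken from any of the RTCs discussed in this chapter. With the conjugate placed on the reference spectrum, the correlation maximum appears at the lag m for which I(n) matches R(n - m), so the sign convention of the lag should be checked against the direct sum of Eq. (1) before mixing the two methods.

```python
import numpy as np


def tukey_window(N, alpha=0.38):
    """1D Tukey (tapered cosine) window of length N with taper fraction alpha."""
    n = np.arange(N, dtype=float)
    w = np.ones(N)
    rising = n < alpha * (N - 1) / 2
    falling = n > (N - 1) * (1 - alpha / 2)
    w[rising] = 0.5 * (1 + np.cos(np.pi * (2 * n[rising] / (alpha * (N - 1)) - 1)))
    w[falling] = 0.5 * (1 + np.cos(
        np.pi * (2 * n[falling] / (alpha * (N - 1)) - 2 / alpha + 1)))
    return w


def fft_cross_correlation(image, reference, alpha=0.38):
    """Windowed Fourier cross-correlation of two equally sized 2D images."""
    nx, ny = image.shape
    w2d = np.outer(tukey_window(nx, alpha), tukey_window(ny, alpha))
    f_img = np.fft.fft2(w2d * image)
    f_ref = np.fft.fft2(w2d * reference)
    return np.real(np.fft.ifft2(f_img * np.conj(f_ref)))


def peak_lag(corr):
    """Signed integer lag of the correlation maximum along each axis."""
    idx = np.unravel_index(np.argmax(corr), corr.shape)
    return tuple(int(i) if i <= n // 2 else int(i) - n
                 for i, n in zip(idx, corr.shape))


# Hypothetical test data: zero-mean noise stands in for a granulation patch.
rng = np.random.default_rng(0)
reference = rng.standard_normal((20, 20))
image = np.roll(reference, shift=(2, -1), axis=(0, 1))  # displaced copy
lag = peak_lag(fft_cross_correlation(image, reference))
```

In a real sensor the images would be mean-subtracted granulation frames rather than white noise, and the integer lag would be refined to subpixel precision.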


A useful apodization width for N = 20 or 24 is obtained with $\alpha \approx 0.38$. Variations of 
the classical cross-correlation algorithm include the square difference function (SDF) 
and the absolute difference function squared (ADF²): 

$$C_{\mathrm{SDF}}(i,j,k,t) = -\sum_{x=1}^{n_x} \sum_{y=1}^{n_y} \left[ I(x,y,k,t) - R(x+i,\, y+j) \right]^2, \tag{5}$$

$$C_{\mathrm{ADF}^2}(i,j,k,t) = -\left( \sum_{x=1}^{n_x} \sum_{y=1}^{n_y} \left| I(x,y,k,t) - R(x+i,\, y+j) \right| \right)^2. \tag{6}$$


The ADF is given by Eq. (6) without the power of 2. It has been suggested that it 
may perform slightly better.® 
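The direct-sum algorithms can be compared side by side in a few lines. The sketch below evaluates the classical cross-correlation, the SDF, and the ADF² over all shifts i, j = -L, ..., +L, using a circular shift of the reference as described for Eq. (1). The helper names and the random test image are assumptions for illustration only; a real-time implementation would be written very differently.

```python
import numpy as np


def shifted(reference, i, j):
    """R(x + i, y + j) with circular wrap-around; axis 0 is x, axis 1 is y."""
    return np.roll(reference, shift=(-i, -j), axis=(0, 1))


def correlation_maps(image, reference, L=3):
    """Return the XC, SDF and ADF^2 maps for all shifts i, j = -L..+L."""
    size = 2 * L + 1
    xc = np.empty((size, size))
    sdf = np.empty((size, size))
    adf2 = np.empty((size, size))
    for a, i in enumerate(range(-L, L + 1)):
        for b, j in enumerate(range(-L, L + 1)):
            ref_ij = shifted(reference, i, j)
            diff = image - ref_ij
            xc[a, b] = np.sum(image * ref_ij)
            sdf[a, b] = -np.sum(diff ** 2)
            adf2[a, b] = -(np.sum(np.abs(diff)) ** 2)
    return xc, sdf, adf2


def peak_shift(cmap, L):
    """Integer shift (i, j) at which the map is maximal."""
    a, b = np.unravel_index(np.argmax(cmap), cmap.shape)
    return int(a) - L, int(b) - L


# Hypothetical test data: the image equals the reference sampled at (x+2, y-1).
rng = np.random.default_rng(1)
reference = rng.standard_normal((20, 20))
image = shifted(reference, 2, -1)
peaks = [peak_shift(m, 3) for m in correlation_maps(image, reference, L=3)]
```

Because all three maps are defined with their maximum at the best match, the same peak finder and subpixel interpolation can be applied to each of them.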

Pre-conditioning of the sub-aperture images I and R before the correlations 
are computed typically involves dark and flat image correction and normalization 
(mean or RMS) and may include removing linear trends across the image(s). De- 
trending the sub-aperture images avoids slow drifts that can occur in conjunction 
with reference updates.” 
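A minimal sketch of such a pre-conditioning step is given below. It assumes dark and flat frames of the same shape as the sub-aperture image, normalizes by the mean, and removes a least-squares plane. The order of the steps, the function names, and the synthetic test frame are assumptions of this example and are not prescribed by the text.

```python
import numpy as np


def fit_plane(img):
    """Least-squares coefficients (a, b, c) of the plane a + b*x + c*y."""
    nx, ny = img.shape
    x, y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    design = np.column_stack([np.ones(img.size), x.ravel(), y.ravel()])
    coeffs, *_ = np.linalg.lstsq(design, img.ravel(), rcond=None)
    return coeffs, design


def precondition(raw, dark, flat):
    """Dark/flat correct, normalize by the mean, and remove a linear trend."""
    img = (raw - dark) / flat
    img = img / img.mean()
    coeffs, design = fit_plane(img)
    return img - (design @ coeffs).reshape(img.shape)


# Hypothetical frame: a tilted intensity ramp plus noise, seen through a
# non-uniform flat field and offset by a constant dark level.
rng = np.random.default_rng(2)
x, y = np.meshgrid(np.arange(20), np.arange(20), indexing="ij")
scene = 100.0 + 0.5 * x - 0.3 * y + rng.standard_normal((20, 20))
flat = 1.0 + 0.05 * rng.standard_normal((20, 20))
dark = np.full((20, 20), 12.0)
clean = precondition(flat * scene + dark, dark, flat)
```

Subtracting the fitted plane also removes the mean, so the result is directly suitable for the windowed Fourier method.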

Subpixel precision of the image shifts $(s_x, s_y)(k,t)$ can be obtained by 
interpolating the position of the correlation peak. This can be done by a parabolic fit in a 
3 x 3 pixel neighborhood of the maximum pixel $(x_{\max}, y_{\max})$ using 

$$s_x(k,t) = x_{\max} - \frac{1}{2}\, \frac{C(x_{\max}-1, y_{\max}, k, t) - C(x_{\max}+1, y_{\max}, k, t)}{2\,C(x_{\max}, y_{\max}, k, t) - C(x_{\max}-1, y_{\max}, k, t) - C(x_{\max}+1, y_{\max}, k, t)} \tag{7}$$


for the x-direction and the accordingly altered form in the y-direction. Alternative 
interpolation methods are listed in Refs. 6 and 8. 
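The three-point parabolic fit can be sketched as follows. The function name and the handling of a maximum on the border of the map are assumptions of this example. The offset is written so that it points toward the larger of the two neighbors, which is the vertex of the parabola through the three samples.

```python
import numpy as np


def subpixel_peak(corr):
    """Parabolic 3-point interpolation of the correlation maximum in x and y."""
    xm, ym = np.unravel_index(np.argmax(corr), corr.shape)
    if not (0 < xm < corr.shape[0] - 1 and 0 < ym < corr.shape[1] - 1):
        return float(xm), float(ym)  # maximum on the border: nothing to fit
    c0 = corr[xm, ym]

    def offset(c_minus, c_plus):
        denom = 2.0 * c0 - c_minus - c_plus
        return 0.5 * (c_plus - c_minus) / denom if denom != 0 else 0.0

    return (float(xm + offset(corr[xm - 1, ym], corr[xm + 1, ym])),
            float(ym + offset(corr[xm, ym - 1], corr[xm, ym + 1])))


# Hypothetical correlation map: an exact paraboloid peaked at (3.3, 2.8).
x, y = np.meshgrid(np.arange(7), np.arange(7), indexing="ij")
corr = -((x - 3.3) ** 2 + (y - 2.8) ** 2)
sx, sy = subpixel_peak(corr)
```

For an exactly parabolic peak the estimate is exact; for real correlation peaks the response deviates slightly from linear, as discussed in the text that follows.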

We note that the subpixel estimation is not exactly linear but shows a slight 
wave-like form, as exemplified in Fig. 2 (Refs. 6 and 9). The exact shape and slope of the response 
may depend on the pixel sampling, the correlation method, the interpolation method 
of the maximum and the image structure, similar to the centroid-gain in a quad-cell 
detector. These effects may become important for tomographic reconstructors in 
which open-loop data are calculated. 

Recently, the application of (linearized) matched filters in solar Shack— 
Hartmann sensors has been suggested to perform a maximum likelihood image shift 
estimation.!° This approach may be of particular interest for wavefront sensing with 
solar prominences off the solar limb in which the pixel signal-to-noise ratio is much 
lower (see Section 2.1.7). 

Fig. 2. Monte Carlo simulation of the shift of a displaced PSF estimated by a quadratic fit of the 
Fourier method cross-correlation peak (black), and by the position center of gravity of the PSF 
(gray). Reproduced from Ref. 9. 


2.1.2. Correlation reference 


The image shifts in Eq. (7) represent the relative displacements from the reference 
image. In a real wavefront sensor, however, this does not represent the wavefront 
slope in the sub-aperture, and the ideal zero positions $z_x(k)$ (corresponding to a 
flat wavefront) of the correlation peak in the kth sub-aperture must be measured 
experimentally and subtracted, such that 

$$s'_x(k,t) = s_x(k,t) - z_x(k) + z_x(r). \tag{8}$$

To define the positions $z_x(k)$, a pinhole can be placed into the entrance focus of 
the wavefront sensor and its image position in each sub-aperture identified. The 
term $z_x(r)$ cancels $z_x(k)$ if k = r is the reference sub-aperture; however, a more 
complex offset (such as temporal averaging) can be applied to define the reference 
for the tip-tilt error differently to avoid random image displacements when closing 
the tip-tilt control loop. 

A periodic update of the reference image is required since the target structure 
evolves. For example, granulation evolves on timescales of minutes. Hence, the refer- 
ence image is updated approximately once per minute. The quality of the reference 
image is critical for the performance of the wavefront sensor, and thresholding on 
the image contrast and shift with respect to the previous reference is often included 
in automated update procedures. In order to keep track of the tip-tilt error reference 
and to keep the tip-tilt mirror from being randomly displaced when the reference 
is updated, the shift between the old and the new reference image needs to be 
considered such that 

$$s''_x(k,t) = s'_x(k,t) + S_x. \tag{9}$$

The term $S_x$ is the shift of the current reference image with respect to the initial 
reference image, i.e., it is the accumulation of all shifts between new and old reference 
images since the control loop was closed. 
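The bookkeeping implied by Eqs. (8) and (9) can be sketched as a small helper class. The class and method names are hypothetical, both axes are handled together as (x, y) pairs, and the sign with which the accumulated reference shift enters follows Eq. (9) as written; this is an illustrative sketch, not the implementation of any particular system.

```python
import numpy as np


class ReferenceTracker:
    """Zero-position offsets and accumulated reference drift for K sub-apertures."""

    def __init__(self, zero_positions, ref_index):
        self.z = np.asarray(zero_positions, dtype=float)  # shape (K, 2): z_x, z_y
        self.r = ref_index                                # reference sub-aperture
        self.accumulated = np.zeros(2)                    # S since loop closure

    def update_reference(self, shift_new_vs_old):
        """Record the measured shift between the new and the old reference image."""
        self.accumulated += np.asarray(shift_new_vs_old, dtype=float)

    def slopes(self, raw_shifts):
        """Apply Eq. (8), then Eq. (9), to raw shifts of shape (K, 2)."""
        s = np.asarray(raw_shifts, dtype=float)
        return s - self.z + self.z[self.r] + self.accumulated


# Hypothetical pinhole calibration for three sub-apertures, reference index 0.
zeros = [[0.1, 0.0], [0.3, -0.2], [0.0, 0.0]]
tracker = ReferenceTracker(zeros, ref_index=0)
flat_wavefront = np.asarray(zeros) - np.asarray(zeros)[0]  # raw shifts for a flat wavefront
before = tracker.slopes(flat_wavefront)
tracker.update_reference((0.5, -0.25))
after = tracker.slopes(flat_wavefront)
```

Keeping the accumulated shift separate from the calibration offsets makes it straightforward to reset it when the control loop is reopened.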

If a separate tip-tilt sensor (correlation-tracker) is used for the image 
stabilization, the correlation reference can be taken from the live image, i.e., $R(x,y) = 
I(x,y,r,t)$. The tip-tilt signal seen by the Shack-Hartmann wavefront sensor must 
be discarded in this case. With a separate tip-tilt sensor, the image stabilization 
can run at a faster frequency than the wavefront correction because fewer pixels are 
needed. In most operational systems, however, there is no separate tip-tilt sensor, 
as modern wavefront sensor hardware can run at sufficiently high frame rates. 


2.1.3. Noise sources in a correlating Shack-Hartmann sensor 


While designing a correlating Shack-Hartmann wavefront sensor, some fundamental 
limitations and trade-offs have to be considered. As with any AO system, the 
sub-aperture size of the lenslet array projected onto the telescope aperture has to 
be of order $r_0$ in order to minimize residual wavefront errors (fitting error) and 
obtain high Strehl correction. Daytime seeing conditions, characterized by small 
$r_0$, would drive the design to small sub-aperture sizes of just a few centimeters. 
However, diffraction at these small apertures significantly reduces the contrast of 
the granulation images. The rms contrast, typically of order of a percent,'! and the 
noise background determine the signal-to-noise of the wavefront sensor, which is 
an important contributor to the overall system error budget! and hence the overall 
performance of the AO system. 

A first-order estimate of the signal-to-noise ratio of a correlating Shack— 
Hartmann wavefront sensor can be derived assuming wavefront sensor sub-aperture 
images are shifted versions of the exact same image (no seeing distortion effects 
within sub-apertures) .17-'* The variance of the image position, when determined 
by the centroid of the pixels in cross-correlation with values greater than the half 
maximum, was found to be 


$$\sigma_x^2 = \frac{5\, m_x^2\, \sigma_b^2}{4\, n_x^2\, \sigma_i^2} \left( \frac{d\,p}{f\,\lambda} \right)^2 \quad (\mathrm{waves}^2), \tag{10}$$

where $m_x$ is the width of the auto-correlation peak of the reference image in pixels in 
the direction of x, $n_x$ is the sub-aperture image size in pixels, d is the sub-aperture 
size, f its focal length, $\lambda$ the wavelength, and p the pitch of the pixel array in the 
image sensor. In the case of Nyquist sampling, $dp/(f\lambda) = 1/2$. $\sigma_b^2$ is the variance 
of the background noise and can be estimated using 

$$\sigma_b^2 = \sigma_{\mathrm{Poisson}}^2 + \sigma_{\mathrm{read}}^2 + \sigma_{\mathrm{dark}}^2 + \sigma_{\mathrm{quantization}}^2, \tag{11}$$

where $\sigma_{\mathrm{Poisson}} = \sqrt{N_{e^-}}$ is the shot noise with $N_{e^-}$ being the average number of 
collected photoelectrons per pixel. This term typically dominates Eq. (11) in a solar 
Shack-Hartmann sensor targeted on granulation and the charge capacity of the 
pixels becomes a critical attribute, whereas the read and dark-current noises of 
modern detectors are negligible and 8-bit quantization is sufficient if the exposure 
level is near saturation. $\sigma_i^2$ is the variance of the image brightness and $C_{\mathrm{rms}} = 
\sigma_i / I_{\mathrm{mean}}$ is the rms image contrast. Equation (10) was derived for low contrast 
objects like granulation, assuming similar electron counts in all pixels, i.e., $N_{e^-} \approx 
I_{\mathrm{mean}}$ and consequently $\sigma_i \approx C_{\mathrm{rms}} N_{e^-}$. Equation (10) then becomes 

$$\sigma_x^2 = \frac{5\, m_x^2}{4\, n_x^2\, C_{\mathrm{rms}}^2\, N_{e^-}} \left( \frac{d\,p}{f\,\lambda} \right)^2 \quad (\mathrm{waves}^2). \tag{12}$$

The image contrast $C_{\mathrm{rms}}$ depends on the object and is decreased by diffraction due 
to the finite sub-aperture size and by potential aberrations within the sub-aperture. 
The auto-correlation width $m_x$ represents the coherence length of the object 
structure, and $N_{e^-}$ is the level of the exposure. Granulation, the most common structure 
on the solar surface, has a typical intrinsic spatial scale of about one arcsecond. If 
imaged through sub-aperture sizes much smaller than 10 cm, diffraction blurs the 
granulation structure and the image contrast is significantly reduced, resulting in 
lower wavefront sensor signal-to-noise. A practical limit for how small sub-apertures 
can be made has been found to be about 7 cm (Ref. 1). In addition, wavefront sensor noise 
due to seeing distortion of sub-aperture images ($r_0$ smaller than sub-aperture size) 
was analyzed in Ref. 9 and was found to be a significant contributor, depending on 
seeing conditions. 
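Two of the statements above lend themselves to a quick numerical check: that shot noise dominates the background noise budget of Eq. (11) near saturation, and that a 7 cm sub-aperture has a diffraction scale comparable to the granulation scale. The detector values (full well, read noise, dark noise, exposure level) and the 500 nm wavelength in the sketch below are assumed round numbers for illustration, not properties of any specific camera; the quantization term uses the standard uniform-quantization variance of one twelfth of a squared digital step.

```python
import math


def background_noise_variance(n_e, read_noise=5.0, dark_noise=1.0,
                              full_well=20000.0, bits=8):
    """Noise variances in electrons^2, one entry per term of Eq. (11)."""
    lsb = full_well / 2 ** bits          # electrons per digital number
    return {
        "shot": n_e,                     # Poisson: variance equals N_e
        "read": read_noise ** 2,
        "dark": dark_noise ** 2,
        "quantization": lsb ** 2 / 12.0,
    }


def diffraction_scale_arcsec(wavelength_m, aperture_m):
    """lambda / d expressed in arcseconds."""
    return math.degrees(wavelength_m / aperture_m) * 3600.0


budget = background_noise_variance(n_e=0.8 * 20000.0)  # exposure at 80% full well
shot_fraction = budget["shot"] / sum(budget.values())
scale_7cm = diffraction_scale_arcsec(500e-9, 0.07)
```

Under these assumptions the shot noise accounts for well over 90% of the variance, and lambda/d for a 7 cm sub-aperture is roughly 1.5 arcsec, which is consistent with the loss of granulation contrast described above.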

Furthermore, the sub-aperture image field of view must be limited to 10 x 
10 arcsec or less to avoid averaging of wavefront information along different lines 
of sight that experience different high altitude turbulence. Depending on the zenith 
angle and the severity of turbulence at high altitudes, even a wavefront sensor field 
of view of 10 x 10 arcsec can severely limit the Strehl performance when compared 
to a point source wavefront sensor.? On the other hand, the field of view has to be 
large enough to contain a sufficient number of granules for the correlation algorithm 
to work in a robust manner.!° If the wavefront sensor field of view is increased to 
several tens of arcseconds essentially a ground layer AO system is achieved.! 

When designing a solar AO system and given the various system performance 
driving parameters, a detailed systems error budget analysis has to be performed in 
order to find an optimum configuration.! In particular, this applies to the wavefront 
sensor, as one of the important subsystems. Strehl requirements, driven by the 
science requirements, and the site characteristics provide the boundary conditions 
for the system optimization process. 


2.1.4. Wide-field correlating Shack—Hartmann wavefront sensors 


Various applications, such as MCAO, Ground-Layer Adaptive Optics (GLAO), 
or turbulence profiling (e.g., see Refs. 16-19), require wavefront measurements in 




Fig. 3. A wide-field, multi-directional Shack—Hartmann wavefront sensor with 19 guide regions 
that is equivalent to 19 separate narrow-field sensors pointed at different directions. (For the sake 
of clarity only six sub-apertures are shown to sample the wavefront.) 


multiple directions over a wider field of view. If the detector is large enough, the 
optical field of view of a Shack—Hartmann sensor can be increased to arbitrary sizes 
and divided into sub-regions of order 10 arcsec that are correlated throughout all 
sub-apertures, as shown in Fig. 3. Thanks to the Sun’s omnipresent granulation, an 
arbitrary number of “guide-regions” can be placed anywhere in the field of view. The 
correlating Shack-Hartmann sensor suffers from anisoplanatism. If $\alpha$ is the angular 
size of the guide-region, the extent of the area in a turbulent layer at distance 
h that is subtended by a sub-aperture with size $d_{\mathrm{subap}}$ is $d_s(h) = \alpha \cdot h + d_{\mathrm{subap}}$. 
This indicates that a correlating Shack-Hartmann sensor becomes less sensitive to 
turbulence at higher altitudes. In turbulence profiling applications that typically 
analyze covariances of measurements, $\alpha$ may be made smaller than in an AO control 
system to mitigate this effect at the cost of increased measurement noise (Eq. (10)). 
Field sizes as small as 5.5 arcsec have been reported for turbulence profiling with 
the Sun.!” 
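To give a feeling for the size of this effect, the footprint relation can be evaluated for a few representative numbers. The layer height of 10 km and the 7 cm sub-aperture are assumed example values; the two field sizes are the 10 arcsec and 5.5 arcsec quoted in the text.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # radians per arcsecond


def footprint_m(alpha_arcsec, height_m, d_subap_m):
    """Extent d_s(h) = alpha * h + d_subap of a guide region in a layer at height h."""
    return alpha_arcsec * ARCSEC * height_m + d_subap_m


wide = footprint_m(10.0, 10_000.0, 0.07)    # 10 arcsec guide region
narrow = footprint_m(5.5, 10_000.0, 0.07)   # 5.5 arcsec guide region
```

For these values the footprint at 10 km is roughly 0.55 m and 0.34 m, respectively, i.e., several times the sub-aperture size, which illustrates why high-altitude turbulence is averaged out.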

Wide-field multi-directional Shack—Hartmann sensors greatly simplify the com- 
plexity of wide-field wavefront sensing because only one optical path and one detec- 
tor are needed. This type of sensor has been implemented at various solar telescopes 
with apertures up to 1.6 m. This approach, however, seems currently unfeasible for 
the upcoming 4-m class telescopes due to the lack of suitable, large and fast image 
sensors. The application of wide-field multi-directional Shack—Hartmann sensors is 
of course not limited to solar imagery, but can in principle be used with natural 
and laser guide stars, too. 


2.1.5. Computer and camera hardware 


Computing correlations for a large number of sub-apertures requires not only 
substantial processing but also significant data I/O capabilities. Since all of the 
aforementioned correlation algorithms (Eqs. (1), (2), (5), and (6)) have 
demonstrated satisfactory and nearly equal performance, the choice of algorithm is 
sometimes driven by hardware, software and cost considerations. In the real-time 
controllers (RTCs) of AO76 of the Dunn Solar Telescope (DST) and in the original 
AO308 (2013-2017) of the Goode Solar Telescope (GST), the image correlations, 
wavefront reconstruction, and the servo loop were computed entirely in digital sig- 
nal processors (DSPs). Other systems are entirely based on CPUs; these include 
the RTC in the Swedish Solar Telescope (SST), as well as the Kiepenheuer-Institute 
AO System (KAOS; Vacuum Tower Telescope [VTT], Sunrise) and KAOS Evo 2 
control software (GREGOR/KAOS.256.° GST/Clear.t GST/AO308 Mk T°). The 
RTC in the initial high-order AO system of the Daniel K. Inouye Solar Telescope 
(DKIST) is a hybrid system in which the correlations and the reconstruction are 
computed in FPGAs and the servo loop in a CPU. 

DST/AO76 and the DKIST high-order AO implement the classical cross- 
correlation (Eq. (1)). The SDF (Eq. (5)) and ADF (Eq. (6)) correlations have been 
introduced at the Swedish telescopes.?”?! KAOS and KAOS Evo 2 compute the 
cross-correlation in the Fourier domain; optionally, the latter can also compute the 
SDF correlation. 

Some CPU-based RTCs, e.g., KAOS, can place the correlation windows 
anywhere on the camera frame, while some FPGA/DSP-based RTCs demand fixed, 
pre-defined positions. The latter typically demands a wavefront sensor with well- 
corrected internal image distortion, which may require a very complex optical design 
and tight alignment tolerances. Solar wavefront sensors usually deploy commercial 
CMOS cameras with streaming interfaces, often CameraLink or CoaXPress. 

In the age of multi- and many-core SIMD computer architectures, it seems 
worth noting that processing data from a Shack—Hartmann sensor is not a highly 
parallel problem. Even in the 1872-correlation wavefront sensor in the GST/Clear 
instrument, image data for no more than 48 correlations are transferred at any 
time. To minimize latency, a computer that is capable of processing the 48 corre- 
lations within their transfer time is sufficient if processing starts as soon as these 
data are available, without waiting until the full frame has been transferred. If 
implemented in well-optimized code, a single core of a contemporary workstation 
CPU can compute the image shift with the Fourier transform method in a 20 x 20 
pixel image in about 5 µs, including preconditioning and sub-pixel interpolation of 
the correlation peak. Further, when designing a CPU-based RTC, it is important 
to realize that on some architectures, such as Intel QPI and Intel UPI, systems 
with two CPU sockets may perform better in this application than those with four 
or more sockets, due to higher inter-processor bandwidths. CPU-based RTCs are 


°See Section 5.1 
See Section 5.2 
©See Section 5.1 
fSee Section 5.3 
usually programmed in low-level programming languages such as C or C++ for 
maximum control of data layout, data flow, and memory management. Common 
optimization techniques should be applied (e.g., see Ref. 22). Well-optimized 
software libraries, e.g., FFTW3 (Ref. 23), OpenBLAS (Ref. 24), Intel MKL (Ref. 25), and 
Armadillo (Ref. 26), which implement some generic functions needed in an AO RTC, are 
readily available. For low temporal jitter, it is important to prioritize the RTC tasks 
(e.g., Ref. 27). 
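The figures quoted above (about 5 µs per 20 x 20 pixel correlation on a single core, 48 correlations available per transfer, 1872 correlations in total) allow a rough timing estimate. The sketch below is simple arithmetic; the assumption of perfect scaling with the number of cores is a simplification, and whether a given budget is met depends on the camera transfer time, which is not stated here.

```python
def processing_time_us(n_correlations, per_correlation_us=5.0, n_cores=1):
    """Compute time in microseconds, assuming ideal scaling across cores."""
    return n_correlations * per_correlation_us / n_cores


batch_us = processing_time_us(48)      # one batch of simultaneously arriving data
frame_us = processing_time_us(1872)    # all correlations of one frame, one core
n_batches = 1872 // 48
```

With these numbers a single core needs about 240 µs per batch and about 9.4 ms for a full frame processed strictly after readout, which illustrates why starting the processing as soon as each batch arrives matters for latency.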


2.1.6. Optical design considerations 


Correlating Shack-Hartmann wavefront sensors for “infinite” scenes require the 
use of a field stop. Further, the optics need to be designed such that this field is 
contained in the focal plane of the microlens array within the size of a microlens and 
that field stop images do not overlap with the images in neighboring sub-apertures. 
This requirement can be expressed by the relation 

$$\frac{a_{\mathrm{ML}}^2}{f_{\mathrm{ML}}} = \frac{\alpha D}{c_1 c_2} = \frac{\alpha d}{c_2}, \tag{13}$$

which generally characterizes the microlens array by the demanded angular size of 
the field of view ($\alpha$), and either the telescope diameter (D) and number of 
sub-apertures across this diameter ($c_1$), or the size of a sub-aperture (d).?” Here, $a_{\mathrm{ML}}$ 
is the side length of a square or hexagonal microlens, $f_{\mathrm{ML}}$ the focal length of a 
microlens, and 

$$c_2 = \begin{cases}
1 & \text{for square microlenses,} \\
\sqrt{3} & \text{for hexagonal microlenses and a round field stop,} \\
3/2 & \text{for hexagonal microlenses and a square field stop.}
\end{cases} \tag{14}$$


From Eq. (13), it becomes clear that custom made microlens arrays are typically 
needed and that it is rather unlikely that stock arrays are compatible with a specific 
design requirement. This equation is valid in paraxial approximation. While the 
actual design of a wavefront sensor usually requires the use of raytracing software 
for accurate modeling, Eq. (13) provides a good estimate of the main properties 
of the microlens array, namely $a_{\mathrm{ML}}^2/f_{\mathrm{ML}}$ = const, and can be used to identify 
the feasible parameter space of the microlens array pitch and focal length for the 
manufacturing with the vendor before designing the wavefront sensor in detail. 

In solar Shack—Hartmann sensors, the camera is usually not located in the focal 
plane of the microlens, but rather in a scaled image thereof, to match the required 
pixel scale. The corresponding reimaging optics are often designed with the option 
to image the pupil onto the camera for diagnostic purposes by moving a single 
element or the camera. 

Beamsplitters for wavefront sensors in some solar telescopes are designed such 
that the camera in the sensor always operates near saturation, even under low flux 
conditions (e.g., when targeting close to the solar limb at large zenith angles), and 
variable neutral-density filters (of the same optical path difference) are inserted in 
the case of higher fluxes. 


2.1.7. Other uses of correlating Shack-Hartmann wavefront sensors 


The correlating Shack-Hartmann wavefront sensor has been successfully 
demonstrated to function on off-limb solar structures such as solar prominences.?° 
However, since prominences are seen only in the core of strong chromospheric lines such 
as the Hα line at 656.3 nm, a narrow-band (<0.07 nm) filter is required to image 
prominence structure onto the wavefront sensor camera. The light collection power 
of a small sub-aperture becomes a limiting factor and all light in the Hα band has 
to be directed to the wavefront sensor in order to achieve sufficient signal to noise; 
hence, the Hα band is no longer available for science instruments. The even fainter 
and often diffuse coronal structure is not a viable target for a correlating Shack— 
Hartmann wavefront sensor. Laser guide stars appear to be the only viable option 
for solar AO in the corona.?? Experiments with the goal of demonstrating the utility 
of sodium laser guide stars for coronal solar AO are progressing at solar telescopes 
on the Canary Islands.°° 

The correlating Shack—Hartmann wavefront sensor is also of interest for tracking 
extended (elongated) spots produced by laser guide stars (e.g., Ref. 31) and other 
extended objects such as satellites, or solar system planets. It is interesting to note 
that images of the retina of the human eye with its cone structure look very similar 
to images of granulation, which, in principle, would make this wavefront sensor 
approach also interesting for vision science applications.??:33 However, sufficient 
illumination of the retina is a problem and, hence, vision science AO systems project 
laser point sources onto the retina as wavefront sensing targets. 


2.2. Alternative wavefront sensors 


The correlating Shack—Hartmann wavefront sensor is currently the only routinely 
operated sensor type used in solar AO. The Sun is ~30 arcmin in diameter and its 
image is limited by field stops in any large solar telescope. Its practically infinite 
nature and its continuously changing surface exclude the application of Zernike 
sensors or pyramid sensors. Attempts to use curvature wavefront sensors for solar 
AO have not yet led to practical implementations. 


2.2.1. Optical wavefront differentiation for the Sun 


An optical-differentiation wavefront sensor (Fig. 4) that is matched to the 
momentary image structure was proposed** and can be viewed as a generalization of the 
pyramid sensor for extended sources. The core of this sensor is a liquid-crystal dis- 
play (LCD) without polarizers that is placed in a focal plane. Two copies of the field 
of view with perpendicular polarization of the field are created side by side on the 
LCD with a Wollaston beam splitter. The pixels in the LCD are set to either rotate 

Fig. 4. Basic setup of the optical-differentiation wavefront sensor. The mask modulates the image 
plane in such a way that the directional wavefront slope in the entrance pupil is coded in the 
intensity of the following pupil image. Reproduced from Ref. 35. 


the polarization by 90° or to not rotate, in a pattern that represents the x- and 
y-gradients of the image. A second Wollaston prism separates the photons accord- 
ing to their polarization after the LCD. The result is 4 polarized pupil images that 
encode the wavefront gradients in intensity, similar to the pyramid sensor, without 
any computational effort. The difference in intensity within each pupil pair yields the 
wavefront gradient in the corresponding direction. It is worth noting that this sensor does 
not have sub-apertures that would decrease the image contrast on the LCD. Although 
the pupil sampling is determined by the detector pixels, the maximum 
resolution in the pupil plane that can be obtained is probably not different from that of a 
Shack-Hartmann sensor for granulation (Section 2.1.3). Prototypes of this sensor 
were built (e.g., Ref. 35) but its development was not pursued intensively due to 
the success of the correlating Shack—Hartmann sensor, and a demonstration in a 
control loop is missing. The original motivation of this sensor was its computational 
simplicity compared to the correlating Shack—Hartmann. The wavefront derivative 
is obtained optically and only differences of the pupil images need to be computed. 
Like the pyramid sensor, this sensor principle could possibly be used in a layer- 
oriented MCAO. 


2.2.2. Plenoptic camera 


In a plenoptic camera, also referred to as light field camera, a microlens array 
is placed into a focal plane instead of a pupil plane (Fig. 5) and the pixel sen- 
sor is placed into the pupil plane created by the microlenses. The f-ratios of the 




[Fig. 5 diagram. Labels: punctual or extended source (x, y); telescope pupil (u, v); microlenses; detector pixels.] 

Fig. 5. (a) Conceptual scheme for the telescope setup of a plenoptic camera. The object is con- 
sidered to be at an infinite distance to the telescope, so all pupil points will see the same image, 
except for the effect of the turbulence layer. (b) Correspondence between pupil and pupil images. 
Every pupil coordinate is re-imaged on the corresponding position of each pupil image, depending 
on the arriving angle of the incoming ray. Reproduced from Refs. 36 and 37. 


microlenses are identical with the f-ratio of that focal plane to avoid overlapping 
pupil images. Such a camera can be used to identify the angle of incident light 
rays. Plenoptic cameras are used to select the focus and depth of field in digital 
images of objects at finite distances after the light field was taken. The use of a 
plenoptic camera as a wavefront sensor for point sources and solar images has been 
proposed.®* 3° In the case of extended objects, tomographic wavefront information 
can be extracted. 


3. Performance of a Conventional Solar Adaptive Optics System 


In order to demonstrate the typical performance of operational conventional AO sys- 
tems, we present sample imagery from AO76 on the Dunn Solar Telescope (DST). 
We chose the DST mainly because strictly simultaneous corrected and uncorrected 
images are readily available to us and processing pipelines are in place. Although 
details will vary, the main conclusions of the following discussion are equally appli- 
cable to other conventional systems. 

Figure 6 compares uncorrected and AO-corrected long exposure images of solar 
granulation with embedded magnetic features and bright points. Bright points are 
believed to be smaller than the diffraction limit of the DST (0.24 arcsec at 641 nm). 
Hence the FWHM of these features is close to the width of the PSF. While the 
AO-corrected image contains structure at the diffraction limit of the DST, all high 
spatial frequency information has been lost in the uncorrected image. AO76 is able 
to significantly improve the image quality across the 1 arcmin field of view, which is 
extremely important for the successful application of post facto image reconstruction 
techniques over the extended field of view. As is shown in Fig. 6 these techniques 
may not work well when applied to uncorrected data. The ability of AO76 to provide 
some correction beyond the isoplanatic patch is due to the fact that daytime-seeing 
is dominated by near-ground turbulence. 

Fig. 6. AO corrected (a) and uncorrected (b) long exposure (7.5 s) images were produced by 
averaging 250 short (30 ms) exposures. The images were recorded at a zenith angle of about 50° 
and in 1 arcsec seeing. The wavelength is 641.3 nm. (c) Lock point field of view reconstructed with 
PSF estimate from AO telemetry data. (d) Speckle reconstructed image of same field of view using 
the sequence of short exposure, AO-corrected images. (e) Corresponding speckle reconstructed 
image using uncorrected data. The speckle reconstructed image has slightly higher contrast (15%) 
than the reconstructed long exposure image (14%). However, within less than 10% the two images, 
reconstructed with two very different methods, are essentially the same. It should also be noted 
that the speckle reconstruction based on uncorrected imaging data clearly fails to recover the 
object. 

Although the AO retains structure at the diffraction limit, the correction is only 
partial. As a consequence of the seeing halo of the PSF, the contrast of the extended 
object is diminished. The telemetry data of AO76 (mainly wavefront sensor residu- 
als and deformable mirror commands) can be used to estimate the PSF and Strehl 
ratio achieved during the exposure.4?:4! The PSF estimate, which is valid at the 
AO lock point only, was used to deconvolve the long exposure image with the goal 
of restoring the correct amplitudes (Fig. 6(c)). A Strehl ratio of 48% was achieved 
by AO76 for this image. The speckle reconstructed image of the lock point area is 
displayed in Fig. 6(d). In this particular example, the long exposure reconstruction 
(Fig. 6(c)) and the speckle reconstructed image (Fig. 6(d)) contain essentially the 
same information. However, in cases where the AO delivers a low Strehl ratio due to 
unfavorable seeing conditions, dynamic, residual aberrations can result in a reduced 
signal-to-noise ratio of the high spatial frequency information content in long expo- 
sures. This makes it difficult to recover all spatial frequencies in the noise-sensitive 
deconvolution process. In contrast, the same aberrations are “frozen in” in short 
exposures. This fact allows their removal using post facto reconstruction techniques 
that operate on a sequence of short exposure images. However, not all instruments 
lend themselves to acquisition of short exposed images, and deconvolution using a 
long exposure PSF estimate from AO telemetry is a valuable tool to improve the 
scientific utility of those data. 


3.1. Generalized Fried parameter and image reconstruction 


Quantitative analysis of the performance of a solar adaptive optics system directly 
from images is not straightforward because of the lack of reference point sources. 
A strategy to obtain a quantitative performance metric is to measure the 
apparent increase in Fried’s parameter*? in the AO-corrected images. The measurement 
of an entity analogous to the atmospheric Fried parameter using adaptive optics 
corrected data is justified when assuming that the phase structure function D(r) 
can be approximated for small r as $D(r) = 6.88\,(r/\rho_0)^{5/3}$, where $\rho_0$ is defined as 
the “generalized Fried parameter”.*? 

A technique to estimate Fried’s parameter from uncorrected image data 
was suggested,*4 and involves the computation of the spectral ratio $SR(q) = 
|\langle I_i(q) \rangle|^2 / \langle |I_i(q)|^2 \rangle$, where $I_i(q)$ represents the value of the Fourier transform at 
spatial frequency q of the ith image $I_i$ in a series of N images, $i = 1, \ldots, N$, and 
$\langle \ldots \rangle$ is the average over i. Fitting the spectral ratio derived from the series of 
uncorrected, short exposure images with models for the long and short exposure 
transfer functions of the atmosphere 47’*° yields an estimate for the value of Fried’s 
parameter. Application of the same technique to images observed with adaptive 
optics correction will generate an estimate for the generalized Fried parameter. 
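The spectral ratio itself is straightforward to compute from an image series. The sketch below only evaluates the ratio; the subsequent model fit that yields the Fried parameter is not shown. The function name and the synthetic test stacks are assumptions made for illustration.

```python
import numpy as np


def spectral_ratio(images):
    """SR(q) = |<F_i(q)>|^2 / <|F_i(q)|^2> for an image stack of shape (N, nx, ny)."""
    spectra = np.fft.fft2(np.asarray(images, dtype=float), axes=(-2, -1))
    numerator = np.abs(spectra.mean(axis=0)) ** 2
    denominator = (np.abs(spectra) ** 2).mean(axis=0)
    return np.divide(numerator, denominator,
                     out=np.zeros_like(numerator), where=denominator > 0)


# Hypothetical limiting cases: a perfectly stable series and an uncorrelated one.
rng = np.random.default_rng(3)
frame = rng.standard_normal((32, 32))
stable = spectral_ratio(np.repeat(frame[None], 50, axis=0))
random_series = spectral_ratio(rng.standard_normal((50, 32, 32)))
```

A series of identical frames gives a ratio of one at all frequencies, whereas frames whose Fourier phases are scrambled from exposure to exposure drive the ratio toward 1/N; real short-exposure series fall in between, with the ratio decreasing toward high spatial frequencies.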

Figure 7 shows speckle reconstructed images*® using the uncorrected and 
AO-corrected image sequences, respectively. The estimated Fried parameter and 
generalized Fried parameters are overlaid in color. As expected, the Fried parameter 

Fig. 7. Speckle reconstructions of images seen in Figs. 6(a) and 6(b). Overlaid in color is the 
estimated generalized Fried parameter (AO corrected, (a)) and the Fried parameter (uncorrected (b)), 
respectively. 


determined from the uncorrected sequence is uniform across the field of view. It is 
also clear that the post facto speckle technique applied to uncorrected images is not 
able to reconstruct at the diffraction limit and introduces artifacts (e.g., localized 
high-frequency wave patterns), i.e., (partial) AO correction is essential to ensure 
the robustness of the post facto technique. The generalized Fried parameter from 
adaptive optics corrected images makes apparent very clearly the location of the 
AO lock point and the extent of the isoplanatic patch. We also note that the AO 
improves image quality across the entire field of view with best correction at the 
lock point. These results demonstrate the power of combining adaptive optics and 
post facto reconstruction techniques. Other post facto reconstruction techniques that 
can be used in combination with AO and that are commonly used include 
phase-diversity,4’ multi-frame blind deconvolution, and multi-frame multi-object blind 
deconvolution.*® 


4. Multi-Conjugate Adaptive Optics for Solar Observations 


MCAO is an advanced AO scheme to enlarge the corrected field of view. In MCAO,
multiple deformable mirrors, each conjugate to a different distance along the optical
axis, are deployed to correct the turbulence in three dimensions.*% °° The turbulence
volume above the telescope is measured using tomographic wavefront sensing. While
conventional AO systems at solar telescopes revolutionized solar observations and
have been enabling diffraction-limited resolution, a corrected field of view of 1-2
arcminutes is needed to study some large-scale effects on the Sun. MCAO was first
proposed for the Sun in 1987,°! and initial, pioneering on-sky experiments were
performed in the mid-2000s at the 70-cm class VTT and DST.°?:>? Motivated by
the results, second-generation, experimental MCAO systems were built for the 1.5-m
class GREGOR®™ and GST (then NST). The MCAO pathfinder Clear (Section 5.2)
on the GST marks the culmination of more than a decade of development and
demonstrated the superiority of multi-conjugate wavefront correction with three
deformable mirrors over correction with a single mirror. In Fig. 8, it is clearly
visible that MCAO correction provides the sharpest image in a field of view as wide
as 35 arcsec, while conventional correction is limited to about 10 arcsec. Ground-layer-only
correction (GLAO) in this particular example provided a good correction that is
smooth across the field of view but not as good as MCAO or conventional AO in
the center. The usefulness of GLAO strongly depends on the momentary turbulence
profile.

Fig. 8. Solar granulation corrected with MCAO, GLAO and conventional AO by Clear on the
Goode Solar Telescope (a), and the corresponding generalized Fried parameters in cm (b).
Reproduced from Ref. 55.


4.1. Meta-pupil coverage with solar-telescope-size apertures 


In order to reconstruct the 3D atmospheric turbulence, the meta-pupils at higher
turbulence layers must be well sampled by the footprints of the guide regions and
sufficient overlap of these footprints must be ensured.°® At solar telescopes up to 1.6 m
diameter, wide-field multi-directional Shack-Hartmann sensors (see Section 2.1.4)
have been used to simplify the setup and to allow for arbitrary numbers of guide
regions. As exemplified in Fig. 9, a greater number of guide regions is needed for
smaller telescopes to probe the turbulent volume as well as larger telescopes do.
While only five sampling directions are needed to probe a field of view of
60 x 60 arcsec with an 8-m telescope,®” this number does not provide sufficient
coverage and overlap with a 1.6-m telescope, and even for a 4-m telescope nine
directions are needed for similar coverage.

Fig. 9. Coverage of the meta-pupil at 8 km distance for telescopes with 8, 4, and 1.6 m diameter
mirrors with different guide star numbers and separations. The figures in (a) assume five guide
stars placed at the centers of the circles, while the figures in (b) assume nine guide stars in an
evenly-spaced 3 x 3 pattern as shown. The bottom right panel assumes a 30 x 30 arcsecond field
of view; the other panels have a 60 x 60 arcsecond FOV. This figure does not include the effect of
the extended field of view in a correlating Shack-Hartmann sensor.
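
The coverage argument can be explored with a simple geometric sketch. The Python code below is an illustration under stated assumptions, not the computation behind Fig. 9: the function names, the square field of view, the placement of the guide directions on a grid spanning the full field, and the definition of the meta-pupil as the area illuminated by any point of that field are all choices made here. Each guide direction projects a circular footprint of the telescope diameter onto the layer, displaced by the layer height times the field angle.

```python
import numpy as np

ARCSEC = np.pi / (180.0 * 3600.0)  # radians per arcsecond

def guide_directions(n_side, fov_arcsec):
    """n_side x n_side grid of directions spanning the full field (assumed layout)."""
    half = 0.5 * fov_arcsec * ARCSEC
    axis = np.linspace(-half, half, n_side)
    gx, gy = np.meshgrid(axis, axis)
    return np.column_stack([gx.ravel(), gy.ravel()])

def metapupil_coverage(diameter_m, height_m, fov_arcsec, directions, n_grid=400):
    """Fractions of the meta-pupil seen by >= 1 and by >= 2 guide directions."""
    radius = 0.5 * diameter_m
    half_fov = 0.5 * fov_arcsec * ARCSEC * height_m  # half field width at the layer, m
    extent = radius + half_fov
    axis = np.linspace(-extent, extent, n_grid)
    x, y = np.meshgrid(axis, axis)
    # Meta-pupil: all points within one pupil radius of the projected field square.
    dx = np.maximum(np.abs(x) - half_fov, 0.0)
    dy = np.maximum(np.abs(y) - half_fov, 0.0)
    meta = dx ** 2 + dy ** 2 <= radius ** 2
    # Count how many pupil footprints contain each grid point.
    hits = np.zeros(x.shape, dtype=int)
    for angle_x, angle_y in directions:
        cx, cy = height_m * angle_x, height_m * angle_y
        hits += (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
    covered = float(np.mean(hits[meta] >= 1))
    overlapped = float(np.mean(hits[meta] >= 2))
    return covered, overlapped

if __name__ == "__main__":
    for diameter in (8.0, 4.0, 1.6):
        print(diameter, metapupil_coverage(diameter, 8000.0, 60.0, guide_directions(3, 60.0)))
```

At 8 km, a 60 arcsec field corresponds to about 2.3 m, which is small compared with an 8-m pupil but larger than a 1.6-m pupil; this is why the footprints of a small telescope separate quickly with altitude.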


4.2. Pupil deformation in MCAO 


One should keep in mind that in any adaptive optics telescope, the optical path 
inside the telescope gets aberrated by the deformable mirrors (DMs) as they correct
external turbulence. This affects any image of objects in the telescope before the 
DMs. In MCAO, one or more DMs are not conjugate to the pupil. When such 
a mirror changes its shape, subsequent pupil images get aberrated; in particular,
their shape gets distorted and their position displaced.?” °° The distortion and dis- 
placement depend on the gradient of the subtended part of the deformable mirror. 
For instance, the on-axis exit pupil gets compressed in one axis and stretched in 
the perpendicular axis if a high-altitude conjugate mirror assumes an astigmatism 
shape. If illuminated by an extended field of view, as in a solar telescope, the actual 
pupil image is the superposition of all pupil images created by all points in the field 
of view. Since each field point subtends a different area on high-altitude mirrors and 
produces its own distorted and shifted pupil image depending on the mirror shape, 
the resulting wide-field pupil image is blurred in a way that changes with the mirror 
shape. Consequently, edge sub-apertures, when small, can suffer from significant and 
random vignetting in closed-loop operation that may create highly degraded, elon- 
gated point-spread functions up to a level where granulation is effectively blurred 
away and useful wavefront slope measurement is not possible. For the stability of the 
control loop, it seems best to simply ignore any partially illuminated sub-apertures 
or sub-apertures in the wavefront sensor that are close to the edge of the pupil.°?

The pupil in MCAO-corrected science instruments should be stopped down 
for two reasons: The first reason is to remove the portion of the pupil that is left 
uncorrected due to the ignored edge sub-apertures.°? The second reason is to avoid 
very significant local intensity fluctuations in the MCAO-corrected image plane 
caused by the changing shape of deformable mirrors conjugate to high altitudes. 
For a 1.6-m telescope, stopping the exit pupil down to about 1.42 m suppresses 
the fluctuations.°? When specifying a new telescope with MCAO that shall provide 
a certain effective aperture diameter, the primary mirror needs to be oversized 
accordingly to account for the stopping down. 


4.3. Considerations for the order of deformable mirrors in MCAO 


The deformation of the pupil image caused by DMs conjugate to high altitudes, 
as explained in Section 4.2, is of course not limited to the edge of the pupil. If a 
pupil conjugate deformable mirror is located before and a wavefront sensor behind 
any high-altitude mirror, the image positions of the pupil conjugate actuators on 
the sensor’s sub-aperture array will be shifted, too, as the high-altitude conjugate 
changes its shape. This implies dynamical misregistration of the actuators’ responses 
in the wavefront sensor when the control loop is closed. It has been shown that, in 
such a configuration, the response in the wavefront sensor is nonlinear, as it is
no longer given by the linear superposition of single responses but depends on
the product of the values of two or more different actuators in different mirrors.°°
A strategy to avoid this kind of misregistration is to place the pupil conjugate mirror 
after any high-altitude conjugate, so that the optical path between this mirror and 
the wavefront sensor remains constant. This way, however, the correction order is 
theoretically not ideal.®! While this approach has been successfully demonstrated 
with Clear, research is ongoing to identify the best compromise and to quantify 
the impacts on the wavefront correction capability of a practical solar closed-loop
MCAO system in either configuration.
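
The loss of linear superposition can be illustrated with a deliberately simplified one-dimensional toy model. Everything in the following Python sketch is an assumption made for illustration (the Gaussian influence function, the parameters `shift_per_tilt` and `width`, and the representation of the high-altitude mirror by a pure tilt); it is not the model used in the cited work. The image of a pupil-conjugate actuator on the sensor is displaced in proportion to the tilt of the high-altitude mirror, so the combined response differs from the sum of the individual responses.

```python
import numpy as np

def sensor_response(pupil_cmd, high_alt_tilt, x, shift_per_tilt=0.5, width=1.0):
    """Toy 1D sensor signal for one pupil-DM actuator plus a high-altitude tilt."""
    # The actuator image is displaced in proportion to the local gradient
    # (here a pure tilt) of the high-altitude mirror.
    center = shift_per_tilt * high_alt_tilt
    pupil_part = pupil_cmd * np.exp(-0.5 * ((x - center) / width) ** 2)
    # The high-altitude mirror's own signature, linear in its command.
    high_alt_part = high_alt_tilt * x
    return pupil_part + high_alt_part

def superposition_residual(pupil_cmd, high_alt_tilt, x):
    """Combined response minus the sum of the two single-mirror responses."""
    combined = sensor_response(pupil_cmd, high_alt_tilt, x)
    separate = (sensor_response(pupil_cmd, 0.0, x)
                + sensor_response(0.0, high_alt_tilt, x))
    return combined - separate
```

In this toy model the residual vanishes when either command is zero and, for small tilts, grows in proportion to the product of the two commands, which mirrors the qualitative statement above.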


4.4. Lateral alignment and cross-talk in MCAO 


In a configuration that is both single- and ground-conjugate, the lateral alignment of the
wavefront sensor and the deformable mirror is a straightforward task. Usually, the
microlens array is mounted on a two-axis linear stage and its position is tuned to
match the actuators, for example, in the Fried geometry, in which the actuators
coincide with the corners of the sub-apertures. For a high-altitude mirror, there is
no general matching between actuators and sub-apertures. Modal reconstructors
are common in MCAO, and often modes (e.g., Karhunen-Loève) are used to command
the deformable mirrors. It is important that the modes on each mirror are
centered with each other and with the wavefront sensor(s). If this is not the case, 
any mode produced by a decentered mirror will be accompanied by a tip-tilt signal 
in the wavefront sensor, forcing the tip-tilt mirror to counteract. Consequently, the 
bandwidth of the tip-tilt control drops, and it seems likely that other modes may 
suffer from similar cross-talk. If the DMs are large enough, the modal bases could 
theoretically be defined on any subset of the mirror’s aperture. To avoid asymmetric 
actuator commands for otherwise symmetric modes, however, the mirrors need to 
be centered mechanically with the wavefront sensor. In Clear, the high-altitude 
mirrors are mounted on two-axis stages that are parallel to the mirror surface. To 
check the lateral alignments of all deformable mirrors, a parabolic shape is oscillated 
on each mirror at a different frequency. The power spectra of the x- and y-tip-tilt 
signals as seen by the wavefront sensor reveal the decentering of each mirror with
respect to the wavefront sensor(s). If decentered, the power spectra show a peak 
at the corresponding frequency. The pupil conjugate deformable mirror is usually 
fixed and it directly or indirectly defines the lateral position of the exit pupil for 
the instruments. Consequently, the microlens array in the wavefront sensor(s) needs 
to be adjusted first to remove the peaks in the power spectra associated with this 
mirror. Next, the lateral positions of the high-altitude mirrors can be adjusted 
similarly in arbitrary order. 
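
The alignment check described above can be sketched numerically. The Python code below is a simulation under simple assumptions rather than the procedure implemented in Clear: a parabola of amplitude a decentered by d contains the linear term -2ad, so each decentered mirror contributes a tilt signal at its own oscillation frequency. The function names, units, and noise model are hypothetical.

```python
import numpy as np

def tilt_signal(t, amplitudes, freqs_hz, offsets, noise_rms, rng):
    """Simulated tilt signal seen by the sensor for several oscillating mirrors."""
    signal = noise_rms * rng.standard_normal(t.size)
    for a, f, d in zip(amplitudes, freqs_hz, offsets):
        # a * (x - d)^2 = a*x^2 - 2*a*d*x + a*d^2: the linear term is a tilt.
        signal += -2.0 * a * d * np.sin(2.0 * np.pi * f * t)
    return signal

def peak_amplitudes(signal, sample_rate_hz, freqs_hz):
    """Spectral amplitude of the tilt signal at each mirror's frequency."""
    spectrum = np.fft.rfft(signal) * 2.0 / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate_hz)
    return [float(abs(spectrum[np.argmin(np.abs(freqs - f))])) for f in freqs_hz]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(2000) / 1000.0           # 2 s sampled at 1 kHz
    freqs = [7.0, 11.0, 13.0]              # one frequency per mirror
    offsets = [0.0, 0.05, -0.02]           # first mirror is centered
    sig = tilt_signal(t, [1.0, 1.0, 1.0], freqs, offsets, 0.01, rng)
    print(peak_amplitudes(sig, 1000.0, freqs))
```

A centered mirror leaves no peak above the noise at its frequency, while the peak height of a decentered mirror scales with its offset, so each mirror can be adjusted until its peak disappears.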

In an MCAO system with more than one high-altitude deformable mirror, it is 
critical to prevent those mirrors from fighting against each other in closed-loop oper- 
ation. In modal reconstructors, this can be done by identifying critical modes and 
filtering accordingly. For example, only one high-altitude mirror should be allowed 
to assume the shape of a parabola. 


5. Examples of Solar AO Systems 


Conventional AO systems have been the key technology to obtain diffraction-limited 
observations at major ground-based solar facilities. Some of these systems have 
been operational for over a decade now. Table 1 summarizes the systems that are 
currently routinely operated. 


Table 1. Recent AO systems at major solar telescopes.

| Telescope / AO system | Aperture | Sub-apertures (size, number, geometry) | Image correlation | Camera | DM | RTC | Year |
|---|---|---|---|---|---|---|---|
| Dunn Solar Telescope / AO76 | 0.76 m | 7.6 cm, 76, square | 20x20 px, 0.5″/px, direct x-corr, 462-477 nm | in-house, based on Photobit PB-MV13 sensor | Xinetics SN PMN, 97 act. | 2500 Hz, DSPs | 2002 |
| Vacuum Tower Telescope / KAOS.35 | 0.7 m | 10 cm, 36, hexagonal | 24x24 px, 0.5″/px, DFT, 500-510 nm | Dalsa 1M150 | Laplacian bimorph, 35 act. | 2100 Hz, CPUs | 2002 |
| GREGOR / KAOS.256 | 1.44 m | 9.6 cm, 156, square | 24x24 px, 0.5″/px, DFT or SDF, 500-510 nm | Mikrotron EoSens CL/3CL | CILAS SAM, 256 act. | 2100 Hz, CPUs | 2011 |
| Swedish Solar Telescope | 1.0 m | 10 cm, 85, hexagonal | 24x24 px, ADF or SDF | Mikrotron EoSens CL | CILAS monomorph, 85 act. | 2000 Hz, CPUs | 2011 |
| Goode Solar Telescope / AO308 | 1.6 m | 8 cm, 308, square | 16x16 px, 0.62″/px, direct x-corr, 512-537 nm | Vision Research Phantom v7.3 | Xinetics SN PMN, 357 act. | 2200 Hz, DSPs | 2012-2018 |
| Goode Solar Telescope / AO308 Mk II | 1.6 m | 8 cm, 308, square | 20x20 px, 0.6″/px, DFT or SDF, 512-537 nm | Adimec Q-2HFW | Xinetics SN PMN, 357 act. | 1875 Hz, CPUs | 2018 |
| Goode Solar Telescope / Clear (MCAO) | 1.6 m | 8.8 cm, 1872 (9 x 208), square | 20x20 px, 0.63″/px, DFT or SDF, 512-537 nm | Mikrotron EoSens 3CXP | Xinetics SN PMN, 3 x 357 act. | 1000 Hz (1568 Hz), CPUs | 2016 (2018) |
| Domeless Solar Telescope | 0.6 m | 6 cm, 56, square | 25x25 px, 0.57″/px, 420-480 nm | Photron | ALPAO magnetic, 97 act. | 1100-1400 Hz, CPUs | 2011 |
| New Vacuum Solar Telescope | 1.0 m | 8.3 cm, 102, hexagonal | 12x10 px, 1″/px, ADF?, 400-600 nm | Vision Research Phantom V311 | IOE, CAS PZT, 151 act. | 3500 Hz, FPGAs, DSPs | 2015 |
| Daniel K. Inouye Solar Telescope / HOAO | 4.0 m | 9.3 cm, 1500, square | 20x20 px, 0.5″/px, direct x-corr, ~475-575 nm | Vision Research DS-440 | Xinetics SN PMN, 1600 act. | 1975 Hz, FPGAs, CPUs | 2019 |

5.1. KAOS.256 on GREGOR, and AO308 on the Goode Solar Telescope


The 1.6-m Goode Solar Telescope (GST, formerly New Solar Telescope) — the 
world’s largest operational solar telescope — and the 1.5-m GREGOR, which were 
commissioned in recent years, were motivated and enabled by the success of the AO 
retrofits on the tall 70-cm class vacuum telescopes of the past century. Thanks to 
AO, these new telescopes did not need to be evacuated. This is essential because 
the maximum aperture of an evacuated telescope is said to be limited to about 
one meter due to the stress in the entrance window. Because their diameters are approximately
twice as large as those of the older telescopes, roughly four times as many actuators
are needed. Unlike large night-time telescopes, these new solar telescopes would be
of limited scientific use without well-functioning adaptive optics. Consequently,
observations without AO are rare and are usually only carried out at wavelengths above
1 µm and if the seeing is good. An example of the highest resolution solar imagery
available to date is shown in Fig. 10. GST is an off-axis clear aperture telescope and
its conventional AO system, AO308, sports 308 sub-apertures, each 8 cm across,
and a DM with 357 actuators. The sub-apertures in GREGOR’s KAOS.256 measure
9.6 cm and, due to the smaller aperture and the central obstruction, the total number
of sub-apertures is 156. The DM has 256 actuators. The control
frequencies of AO308 and KAOS.256 are of order 2 kHz to match daytime seeing
conditions and visible wavelengths. The heart of KAOS.256 is the control system
KAOS Evo 2, a flexible and powerful CPU-based AO control system. Since the
upgrade to AO308 Mk II in 2018, the GST system is also running on KAOS Evo 2.
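
The scaling of the sub-aperture count with aperture size can be checked with a rough area estimate. The Python sketch below assumes a filled circular aperture tiled by square sub-apertures and ignores the central obstruction and partially illuminated edge sub-apertures; the function name is illustrative.

```python
import math

def subaperture_estimate(aperture_m, subaperture_m):
    """Approximate number of square sub-apertures that fit in a circular aperture."""
    # Ratio of the aperture area (pi/4 * D^2) to the sub-aperture area (d^2).
    return math.pi / 4.0 * (aperture_m / subaperture_m) ** 2

if __name__ == "__main__":
    # Sub-aperture sizes quoted in the text for AO308 and KAOS.256.
    print(subaperture_estimate(1.6, 0.08))    # compare with 308 sub-apertures
    print(subaperture_estimate(1.44, 0.096))  # compare with 156 (obstructed pupil)
```

For the GST the estimate comes out slightly above the 308 sub-apertures actually used, and for GREGOR it exceeds 156, as expected when the central obstruction is neglected. Doubling the diameter at fixed sub-aperture size quadruples the count, which is the factor of four mentioned above.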


Fig. 10. Speckle reconstruction based on AO308 corrected short exposure images from the BBSO 
1.6-m Goode Solar Telescope. Wavelength 706 nm. Courtesy W. Cao. 
5.2. Clear: The MCAO pathfinder on the Goode Solar Telescope


Clear is the MCAO pathfinder on the GST for a future MCAO upgrade of the 
DKIST. As such, and because solar MCAO was still in an early stage at the begin- 
ning of this project in 2013, Clear was designed to provide maximal experimental 
flexibility in order to quickly advance solar MCAO. Clear offers a variety of dif- 
ferent wavefront sensing schemes for testing, and three deformable mirrors whose 
conjugate positions are easy to change. Its flexibility is also reflected by the entirely 
CPU-based control system KAOS Evo 2. 

“Genetic development” has been the philosophy behind Clear, meaning that 
many different ideas are tested with the aim to identify the most promising approach 
first. The team delayed any efforts on issues that are only relevant for the long-term 
stability of the MCAO control loop, but concentrated on finding an approach that 
provides a well-corrected image in an extended area, even if only for a few seconds. 
This was achieved in summer 2016, as shown in Fig. 8.°° When those pictures were
taken, all three deformable mirrors were engaged and conjugate to 8, 3, and 0 km
and ordered in this sequence. A wide-field Shack-Hartmann sensor, located after
all deformable mirrors, was used, sporting nine 10-arcsec guide regions in a 3 x 3
array spanning about 35 arcsec, with 208 sub-apertures (8.8 cm each). With a total
of 1872 (9 x 208) cross-correlations, this is the most complex solar Shack-Hartmann sensor
ever built. In this configuration, the control loop frequency was initially limited to
1000 Hz by the processing power of the RTC. More recently, the computer has been
replaced by a model that can run the control loop at 1500 Hz, which is the maximum
frame rate of the wavefront sensor camera. All three deformable mirrors of Clear
were polished with the bias voltage applied, to about 4 nm RMS surface error. Clear is
being operated regularly to advance and to mature MCAO for solar observations. 


5.3. DKIST high-order adaptive optics system 


An example of current conventional AO development is the AO system for the 
4-m DKIST on Haleakala on the Island of Maui, Hawaii. Once operational in 2020, 
DKIST will be the world’s largest solar telescope. Achieving the diffraction limit of 
DKIST for visible and near infrared wavelengths mandates a high-order adaptive 
optics system well matched to local seeing conditions. Requirements for AO perfor- 
mance in terms of achieved Strehl ratio were derived through analysis of the science 
drivers. The system requirements of 60% Strehl ratio in excellent seeing conditions
($r_0 = 12$ cm at science wavelength 630 nm) and 30% Strehl ratio in median seeing
conditions ($r_0 = 7$ cm at science wavelength 500 nm) drive the system error budget
analysis and subsequent design and implementation. The high-order AO system is 
fully integrated with the other DKIST wavefront correction systems, such as AO 
and telescope alignment functions.” 

A problem specific to solar telescopes is the heat flux absorbed by both the 
tip/tilt and deformable mirror devices. At DKIST, the absorbed heat flux for each 
of those two mirrors is about 100 W/m². Therefore, thermal control is required to
avoid local seeing near the mirror surfaces. Wavefront tilt across the full aperture
is off-loaded to a 20 cm diameter, actively cooled tip-tilt mirror operating with a
closed-loop bandwidth of 160 Hz. Higher-order aberrations are corrected with a 
stacked actuator DM with 1600 actuators.”? The actively cooled faceplate of the 
deformable mirror has been demonstrated to be capable of achieving a flatness 
of ~4 nm RMS. The correlating Shack-Hartmann wavefront sensor has about 1500
sub-apertures and is matched to the 1600 actuators of the deformable mirror, imple- 
menting the Fried geometry.”4 The wavefront sensor receives only 4% of DKIST’s 
broadband light. The remaining 96% is directed to the science instrumentation of 
the facility. With 43 sub-apertures across the 4 m aperture diameter, the resulting
sub-aperture size is about 9 cm. The field of view of 10 x 10 arcsec is sampled spatially
at 0.5 arcsec/pixel (slight spatial over-sampling at 500 nm) with a high-speed
(>2 kHz), large format (1k x 1k pixels), off-the-shelf CMOS camera. The significant
computational effort involved in computing 1500 cross-correlations every 500 µs is
performed using two Field-Programmable Gate Arrays that are programmed to 
deliver the two-dimensional wavefront slope in each sub-aperture. The shift vector 
is multiplied with one of several control matrices that can be selected dynamically 
and automatically to allow the system to adapt to changing seeing conditions by 
using an optimal number of control modes. The system control matrices are pre-computed
using common strategies (Refs. 75-78), some of which optimize for the layout of
the 1600 actuators of the deformable mirror that will be employed for the correction
of the wavefront.
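
To illustrate what a single cross-correlation in a correlating Shack-Hartmann sensor involves, the following Python sketch estimates the shift of one sub-aperture image relative to a reference image. It is a CPU illustration under stated assumptions, not the DKIST FPGA implementation: the function name, the FFT-based circular correlation, and the parabolic sub-pixel interpolation are choices made here.

```python
import numpy as np

def measure_shift(reference, image):
    """Estimate the (dy, dx) shift in pixels of `image` relative to `reference`."""
    ref = reference - reference.mean()
    img = image - image.mean()
    # Circular cross-correlation via the Fourier transform.
    xcorr = np.fft.ifft2(np.fft.fft2(img) * np.conj(np.fft.fft2(ref))).real
    ny, nx = xcorr.shape
    py, px = np.unravel_index(np.argmax(xcorr), xcorr.shape)

    def refine(c_minus, c_zero, c_plus):
        # Vertex of the parabola through three samples around the peak.
        denom = c_minus - 2.0 * c_zero + c_plus
        return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom

    sub_y = refine(xcorr[(py - 1) % ny, px], xcorr[py, px], xcorr[(py + 1) % ny, px])
    sub_x = refine(xcorr[py, (px - 1) % nx], xcorr[py, px], xcorr[py, (px + 1) % nx])
    # Map the peak position to a signed shift in the range [-n/2, n/2).
    dy = (py + sub_y + ny / 2) % ny - ny / 2
    dx = (px + sub_x + nx / 2) % nx - nx / 2
    return dy, dx
```

In a full system, the shifts from all sub-apertures would be stacked into one vector and multiplied by the selected control matrix to obtain the actuator commands.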


6. Striving for Broader Fields: Solar Adaptive Optics in the 2020s


Classical, single conjugate adaptive optical systems revolutionized ground-based 
solar observations and were adopted quickly in the early 2000s at most major solar 
telescopes once the technology was successfully proven in the late 1990s. A large
number of solar studies using AO have been published since then. The vast majority
of observations at large solar telescopes, today and even more so in the future,
depend on AO. Imaging filtergraphs with AO, as the enabling technology, were
the first light instruments of the newest and largest solar telescopes, GST and
GREGOR. Classical AO will also be essential for DKIST’s first light observations.
Due to the significantly larger corrected field of view, one can expect the recently 
demonstrated solar multi-conjugate adaptive optics to play a vital role in solar 
observations in the near future. Clear, the only operational astronomical MCAO 
system with three deformable mirrors on-sky, has been a critical pathfinder and is 
becoming available for scientific observations of the Sun. The planning and design 
work for the MCAO upgrade of DKIST is in progress. In addition, other adaptive 
optics concepts are considered for use in solar astronomy. Wide-field adaptive optics
concepts, such as ground layer AO, might be applied to full-disk solar telescopes to
improve the seeing for synoptic observations. Recent developments aim at moving
adaptive optics off the solar disk. First observations of solar prominences off the
solar limb have already been made successfully using adaptive optics. In the future
we might see laser guide stars, a technology that was often considered unnecessary
on a solar telescope, enabling adaptive optics for observations of the solar
corona.

Adaptive optics has been a dependable and mature technology for high spatial
resolution observations of the solar disk for more than a decade. The Sun is just one 
star, but thanks to its proximity, we can study the Sun and, in particular, the Sun’s 
magnetism and how it impacts Earth at a level of detail that is not possible for any 
other star. Future adaptive optics are expected to be more versatile and to support 
a greater variety of solar observations. While most AO solar telescopes today feature
only a single, classical wavefront sensor, we anticipate MCAO to become standard
equipment at all large-aperture telescopes. In addition, a number of specialized wavefront
sensor systems, such as prominence AO sensors, will be implemented at solar
telescopes in the upcoming decade. The adaptive optics revolution in solar physics 
continues. 
Chapter 1


Speckle Interferometers 


Elliott P. Horch* 


Department of Physics, Southern Connecticut State University 
501 Crescent Street, New Haven, CT 06515, USA 


horche2@southernct.edu


Speckle interferometers are imaging systems used with large telescopes that allow 
the observer to record diffraction-limited image information through atmospheric 
turbulence. To do this, they use optics to magnify the image over what the tele- 
scope delivers and a detector or detectors that read images out rapidly to match 
the rate of atmospheric fluctuations above the telescope. The resulting high- 
resolution images or image products are then used for a variety of science objec- 
tives. This chapter discusses mainly the design, construction, and data reduction 
principles of these devices, with some comments on science applications at the 
conclusion. 


1. Introduction 


As light from a star arrives at the top of the atmosphere of the Earth, it may be 
thought of as a sequence of planar wavefronts. If left undisturbed by the atmo- 
sphere, this electromagnetic radiation would be brought to a focus by a telescope 
so as to produce a diffraction-limited image. However, atmospheric turbulence cre- 
ates temperature and density variations in the air above the telescope, and this 
in turn results in variations in the index of refraction. Small optical path length 
differences are created as the light travels through the atmosphere to the telescope, 
and therefore the wavefronts that enter the aperture are no longer planar, but rough 
or corrugated. The typical length scale over which the phase of the incoming wave- 
front is coherent is 10-30cm, depending on the wavelength of observation, seeing, 
and other factors. The effect of this is to break the aperture up into sub-apertures 
which have roughly uniform phase, assuming the telescope diameter is much larger 
than the coherence length. Any two such sub-apertures would produce a fringe
pattern on the image plane where the fringe spacing is inversely proportional to 
the separation of the sub-apertures. Taken together, all of the sub-aperture pairs 
produce a complicated interference pattern with many local maxima and minima 
in irradiance over a region of the image plane determined by the size of the sub-aperture
cells. This image is known as a speckle pattern. More detailed discussions
regarding the physical nature of speckle images in astronomy are given in, e.g.,
Refs. 1 and 2.
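
The formation of a speckle pattern can be illustrated with a short numerical sketch. The Python code below is a toy model, not a realistic atmosphere: the phase screen is Gaussian-smoothed white noise rather than Kolmogorov turbulence, and all parameter names and default values are assumptions chosen for illustration. The image is the squared modulus of the Fourier transform of the complex field across a circular aperture.

```python
import numpy as np

def speckle_image(n=256, aperture_px=64, coherence_px=4.0, phase_rms_rad=3.0, seed=0):
    """Monochromatic short-exposure image of a point source through a random phase screen."""
    rng = np.random.default_rng(seed)
    y, x = np.indices((n, n)) - n // 2
    aperture = (x ** 2 + y ** 2 <= (aperture_px / 2) ** 2).astype(float)
    # Smooth white noise with a Gaussian of standard deviation coherence_px (pixels).
    fy = np.fft.fftfreq(n)[:, None]
    fx = np.fft.fftfreq(n)[None, :]
    lowpass = np.exp(-2.0 * (np.pi * coherence_px) ** 2 * (fx ** 2 + fy ** 2))
    phase = np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) * lowpass).real
    phase *= phase_rms_rad / phase.std()
    # Field in the aperture, then intensity in the image plane.
    field = aperture * np.exp(1j * phase)
    image = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    return image / image.sum()
```

Setting `phase_rms_rad=0.0` gives the diffraction-limited image of the same aperture for comparison; with a nonzero phase the light is redistributed into many speckles, each roughly the size of the diffraction-limited spot.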

In normal astronomical imaging, the resolution of the images obtained is determined
by the astronomical seeing, that is, the full width at half maximum (FWHM)
of the image of a point source in arcseconds. Generally, these images do not exhibit
a speckle nature, although the size of the seeing disk is comparable to the size of the 
envelope of speckles for a speckle image. Speckles are not seen in normal imaging 
for two reasons. First, the timescale of the exposure is usually long compared to 
the lifetime of speckles on the image plane. Since the timescale of atmospheric 
fluctuations above the telescope aperture is in the range of one to a few tens of
milliseconds, one must record images at least as fast as this to have a chance to see 
speckles. Second, given that nearly all applications in astronomy use digital cameras 
to record the images, there is no value in oversampling the seeing disk beyond a 
certain point. Typical magnification of the image by the telescope’s optical system 
would result in about five pixels across the FWHM of the stellar image. For
a large telescope, however, the size of individual speckles is much smaller than the
FWHM; it is on the order of the diffraction-limited spot size. Thus, the sampling 
would not permit the resolution of individual speckles. 


2. Instrument Requirements 


2.1. Optical Requirements 


Based on the above, it is clear that a fundamental optical requirement is the mag- 
nification of the image to an appropriate scale to observe and record individual 
speckles. On the other hand, one would like to maintain as large a field of view as is 
practical, and so overmagnification of the speckles could result in a very small field 
of view on the imaging detector if its pixel format is not large. To accommodate 
both requirements, a typical choice is to magnify the speckles to the point where 
the FWHM of these features is about two pixels wide, to achieve what is known as 
critical sampling. This takes advantage of the fact that speckle patterns are band- 
limited functions and so their Fourier transforms contain no signal outside some 
radius from the origin in the Fourier plane. Critical sampling ensures that all of the 
Fourier components of a speckle image are recoverable, a key requirement in image 
reconstruction. 
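
The critical sampling condition translates directly into a required pixel scale and focal ratio. The Python sketch below assumes that the speckle FWHM is approximately the wavelength divided by the telescope diameter, so that two pixels per FWHM corresponds to a pixel subtending half that angle; the function names and the example values (550 nm, a 4-m aperture, 16 µm pixels, a native f/8 beam) are illustrative assumptions.

```python
import math

def critical_pixel_scale_arcsec(wavelength_m, diameter_m):
    """Pixel scale in arcsec for two pixels across a speckle of width lambda/D."""
    return math.degrees(wavelength_m / (2.0 * diameter_m)) * 3600.0

def required_focal_ratio(pixel_size_m, wavelength_m):
    """Focal ratio at which one pixel subtends lambda/(2D): f/D = 2p/lambda."""
    return 2.0 * pixel_size_m / wavelength_m

def required_magnification(pixel_size_m, wavelength_m, native_focal_ratio):
    """Magnification needed to reach the critical focal ratio from the native one."""
    return required_focal_ratio(pixel_size_m, wavelength_m) / native_focal_ratio

if __name__ == "__main__":
    print(critical_pixel_scale_arcsec(550e-9, 4.0))
    print(required_focal_ratio(16e-6, 550e-9))
    print(required_magnification(16e-6, 550e-9, 8.0))
```

Note that the required focal ratio depends only on the pixel size and the wavelength, not on the telescope diameter, which is why the magnification has to change when the same camera is moved to a telescope with a different native focal ratio.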

In addition to achieving an appropriate magnification, the optical path of a 
speckle interferometer normally contains narrowband pass filters. Because the index 
of refraction varies with wavelength, the corrugation of the wavefronts entering the
telescope aperture also varies with wavelength, so that the speckle patterns formed 
at different wavelengths would not have speckles in the same locations and would 
nonetheless overlap on the image plane, decreasing the contrast of speckles. Nar- 
rowband pass filters keep the wavelength range small enough so that good speckle 
contrast is achieved. 

The wavelength band pass transmitted by such a filter can vary with incident 
angle of the beam. A good way to mitigate this while also providing the desired 
magnification on the image plane is to use a two-lens system with the filter in 
between, as shown in Fig. 1. Here, the input side of the optics has a short focal 
length lens positioned one focal length from the telescope focus, which produces a 
collimated beam. This collimated beam traverses the filter with incident angle of 
zero degrees, and is then made to converge by a second lens of longer focal length. 
The magnification of the system is then the ratio of the larger focal length to the 
smaller one. If a particular speckle interferometer is to be used at more than one 
telescope, different magnifications would in general be required, and so different 
combinations of lenses may be mounted on slides or wheels to accommodate such a 
change. 

When dealing with highly magnified stellar images, atmospheric dispersion is a 
concern, especially if the angle between the zenith and the star position is sizable. 
While the effect on a seeing-limited image is often not very noticeable, individual 
speckles can be significantly elongated along a line leading to the zenith, even when 
a narrowband pass filter is used. For this reason, another pair of optical elements 
is sometimes used in speckle interferometers to compensate for this effect: Risley 
prisms, also known as zero deviation prisms. A Risley prism consists of two wedges 
of glass made of materials of different index of refraction, where the wedge angles 
are fairly small and nearly the same in most speckle cameras. The entrance and 
exit faces are therefore nearly parallel. They are designed to allow light of a desired
wavelength to pass through without any angular deviation from the incoming beam.
However, light of other wavelengths, either above or below this, will suffer a small
angular deviation, thus dispersing white light slightly about the central wavelength.
Two Risley prisms that can be rotated about the optical axis are used in combination.
The dispersion created by one prism may be thought of as a vector in the
plane orthogonal to the optical axis. The combination of two such prisms with
independent dispersion vectors that may be rotated allows the user to control the
resultant vector’s length and direction. Thus, if calibrated correctly, they can be
positioned so that the dispersion that they induce together is equal in magnitude and
exactly opposite in direction to what the atmosphere has produced in the beam. The effect
of atmospheric dispersion is then neutralized.

[Figure 1 schematic; elements in beam order: telescope focal plane, collimating lens, Risley prisms, filter, reimaging lens, detector focal plane.]

Fig. 1. A typical optical layout for a speckle interferometer. A short focal length lens is used
to collimate the beam expanding from the focal plane of the telescope. In the collimated beam,
Risley prisms can be used to correct for atmospheric dispersion and a filter is used to achieve good
speckle contrast. A longer focal length lens reimages the beam onto the detector focal plane.
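
The vector picture of the Risley prisms lends itself to a short worked example. The Python sketch below assumes that both prisms produce dispersion vectors of the same magnitude, which the text does not state, and the function names and units are hypothetical. Two vectors of magnitude d at rotation angles theta1 and theta2 add to a resultant of magnitude 2d cos((theta1 - theta2)/2) pointing along (theta1 + theta2)/2, so the prism angles follow from the desired resultant.

```python
import math

def risley_angles(target_magnitude, target_direction_rad, prism_dispersion):
    """Rotation angles of two equal prisms whose dispersion vectors sum to the target."""
    if not 0.0 <= target_magnitude <= 2.0 * prism_dispersion:
        raise ValueError("target dispersion exceeds the range of the prism pair")
    # Half the opening angle between the two dispersion vectors.
    half_opening = math.acos(target_magnitude / (2.0 * prism_dispersion))
    return target_direction_rad + half_opening, target_direction_rad - half_opening

def resultant(theta1, theta2, prism_dispersion):
    """Vector sum (x, y) of the two prism dispersion vectors."""
    x = prism_dispersion * (math.cos(theta1) + math.cos(theta2))
    y = prism_dispersion * (math.sin(theta1) + math.sin(theta2))
    return x, y

def cancel_atmosphere(atm_x, atm_y, prism_dispersion):
    """Prism angles that produce a dispersion opposite to the atmospheric vector."""
    magnitude = math.hypot(atm_x, atm_y)
    direction = math.atan2(-atm_y, -atm_x)
    return risley_angles(magnitude, direction, prism_dispersion)
```

When the two prisms are rotated to opposite orientations their dispersions cancel, and when they are aligned the pair delivers its maximum dispersion of twice the single-prism value.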


2.2. Detector Requirements 


The most important detector requirement is that it must record images at a rate 
fast enough to keep up with the fluctuations in the atmosphere above the telescope. 
Typically, this is on the order of 30-100 Hz. However, given that short exposures are 
taken, it is also important to have a detector with high quantum efficiency. Today, 
most systems used at large telescopes employ electron-multiplying CCDs (EMC- 
CDs). However, since the technique has been in use since the 1970s, several different 
detectors have been used to image speckle patterns, including high-speed film? and 
various microchannel plate-based imagers, most notably intensified-CCDs.* High- 
speed bare CCDs were used by Beletic et al.° and Horch et al.® Finally, large-format 
CCD imagers have been used.” These do not read out quickly in general, but the 
optical system in these cases included a scanning mirror system that allowed speckle 
patterns to be laid down over the large area of the chip in a predetermined raster 
pattern, after which the entire frame was read out. 

EMCCDs have a number of properties that make them ideal detectors for 
speckle imaging applications. Instead of the serial register that normal CCDs have to 
clock charge toward the charge amplifier and read it out, EMCCDs have a sequence 
of gain registers. Charge is clocked from register to register with a voltage much 
larger than is used in normal CCDs, and as a result, there is some probability 
that secondary charge carriers can be dislodged, increasing the total amount of 
charge that is eventually read by the amplifier. By transferring the charge through 
hundreds of registers in this way, a significant increase can be obtained. If the 
amplifier has a low enough read noise, then even a single photon event is multiplied 
so that it can be measured above the read noise after amplification. Such systems 
have sub-electron read noise; a value less than one is obtained by taking the read 
noise value and dividing by the typical number of charge carriers that are produced 
by a single detected photon event. However, this gain does come at a price: sat- 
uration will be reached at very low count rates. The user can generally select the 
desired gain, so if looking at a bright star, a low gain (or no gain) is used, whereas 
when observing a faint star, a high gain would be selected. Commercially available
EMCCDs can have quantum efficiencies as high as 95% and formats as large as
1024 × 1024 pixels.
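
As a rough illustration of the gain mechanism described above, the sketch below assumes that each gain register independently adds a secondary carrier with a small probability per transferred electron, so that the mean gain after N registers is (1 + p)^N. The probability, the number of registers, and the amplifier read noise used here are illustrative assumptions, not values for any particular camera.

```python
def em_gain(p_multiply, n_stages):
    """Mean multiplication gain if each of n_stages gain registers creates a
    secondary carrier with probability p_multiply per transferred electron."""
    return (1.0 + p_multiply) ** n_stages

def effective_read_noise(read_noise_e, gain):
    """Amplifier read noise divided by the mean number of carriers produced
    per detected photon event (electrons, referred to the input)."""
    return read_noise_e / gain

gain = em_gain(0.01, 600)                  # assumed p = 1%, 600 registers
noise = effective_read_noise(50.0, gain)   # assumed 50 e- amplifier read noise
```

With these assumed numbers the gain is a few hundred, so even a large amplifier read noise corresponds to well under one electron when referred to the input.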


3. Examples of Current Systems 


3.1. The USNO Speckle Interferometer 


The US Naval Observatory (USNO) has had a presence in speckle interferometry 
for over 20 years.¹⁰ Worley and Douglass began the program in the early 1990s,
and it has been continued by Mason and Hartkopf until the present. The main use 
of the interferometer is in the study of visual binary stars, using as its primary host 
telescope the 66-cm refractor at the US Naval Observatory station in Washington, 
DC. Other telescopes, such as the 4-m telescope at Kitt Peak, have also been used.

The current speckle interferometer in use for the program has an optical path 
similar to that of Fig. 1, but Risley prisms are not generally used, owing to the fact 
that the size of speckles on the image plane will be on the order of 0.2 arcseconds, 
so that dispersion will be modest in comparison. (A second similar system that 
does contain Risley prisms is used at larger telescopes, where the dispersion is a 
larger fraction of the speckle size.) Filters available include the Strömgren y and the
standard V. The ICCD, manufactured by Electro-Optical Services (EOS), consists 
of a microchannel plate bonded to a Sony video camera with a fiber-optic taper 
to decrease the size of the output of the image intensifier to fit on the chip inside 
the video camera. At the 66-cm telescope under good conditions, the system has 
reached some stars of 14th magnitude.¹⁰


3.2. The HRCam on the SOAR Adaptive Module 


The SOAR Telescope, located at Cerro Pachón in Chile, is a 4-m class telescope
that has among its available instruments the SOAR Adaptive Module (SAM), a 
very flexible optical unit that can be used for both imaging and low-dispersion 
spectroscopy. It provides some low-order adaptive optics correction for the incoming 
science beam in the form of a low-altitude Rayleigh laser guide star. An instrument 
known as HRCam (High-Resolution Camera)¹¹ can attach to this system. It has
the standard optical arrangement for obtaining the magnification needed for speckle 
imaging, although the collimating lens has a negative focal length and intercepts the 
input beam from SAM before it has come to a focus. The detector used is an Andor
Luca EMCCD with 10 µm pixels. The HRCam has produced thousands of measures
of binary stars observable from the Southern Hemisphere in recent years,¹²,¹³ and
an image of this system is shown in Fig. 2(a).


3.3. Differential Speckle Survey Instrument 


The Differential Speckle Survey Instrument (DSSI) is a speckle interferometer that
records speckle images in two colors simultaneously. The instrument was constructed
in 2008, and a full description is given in Ref. 14. An image of the system is shown
in Fig. 2(b); at present the system is used with two Andor iXon EMCCDs. These
have 16 µm square pixels and a 512 × 512-pixel format. In the collimated
beam between the input and output lenses, a dichroic beamsplitter is placed, so that
light above a certain wavelength is transmitted while light below that wavelength is
reflected. The system does not have Risley prisms; it was originally constructed for
use at the WIYN Telescope at Kitt Peak National Observatory, which had Risley
prisms integrated into the optics available at the instrument port where DSSI was
to be used, so dispersion correction was not initially needed. However, the instrument
has more recently been used at both Lowell Observatory’s Discovery Channel
Telescope and the Gemini North Telescope. In these situations, observations of
unresolved bright stars are used to characterize and remove the effects of dispersion.
The fact that the system takes images in two colors simultaneously gives further
leverage on removal of its effects independent of Risley prisms. The main projects
that use DSSI at present involve vetting of exoplanet candidate host stars and faint
companion detection of nearby stars.

Fig. 2. Two examples of speckle interferometers in current operation. (a) The HRCam used with
the SOAR Adaptive Module. Image courtesy of A. Tokovinin, CTIO. (b) The Differential Speckle
Survey Instrument, pictured here at the WIYN 3.5-m Telescope at Kitt Peak, Arizona.


3.4. The SAO Speckle Interferometer 


The speckle interferometer at the Special Astrophysical Observatory in Russia is 
used with the 6-m telescope.¹⁵,¹⁶ This system utilizes a Princeton Instruments
EMCCD camera that can read out 512 × 512-pixel frames at a rate of 29 per second.
Magnification of the image is accomplished with one of two microscope objective
lenses, 8× and 20×, which give pixel scales of 17.2 and 6.8 milliarcseconds (mas)
per pixel, respectively. Objects as faint as 16th magnitude have been successfully
observed. Filter choices include a 550-nm filter with 20 nm FWHM, 600 nm with
40 nm FWHM, and 800 nm with 100 nm FWHM. No atmospheric dispersion correction
is used.


3.5. Pupil Interferometry Speckle Coronograph


The Pupil Interferometry Speckle COronograph (PISCO) was designed to be a 
versatile high-resolution imaging camera for the Pic du Midi Observatory in France. 
In addition to Risley prisms and filters, the system can place
aperture masks in the collimated beam. A grism can also give a low-dispersion
spectroscopic capability. In recent years, it has been used mainly to measure binary
stars from Merate, in Italy, at the 1-m Zeiss telescope at that location.¹⁷


4. Data Reduction Principles 


The raw data of a speckle observation consists of a sequence of short-exposure 
images of the target, magnified to sufficient scale so that individual speckles are 
resolved. In essence, the presence of the speckles indicates that diffraction-limited 
information is present in the images, since the size of speckles is determined by the 
diffraction-limited point spread function of the telescope. The speckles are spread 
out over a region on the image plane, but source features are still present on the 
same spatial scale as the speckles. It is therefore in essence a deconvolution problem 
to retrieve the high-resolution information desired. 

If one were to simply co-add all of the individual data frames, high-resolution 
information would be lost as the speckles change in brightness and location from 
moment to moment, so the basis of speckle data reduction is to form correlation 
functions from data frames that can be shown to retain information in the Fourier 
domain out to the diffraction limit. 


4.1. Autocorrelation 


For a short-exposure image $I(x, y)$, where $(x, y)$ defines position on the image plane
and $I$ represents the irradiance detected, the simplest such correlation function is the
autocorrelation, defined by


\[ A(x, y) = \iint I(x', y')\, I(x' + x, y' + y)\, dx'\, dy'. \tag{1} \]


Once this function is computed for each frame, the result is summed to increase
signal-to-noise ratio. If the object being observed is a binary star with vector separation
given by $(x_0, y_0)$, then the autocorrelation will exhibit three peaks, one at
the origin, that is, at $(x, y) = (0, 0)$, and two that are symmetrically situated about
the origin, one at $(x_0, y_0)$ and one at $(-x_0, -y_0)$. These positions correspond to the
shift vectors where the primary star in one copy of $I$ in the integrand above overlaps
the secondary of the other copy of $I$ and where the secondary of the first copy of $I$
overlaps with the primary of the second copy.
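
The three-peak structure can be checked numerically. The sketch below is an illustration only: it uses a noiseless, speckle-free frame containing two point sources (hypothetical pixel positions and brightnesses) and computes a circular autocorrelation with the FFT, which is a discrete stand-in for Eq. (1).

```python
import numpy as np

def autocorrelation(frame):
    """Circular autocorrelation of a 2-D frame, computed as the inverse
    Fourier transform of its power spectrum."""
    power = np.abs(np.fft.fft2(frame)) ** 2
    return np.real(np.fft.ifft2(power))

# Hypothetical noiseless "binary": array index order is (row = y, column = x).
frame = np.zeros((32, 32))
frame[10, 12] = 1.0   # primary
frame[13, 17] = 0.5   # secondary, offset by +3 rows and +5 columns
acf = autocorrelation(frame)

central_peak = acf[0, 0]       # zero shift
positive_peak = acf[3, 5]      # shift (+y0, +x0)
negative_peak = acf[-3, -5]    # shift (-y0, -x0)
```

In a real reduction the same function would be applied to every short-exposure frame and the results summed; the two side peaks are equal in height, which is why the autocorrelation alone leaves a 180-degree ambiguity in the position angle.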

In a summed autocorrelation, there are also random correlations at other posi- 
tions on the image plane, but for a sufficiently long sequence of images, these result 
in a smoothly varying envelope on top of which the three correlation peaks from
the binary source sit. A simple method to retrieve a diffraction-limited autocorrelation
function of the source is simply to estimate and remove the background. For
example, one may boxcar smooth the autocorrelation, which will remove the speckle 
peaks but leave the background envelope largely unaffected, and then subtract this 
from the original autocorrelation. In that case, the envelope subtracts away and 
leaves the diffraction-limited source autocorrelation. 

Another way the high-resolution information can be seen and extracted is to 
Fourier transform the autocorrelation. This results in the spatial frequency power 
spectrum: 


\[ \mathrm{FT}\{A(x, y)\} = |\hat{I}(u, v)|^2, \tag{2} \]


where the caret represents a Fourier transform, and the Fourier conjugate variables to $x$ and $y$
are given by $u$ and $v$, respectively. If one observes a bright single star under similar
conditions, $S(x, y)$, and forms the summed power spectrum of that observation,
$|\hat{S}(u, v)|^2$, then one may perform a deconvolution of the power spectrum obtained
for the object of interest by dividing in the Fourier domain. By taking the square
root of this, the modulus of the Fourier transform of the object itself is obtained:


\[ |\hat{O}(u, v)| = \sqrt{\frac{|\hat{I}(u, v)|^2}{|\hat{S}(u, v)|^2}}. \tag{3} \]

This function then represents diffraction-limited information of the object in the
Fourier domain. However, given that in general $\hat{O}(u, v)$ will be a complex-valued
function, one cannot simply inverse-Fourier transform the above to arrive at a
diffraction-limited image of the source, since the phase of $\hat{O}$ has been lost. An
illustration of the main data products discussed here is shown in Fig. 3.
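
The division of power spectra can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the frame stacks are hypothetical arrays of shape (number of frames, ny, nx), the power spectra are averaged rather than summed so that target and reference stacks of different length can be compared, and the small `floor` guard against division by zero is an ad hoc choice that is not part of the method described in the text.

```python
import numpy as np

def mean_power_spectrum(frames):
    """Average of |FT|^2 over a stack of frames with shape (n, ny, nx)."""
    return np.mean(np.abs(np.fft.fft2(frames)) ** 2, axis=0)

def object_modulus(target_frames, reference_frames, floor=1e-12):
    """Estimate the modulus of the object's Fourier transform by dividing
    the target power spectrum by that of an unresolved reference star and
    taking the square root."""
    ps_target = mean_power_spectrum(target_frames)
    ps_reference = mean_power_spectrum(reference_frames)
    return np.sqrt(ps_target / np.maximum(ps_reference, floor))
```

With real data the high spatial frequencies of the ratio are noisy, because the reference power spectrum becomes small near the diffraction limit; this is one reason the low-pass filtering discussed later is needed.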




Fig. 3. An illustration of basic data in speckle interferometry. (a) A short-exposure image of 
a close double star. Examples of double speckles can be identified within the seeing envelope. 
(b) An integrated image made from 200 individual speckle frames like the one in (a). Here the 
binary nature is not evident as the summing of frames washes out the speckle character. (c) The 
sum of the autocorrelations of the 200 frames. Peaks are seen at the positive and negative vector 
separations of the secondary star. (d) The power spectrum of (c), showing fringes in the case of a 
binary star.


4.2. Triple Correlation


Information concerning the phase of $\hat{O}$ is retained in a higher order correlation
function known as the triple correlation and defined as follows:


\[ C(\mathbf{x}_1, \mathbf{x}_2) = \int I(\mathbf{x}')\, I(\mathbf{x}' + \mathbf{x}_1)\, I(\mathbf{x}' + \mathbf{x}_2)\, d\mathbf{x}', \tag{4} \]


where to simplify notation, a two-dimensional vector on the image plane is written
$\mathbf{x} = (x, y)$ and similarly on the Fourier plane, $\mathbf{u} = (u, v)$.

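The Fourier transform of the triple correlation is known as the bispectrum, and
it can be shown that


\[ \mathrm{FT}\{C(\mathbf{x}_1, \mathbf{x}_2)\} = \hat{C}(\mathbf{u}_1, \mathbf{u}_2) = \hat{I}(\mathbf{u}_1)\, \hat{I}(\mathbf{u}_2)\, \hat{I}^{*}(\mathbf{u}_1 + \mathbf{u}_2), \tag{5} \]


where the Fourier conjugate variables to $\mathbf{x}_1$ and $\mathbf{x}_2$ are $\mathbf{u}_1$ and $\mathbf{u}_2$, respectively. The
triple correlation is a scalar function of two two-dimensional vectors; in total, it is a
function of a four-dimensional argument. It can also be shown that the phase of the
bispectrum of $I$ is equivalent to that of the bispectrum of the object, $O(\mathbf{x})$. Therefore, in considering
the phase of the bispectrum, $\arg(\hat{C})$, we may immediately conclude that


\[ \arg(\hat{C}(\mathbf{u}_1, \mathbf{u}_2)) = \phi_O(\mathbf{u}_1) + \phi_O(\mathbf{u}_2) - \phi_O(\mathbf{u}_1 + \mathbf{u}_2), \tag{6} \]


where $\phi_O$ represents the phase of the object. If $\mathbf{u}_2$ is chosen as a small, constant
value $\Delta\mathbf{u}$, while $\mathbf{u}_1 = \mathbf{u}$ can range over the entire Fourier plane, we obtain


\[ \phi_O(\Delta\mathbf{u}) - \arg(\hat{C}(\mathbf{u}, \Delta\mathbf{u})) = \phi_O(\mathbf{u} + \Delta\mathbf{u}) - \phi_O(\mathbf{u}). \tag{7} \]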
The above is in the form of a finite difference equation; if both sides were divided by
$\Delta\mathbf{u}$ and the limit as $\Delta\mathbf{u}$ approaches zero were taken, the derivative of the object phase
would be obtained. By integrating, the phase function itself may be calculated. In
practice, with digital detectors, the Fourier plane is already sampled and the choice
of $\Delta\mathbf{u}$ will be an integer multiple of the sampling interval. The simplest process for
obtaining the phase would be to set $\mathbf{u} = 0$ in the above and then
calculate $\phi_O(0) = \arg(\hat{C}(0, \Delta\mathbf{u}))$. From this point, the phase at $\mathbf{u} = n\Delta\mathbf{u}$, where $n$
is an integer, can be calculated iteratively from the values in the bispectrum and
the phase at the previous location, $(n - 1)\Delta\mathbf{u}$. More sophisticated methods of phase reconstruction
can also be devised: for example, Ref. 18 developed a relaxation technique to
minimize systematic error obtained in high frequencies from the iterative approach
starting at the origin.
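
The iterative scheme can be illustrated in one dimension. The sketch below is not the relaxation method of Ref. 18; it is only the simple recursion, applied to a noiseless, hypothetical object, with the step fixed at one sampling interval. The phase at the first nonzero frequency is not determined by the recursion; it is set to zero here as an assumption, which only shifts the reconstructed object.

```python
import numpy as np

def phase_from_bispectrum(bispec_slice, n_freq, phase_step=0.0):
    """Object phases at frequencies 0 .. n_freq - 1 (in sampling intervals)
    from the bispectrum slice bispec_slice[u] = C(u, du), with du = 1.

    Recursion: phi(u + du) = phi(u) + phi(du) - arg C(u, du).
    phase_step is the assumed value of phi(du).
    """
    phase = np.zeros(n_freq)
    phase[0] = np.angle(bispec_slice[0])   # phi(0) = arg C(0, du)
    if n_freq > 1:
        phase[1] = phase_step
    for u in range(1, n_freq - 1):
        phase[u + 1] = phase[u] + phase[1] - np.angle(bispec_slice[u])
    return phase

# Hypothetical 1-D "binary" object and its noiseless bispectrum slice.
obj = np.zeros(64)
obj[20], obj[26] = 1.0, 0.4
spectrum = np.fft.fft(obj)
n_freq = 32
freqs = np.arange(n_freq)
bispec_slice = spectrum[freqs] * spectrum[1] * np.conj(spectrum[freqs + 1])
recovered = phase_from_bispectrum(bispec_slice, n_freq)
```

The recovered phases differ from the true ones only by a term linear in frequency, which corresponds to a translation of the object. With noisy data the errors accumulate along the recursion, which motivates the more sophisticated approaches mentioned above.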


4.3. Image Reconstruction 


A diffraction-limited estimate of the object’s Fourier transform can be assembled 
from the results of the power spectrum and bispectral analysis. To obtain a recon- 
structed image, Fourier inversion is required. However, merely inverse-transforming 
the estimate of $\hat{O}$ generally does not yield an image of high quality because the
higher spatial frequencies are dominated by noise, and this leads to significant structure
on the image plane that is not related to the true object irradiance distribution.
Some form of low-pass filtering is required. While many choices exist, a simple way
to filter is with a Gaussian shape in two dimensions. This suppresses high-frequency
noise while producing a relatively smooth, high-resolution image. An example of the
typical image reconstruction process is shown in Fig. 4.

Fig. 4. An illustration of image reconstruction in speckle interferometry. (a) The recovered modulus
of a binary star similar to that in Fig. 3. (b) The phase recovered from the image bispectrum.
(c) The same as in (a), but low-pass filtered with a Gaussian function to suppress high-frequency
noise. (d) The resulting image reconstruction, that is, the inverse-Fourier transform of (c), where
only the central region is shown.
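
The filtering and inversion step might look as follows in NumPy. This is a sketch under assumptions: the modulus and phase arrays are in NumPy's FFT layout (zero frequency at index [0, 0]), and the filter width `sigma_cycles_per_pixel` is a hypothetical user-chosen parameter rather than a value taken from the text.

```python
import numpy as np

def reconstruct(modulus, phase, sigma_cycles_per_pixel):
    """Inverse-transform modulus * exp(i * phase) after multiplying by a
    two-dimensional Gaussian low-pass filter in the Fourier plane."""
    ny, nx = modulus.shape
    v = np.fft.fftfreq(ny)[:, None]   # spatial frequency along rows
    u = np.fft.fftfreq(nx)[None, :]   # spatial frequency along columns
    gauss = np.exp(-(u ** 2 + v ** 2) / (2.0 * sigma_cycles_per_pixel ** 2))
    filtered = modulus * gauss * np.exp(1j * phase)
    return np.real(np.fft.ifft2(filtered))
```

A narrower filter suppresses more noise but broadens the reconstructed point sources, so the width is a trade-off between noise and resolution.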


5. Science with Speckle Interferometers 


5.1. Binary Stars 


The utility of speckle interferometers for the study of binary stars has been known 
since the origins of the technique in the 1970s. Prior to that point, the main ways 
to identify a pair of gravitationally bound stars were to either study the system 
spectroscopically or through photographic imaging or visual methods. However, 
spectroscopic binaries are most easily observed when the motion of spectral lines 
due to the Doppler effect is large. This will be the case when the stars have a 
relatively small separation. On the other hand, seeing-limited imaging techniques 
resolve the two stars only when the separation between them is above a few tenths 
of an arcsecond; for the vast majority of pairs, this would correspond to orbital 
speeds too low to be measured spectroscopically at that time. Speckle interferome- 
ters promised to help to bridge the gap between these two methods, allowing more 
stars to be successfully studied with both types of techniques. The advantage of 
this is that when there is both velocity and positional information available for 
both components, it is possible to calculate individual masses for the two
stars as well as a distance to the system.

Since the late 1970s and early 1980s, significant effort has been expended to
obtain extremely precise relative positions of binary stars through speckle interferometry,
most notably through the efforts of the CHARA group at Georgia State
University and the US Naval Observatory, the Yale-Southern Connecticut collaboration,
the SOAR speckle program, the SAO speckle effort, and others. The
result of these observing programs is a span of more than 30 years of high-quality
astrometry on a large number of binary stars. From this, hundreds of visual orbits
have been computed, and two examples are shown in Fig. 5. The determination of
orbital elements, including the orbital period and the semi-major axis of the orbital
ellipse, together with a parallax measurement for the system, leads directly to a mass
sum of the two stars. If the system is also a spectroscopic binary, individual masses
can be obtained.

Fig. 5. Two examples of binary star orbits calculated using data from speckle interferometers.
(a) WDS 04199 + 1631 = STT 79. (b) WDS 09179 + 2834 = STF 3121. Blue points indicate data
taken with speckle interferometers while green points are from other methods, typically visual
micrometry. Images courtesy of William Hartkopf and Brian Mason, USNO.
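
The step from orbital elements and parallax to a mass sum is Kepler's third law. As a brief sketch (the function name and the example numbers are hypothetical), with the semi-major axis and parallax in arcseconds and the period in years, the total mass in solar units is:

```python
def mass_sum_solar(semi_major_axis_arcsec, parallax_arcsec, period_years):
    """Total mass M1 + M2 in solar masses from Kepler's third law.

    The angular semi-major axis divided by the parallax gives the
    semi-major axis in astronomical units; then M1 + M2 = a^3 / P^2.
    """
    a_au = semi_major_axis_arcsec / parallax_arcsec
    return a_au ** 3 / period_years ** 2

# Hypothetical system: a = 0.5 arcsec, parallax = 0.05 arcsec, P = 10 yr.
example_mass = mass_sum_solar(0.5, 0.05, 10.0)
```

Because the parallax enters to the third power, the uncertainty in the distance often dominates the uncertainty in the mass sum.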

Empirically determined mass information is an important tool in understanding 
stellar structure and evolution theory, if it can be combined with other properties of 
the individual stars, such as surface temperature, luminosity, and metal abundance. 
A good review of the state of these kinds of studies is found in Ref. 19. 


5.2. Faint Companion Detection 


In recent years, there has been an increased interest in the use of speckle interfer- 
ometers for detecting close companions to stars for several reasons. Most notably, 
the discovery of hundreds of transit events by the Kepler satellite has led to a 
catalog of exoplanet candidate host stars. To confirm the nature of these systems, 
ground-based follow-up observations are needed. High-resolution imaging provides 
additional information because the Kepler resolution is set by the pixel size of the 
focal plane detectors, which maps to about 4 × 4 arcseconds on the sky. Ground-based
seeing-limited images would improve that resolution by a factor of 4-6; diffraction-limited
imaging at a large telescope improves this by a factor of 20 more in the
visible range. Work of this kind has shown that it is not uncommon for exoplanet 
host stars to also have stellar companions, although there may be a depletion of 
small-separation binaries when exoplanets are present.²⁰,²¹ When an exoplanet system
does have a stellar companion, it also affects the derived radius of the planet(s).

There are other types of science that also benefit from companion detection. 
For example, high-resolution imaging of members of open clusters can discover 
close companions of those stars and give a more complete picture of the binary and 
multiple star populations in the cluster environment. High-resolution surveys of the 
nearest stars can reveal more about the binary fraction as a function of spectral 
type. 

To detect companions, most studies have established a detection limit based 
on statistics of the high-resolution images obtained. Typically, this involves looking 
at the variation in the image as a function of separation and converting the $5\sigma$
level above the image noise into a magnitude difference. If a peak has a magnitude
difference below this level, then it is regarded as a detection of a companion, and
if the image has no peak that satisfies that criterion, then it is considered a non-detection.
Comparing typical results to date on Kepler and K2 sources, speckle
interferometers can provide deeper detection limits than most adaptive optics systems
in a range from close to the diffraction limit to approximately 0.3 arcseconds
from a central star. An example of a faint detection is shown in Fig. 6. Because
most faint stellar companions are red, the development of low-noise infrared arrays
could be an important way to detect such stars in coming years.
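
One way such a limit could be computed is sketched below. The details are assumptions of this illustration rather than the procedure of any particular study: the noise is estimated from the mean and standard deviation of all pixels in an annulus, and the limit is expressed relative to the value of the central peak of the reconstructed image.

```python
import numpy as np

def detection_limit_mag(image, center, r_inner, r_outer, n_sigma=5.0):
    """Magnitude difference corresponding to n_sigma above the background
    in the annulus r_inner <= r < r_outer (pixels) around center = (row, col)."""
    rows, cols = np.indices(image.shape)
    r = np.hypot(rows - center[0], cols - center[1])
    annulus = image[(r >= r_inner) & (r < r_outer)]
    threshold = annulus.mean() + n_sigma * annulus.std()
    return -2.5 * np.log10(threshold / image[center])
```

Evaluating this in a sequence of annuli gives a detection-limit curve as a function of separation; a candidate peak whose magnitude difference is smaller than the limit at its separation would be counted as a detection.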

Fig. 6. (a) The discovery image of a faint companion to the metal-poor spectroscopic binary star
WDS 04163 + 3644 = YSC 128. (b) A detection limit curve for the image shown, together with all
local maxima and minima in the image. The square below the $5\sigma$ curve drawn in green represents
the companion.


5.3. Extended Objects


Because of the higher signal-to-noise ratios enjoyed by speckle interferometers that 
employ EMCCDs, there has been recent interest in imaging of sources with more 
complicated structure than binary stars, as well as pushing the limits of the image 
reconstruction algorithms for this purpose. Two examples are in the area of plane- 
tary astronomy and in the imaging of geostationary satellites. 

In preparation for the recent New Horizons mission to Pluto, Howell and his collaborators²²
used the Differential Speckle Survey Instrument discussed earlier with the
Gemini-North 8.1-m Telescope at Mauna Kea to produce diffraction-limited images 
of the Pluto-Charon system. An example of one of their images is shown in Fig. 7. 
A number of other solar system objects, all main-belt asteroids, have also been 
imaged at Gemini; while these data are not yet in the literature, the speckle data 
clearly show these targets as resolved and would serve for precise measurements of 
projected major and minor axes of these bodies. The SAO speckle group in Russia 
has also recently determined the orbit of the binary asteroid 22 Kalliope with speckle 
data.²³

Another area where speckle image reconstructions may have some interest is in
the imaging of geostationary satellites. This has been pursued, for example, at Lowell
Observatory by van Belle and his collaborators. In these cases, the eventual goal is
the combination of speckle data with simultaneously obtained data from the Navy
Precision Optical Interferometer. In that combination, speckle data would provide relatively
low-resolution information, or low spatial frequency information, whereas the Navy
interferometer would provide higher spatial frequencies. While still speculative, the
combination of data from both regimes could yield extremely high-resolution visible-light
images of satellites in Earth orbit.

Fig. 7. A speckle image reconstruction of the Pluto-Charon system. The diameter of Pluto at the
time of observation was approximately 0.2 arcseconds, and the projected separation between the
dwarf planet and its moon is approximately 1 arcsecond.


6. Conclusions 


Speckle interferometers are specialized camera systems used for high-resolution 
imaging at large single-aperture telescopes. Their ability to recover diffraction- 
limited images relies on the fact that they read out at video rates, that they magnify 
the seeing disk of the object being observed to an appropriate scale so that the
speckles are at least approximately critically sampled, and that they use narrow-bandpass
filters. The first and second properties allow the system to keep up with
the effect of atmospheric fluctuations on the stellar image that results in speckle 
motion on the image plane, and the third property provides quasi-monochromatic 
speckle patterns, and therefore good speckle contrast on the image plane. Recently 
built speckle interferometers generally have taken advantage of the development of 
EMCCD cameras. These devices provide photon-counting performance with sub-electron
read noise at above 90% quantum efficiency, a major improvement in
performance over the previous types of devices that have been used. This has powered a
resurgence in the use of speckle interferometers for a range of science applications. 

The most significant of these is the role that speckle interferometers can play 
and are playing in vetting exoplanet candidates. Speckle interferometers can help 
determine if there is a close stellar companion in such cases. The method has also 
shown that stellar companions to stars that host exoplanets are not uncommon. 
Statistically, the rate of stellar companions appears to be similar to the field popu- 
lation of binary stars, although the details of the separation and period distributions 
may be different. In addition, speckle interferometry continues its traditional role in 
terms of providing precise visual orbital elements of binary stars. From this, mass 
information of main-sequence stars is obtained.


Amplitude interferometry is a technique for achieving high angular resolution in
astronomy. Applications include measuring the angular diameters of stars
and the determination of stellar masses by combining interferometric and spec- 
troscopic observations of binary stars. The underlying theory is briefly reviewed 
before discussing the basic features of an amplitude interferometer. The discus- 
sion is limited to interferometers consisting of two or more fixed input stations 
connected to a central laboratory where the light from the stations is processed 
(some alternative techniques for achieving high angular resolution are covered 
in other chapters of this Handbook). Specific aspects that are discussed include: 
the input optics, beam transport and path equalization, dispersion compensation, 
and fringe detection. The effects of atmospheric “seeing” and the sensitivity of 
amplitude interferometers are also discussed briefly. 


1. Introduction 


The theoretical angular resolution of a ground-based telescope is ~ $\lambda/D$, where
$\lambda$ is the wavelength and $D$ is the diameter of the telescope aperture. In practice,
atmospheric seeing degrades this; depending on the site, the actual resolution of
telescopes is of the order of 0.2–1.0 arcsec. Optical/infrared interferometry is a technique
that can achieve much higher angular resolution. A very incomplete list of
applications of interferometry includes the measurement of the angular diameter
of stars,¹ the oblateness of rapidly rotating stars,² investigations of circumstellar
disks,³,⁴ the determination of stellar masses by combining interferometric and spectroscopic
observations of binary stars,⁵ and the estimation of distances to Cepheid
variable stars by the interferometric Baade-Wesselink method.⁶

At least one ground-based interferometer (the Navy Precision Optical Interferometer
[NPOI]) has also been optimized for precision astrometry,⁷ and additional
instruments (e.g. GRAVITY for the Very Large Telescope Interferometer [VLTI])
are being developed.ᵃ

In its simplest form an amplitude interferometer is analogous to a classical
Young’s slit interferometer. In astronomy, we distinguish “amplitude” interferometry
from “intensity” interferometry,ᵇ a technique based on the Hanbury Brown-Twiss
effect.⁸⁻¹⁰ In this chapter, “interferometry” will always mean “amplitude
interferometry.”

The angular resolution of a simple two-aperture interferometer is ~ $\lambda/|\mathbf{b}|$, where
$\lambda$ is the wavelength and $\mathbf{b}$ is the projected baseline, a vector defined as the separation
between the two apertures viewed from the source. If the apertures are small, the
effects of atmospheric seeing are reduced, and Fizeau (Ref. 11) originally proposed using
interferometry as a way of overcoming the loss of resolution due to seeing. Masked
aperture interferometry is a modern application of Fizeau’s concept.

The resolution of a masked aperture interferometer is limited by the diameter of
the host telescope; Michelson was able to increase the resolution with his “Michelson
stellar interferometer,” which had a baseline of ~ 6 m, and he used this to measure
the angular diameter of α Ori.¹
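
To give a feel for the numbers, the following short sketch evaluates the resolution estimate $\lambda/|\mathbf{b}|$ in milliarcseconds. The wavelength of 550 nm is an assumed visible-light value chosen for illustration; the function name is hypothetical.

```python
import math

# Milliarcseconds per radian: degrees per radian * 3600 arcsec * 1000.
MAS_PER_RADIAN = math.degrees(1.0) * 3600.0 * 1000.0

def resolution_mas(wavelength_m, baseline_m):
    """Approximate angular resolution lambda / |b| in milliarcseconds."""
    return wavelength_m / baseline_m * MAS_PER_RADIAN

michelson = resolution_mas(550e-9, 6.0)   # ~6 m baseline, assumed 550 nm
```

For the assumed wavelength, a 6 m baseline corresponds to roughly 19 mas, more than ten times finer than the 0.2–1.0 arcsec seeing-limited resolution quoted above.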

In this chapter, we focus primarily on interferometers consisting of two or more 
fixed input stations connected to a central laboratory where the light from the 
stations is combined. Examples include the Sydney University Stellar Interferometer 
(SUSI), located near Narrabri in New South Wales, Australia,¹²,¹³ the CHARA
Array at Mt. Wilson in California, USA, the VLTI at Paranal, Chile,¹⁵ and NPOI
at Anderson Mesa, Arizona, USA.

An interferometer detects the interference fringes produced by the source being 
observed. It follows from the van Cittert–Zernike theorem¹⁶ that the complex degree
of coherence $\gamma$ associated with the fringe pattern is the normalized Fourier transform
of the intensity distribution over the source. The experimentally determined
complex degree of coherence is usually called the fringe visibility to distinguish it
from the idealized complex degree of coherence. The visibility is a function of the
projected baseline $\mathbf{b}$, and each measurement made with a different “nonredundant”
baseline is a different sample of the complex Fourier transform. With a sufficient
number of samples it is possible, at least in theory, to invert the transform and
recover the intensity distribution.

Atmospheric seeing adversely affects optical and infrared interferometers in two 
main ways. The first effect is to reduce the fringe visibility. This visibility loss is a 
function of the aperture diameter and can be reduced by the use of adaptive optics 
(for a simple discussion of this, see Ref. 17). As discussed in Sec. 6, the effects of 
seeing on visibility measurement can be very significantly reduced by the use of 
single-mode fiber beam-combining optics. The second effect of seeing is that the
phase of the complex coherence is usually lost. If three or more apertures are used,
it is possible to recover some phase information; see Sec. 2.1.

ᵃ See Chapter 7 of this Volume.
ᵇ See Chapter 3 of this Volume.

The analysis of interferometric data is also complicated by the fact that two-aperture
interferometers sample only relatively few points in Fourier space.


2. Basic Theory 


The schematic design of a “classic” amplitude interferometer is shown in Fig. 1. We 
first assume that the detected light is “quasimonochromatic;” i.e. the bandwidth is 
assumed to be arbitrarily small. The light from the two input apertures is super- 
imposed at a beamsplitter and the two emerging beams fall on two detectors. The 
signals $S_{1,2}$ from the two detectors will be proportional to


\[ S_{1,2} = S_0\left(1 \pm \mathrm{Re}\{\gamma \exp[2\pi i \sigma z]\}\right) = S_0\left(1 \pm |\gamma| \cos[\alpha + 2\pi\sigma z]\right), \tag{1} \]


where $z$ is the optical path difference (OPD) between the two arms of the interferometer,
$\sigma = \lambda^{-1}$ is the spectroscopic wavenumber, $\lambda$ is the wavelength and the
complex degree of coherence $\gamma = |\gamma| \exp(i\alpha)$ is given by the van Cittert–Zernike
theorem:


\[ \gamma(\sigma, \mathbf{b}) = \frac{1}{I_0} \int d^2\mathbf{x}\, \exp\{-2\pi i \sigma\, \mathbf{b} \cdot \mathbf{x}\}\, I(\mathbf{x}), \tag{2} \]


where $I_0$ is the total irradiance from the source, $I(\mathbf{x})$ is the distribution of irradiance
over the source and $\mathbf{b}$ is the projected baseline.

Fig. 1. Schematic diagram for a “classic” amplitude interferometer; i.e. one that uses bulk optics
and a beamsplitter to combine the light from two or more separate telescopes. T1 and T2 are the
input telescopes. In general, there will be an external optical path difference (OPD) $\Delta x$ between
the two telescopes. Some light is directed to the adaptive optics systems denoted by A1 and A2.
An internal optical path compensator (OPC) is used to compensate for the external OPD. The
light from the two telescopes is combined at the beamsplitter BS and the interfering beams are
detected at D1 and D2. See Sec. 5 for more information.

Practical interferometers will have a nonzero bandwidth described by a 
transmission function $T(\sigma)$ (if the interferometer has multiple wavelength detectors, this 
will be the transmission function for a single resolution element). In this case, Eq. (1) 
becomes 

$$S_{1,2} = S_0\left[1 \pm \Re\{\tilde{T}(z)\,\gamma(\bar{\sigma})\exp[2\pi i \bar{\sigma} z]\}\right], \qquad (3)$$

where $\tilde{T}(z)$ is the Fourier transform of $T(\sigma)$ and $\bar{\sigma}$ is the mean wavenumber. 
Ideally, the measured fringe visibility $V = |V| \exp\{i\phi\}$ is equal to $\gamma$; in practice, 
however, atmospheric seeing and instrumental effects may reduce the observed 
visibility and $|V| = \eta|\gamma|$, where $|\eta| < 1$ is the loss factor. Figure 2 shows a single scan 
in OPD of fringes obtained with SUSI for the star 6 CMa using a baseline of 5m 
and a wavelength of 700 nm. The noise in the signal is primarily due to atmospheric 
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Fig. 2. An example of real fringes obtained with SUSI for the star 6 CMa using a baseline of 5m 
and wavelength of 700 nm. 
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seeing. The visibility loss can be estimated by observing sources that are known 
to be unresolved by the interferometer and, since the loss is seeing-dependent, it is 
usual to interleave observations of the science objects with unresolved calibrators. 
As we shall see in Sec. 6 some fringe detection systems are relatively immune to 
these losses. 
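
As a minimal numerical sketch of Eq. (3), assume a rectangular passband of full width $\Delta\sigma$ centred on the mean wavenumber (an assumption made here for simplicity; a real transmission function will differ). Its normalised Fourier transform is then $\sin(\pi\Delta\sigma z)/(\pi\Delta\sigma z)$, which modulates the fringes and confines them to OPDs of order $1/\Delta\sigma$. The function and parameter names are illustrative only.

```python
import math


def fringe_signal(z, sigma_mean, delta_sigma, gamma_abs=1.0, alpha=0.0,
                  s0=1.0, sign=1):
    """Detector signal versus OPD z for an assumed rectangular passband.

    z and 1/sigma share the same length unit. sign=+1 or -1 selects one of
    the two complementary beamsplitter outputs of Eq. (3).
    """
    u = math.pi * delta_sigma * z
    # Normalised Fourier transform of a top-hat band of full width delta_sigma.
    envelope = 1.0 if u == 0.0 else math.sin(u) / u
    phase = alpha + 2.0 * math.pi * sigma_mean * z
    return s0 * (1.0 + sign * envelope * gamma_abs * math.cos(phase))


# Illustrative values: 700 nm mean wavelength, 50 nm bandwidth (units: microns).
wavelength_um = 0.700
bandwidth_um = 0.050
sigma_mean = 1.0 / wavelength_um
delta_sigma = bandwidth_um / wavelength_um ** 2

for z_um in (0.0, 0.35, 5.0, 1.0 / delta_sigma):
    print(z_um, fringe_signal(z_um, sigma_mean, delta_sigma))
```

With these illustrative numbers $\Delta\sigma \approx 0.10\ \mu\mathrm{m}^{-1}$, so the fringe envelope first vanishes at an OPD of roughly 10 μm.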

Having calibrated the fringe visibilities, the source irradiance distribution can 
be estimated (an example of the expected fringe visibility for a main sequence star 
is shown in Fig. 3). Image reconstruction is beyond the scope of this chapter, but it 
should be noted that there are two factors that complicate the process of recovering 
the image: 


(1) Interferometers with fixed stations can only measure relatively few points in 
Fourier space (in radio interferometry this is known as the “(u-v)-plane”) and 
one must rely heavily on models for the source irradiance distribution. For 
example, main sequence stars can be approximated as uniformly illuminated 
disks. 

(2) The phase of the fringe visibility is corrupted by atmospheric and instrumental 
effects and cannot be used. However, when three or more apertures are used 
one can determine closure phases instead (see Sec. 2.1). The fringe phase for 
objects with a center of symmetry is zero and closure phases are particularly 
important when observing asymmetric sources such as rapidly rotating stars. 


2.1. Phase Closure 


References 18 and 19 noted that the phase noise at a given receiver is additive. 
Suppose we have a three-element interferometer. If the subscripts 1, 2, 3 are used to 
label the three apertures, the measured phases will be 

$$\phi_{12} = \alpha_{12} + \theta_1 - \theta_2, \qquad (4)$$
$$\phi_{23} = \alpha_{23} + \theta_2 - \theta_3, \qquad (5)$$
$$\phi_{31} = \alpha_{31} + \theta_3 - \theta_1, \qquad (6)$$

where $\alpha_{ij}$ is the phase of the complex degree of coherence for the baseline $\mathbf{b}_{ij}$ and 
$\theta_i$ is the phase noise associated with each aperture. The quantity 

$$C_{123} = \phi_{12} + \phi_{23} + \phi_{31} = \alpha_{12} + \alpha_{23} + \alpha_{31} \qquad (7)$$

is the closure phase and is independent of the phase noise at the three apertures. For 
an interferometer with $N$ elements ($N \ge 3$), the number of good closure phases is 

$$(N - 1)(N - 2)/2.$$


Phase closure was first proposed for radio interferometers, and Ref. 20 was 
the first to suggest that it could also be used at optical wavelengths. The first 
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demonstration of phase closure for an optical interferometer was by Ref. 21. Most 
modern interferometers have the capability of measuring closure phases. 
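
The cancellation of the aperture phase noise in the closure phase can be checked numerically. The following is a small sketch under the additive-noise model described in this section; the function names and the random test values are illustrative assumptions.

```python
import math
import random


def wrap(phi):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


def measured_phases(alpha12, alpha23, alpha31, theta):
    """Baseline phases corrupted by one additive phase error per aperture."""
    t1, t2, t3 = theta
    return (alpha12 + t1 - t2, alpha23 + t2 - t3, alpha31 + t3 - t1)


def closure_phase(phi12, phi23, phi31):
    """Sum of the three measured phases around the closed triangle."""
    return wrap(phi12 + phi23 + phi31)


def n_closure_phases(n):
    """Number of closure phases quoted in the text for N apertures."""
    return (n - 1) * (n - 2) // 2


rng = random.Random(0)
alphas = (0.4, -1.1, 0.2)                       # source phases (radians)
thetas = [rng.uniform(-math.pi, math.pi) for _ in range(3)]
print(closure_phase(*measured_phases(*alphas, thetas)), wrap(sum(alphas)))
print(n_closure_phases(3), n_closure_phases(6))
```

Whatever values are drawn for the three aperture errors, the printed closure phase agrees with the sum of the source phases to rounding error.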


3. Angular Resolution 


The angular resolution of an interferometer is determined by the available baselines 
and the operating wavelength. For the instruments considered in this chapter, the 
resolution is set by the projected baseline; depending on the location of the source 
on the sky, this can be significantly smaller than the physical baseline. 
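
To illustrate the difference between the physical and projected baselines, the sketch below removes the component of the baseline vector along the line of sight. Taking $\lambda/b$ as the characteristic angular scale is an assumption of this example (conventions differ by factors of order unity), and all names are illustrative.

```python
import math


def projected_baseline(baseline, s_hat):
    """Length of the baseline component perpendicular to the unit vector s_hat."""
    dot = sum(b * s for b, s in zip(baseline, s_hat))
    perp = [b - dot * s for b, s in zip(baseline, s_hat)]
    return math.sqrt(sum(p * p for p in perp))


def angular_scale_mas(wavelength_m, b_proj_m):
    """Characteristic angle lambda / b in milliarcseconds (assumed criterion)."""
    return math.degrees(wavelength_m / b_proj_m) * 3600.0e3


# A 100 m baseline along the first axis; source at the zenith and 60 deg from it.
baseline = (100.0, 0.0, 0.0)
zenith = (0.0, 0.0, 1.0)
tilted = (math.sin(math.radians(60.0)), 0.0, math.cos(math.radians(60.0)))

for s_hat in (zenith, tilted):
    b_proj = projected_baseline(baseline, s_hat)
    print(b_proj, angular_scale_mas(700e-9, b_proj))
```

In this example the projected baseline halves when the source lies 60 degrees from the zenith in the direction of the baseline, and the angular scale doubles accordingly.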


4. Field of View 


The fringes formed in most optical and infrared interferometers are located in the 
pupil plane. The extent of the fringes, and hence the “interferometric” field of 
view, is set by the bandwidth, and is given approximately by $\lambda_0^2/(\Delta\lambda\, b)$, where 
$\Delta\lambda$ is the bandwidth, $\lambda_0$ the mean wavelength and $b$ the projected baseline. Most 
targets (single stars, binary systems, circumstellar disks and shells, etc.) will be 
much smaller than this and limitations due to the field of view will normally not be 
an issue. 
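
For a sense of scale, the approximate expression for the interferometric field of view can be evaluated directly. The wavelength, bandwidth and baseline below are illustrative values chosen for this sketch, not parameters of any particular instrument.

```python
import math


def interferometric_fov_mas(mean_wavelength_m, bandwidth_m, baseline_m):
    """Approximate field of view, lambda_0**2 / (delta_lambda * b), in mas."""
    fov_rad = mean_wavelength_m ** 2 / (bandwidth_m * baseline_m)
    return math.degrees(fov_rad) * 3600.0e3


# Illustrative case: 700 nm mean wavelength, 50 nm bandwidth, 100 m baseline.
print(interferometric_fov_mas(700e-9, 50e-9, 100.0))
```

The result for these values is about 20 milliarcseconds; narrowing the band or shortening the baseline widens the field in inverse proportion.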

Suppose the fringes are formed in the image plane.° The fringes will be localized 
at the phase center (i.e. the point in the image plane where the OPD is zero) and 
objects lying away from the phase center will not exhibit interference. This is in 
contrast to aperture masking interferometry, where the interferometric field of view 
is determined by the aberrations of the telescope. 

Reference 22 showed that an interferometer that satisfies a “golden rule” can 
have a much larger field of view: “as viewed from a point in the focal plane, beams 
from separated telescopes must be recombined so that they appear to be coming 
directly from a single large telescope which has been masked so as to reproduce 
exactly the ensemble of collecting telescopes.” Another term for the golden rule is 
“homothetic mapping”; see, for example, Ref. 23. 


5. Basic Features of an Amplitude Interferometer 


Many of the basic features were reviewed by Ref. 17. This section describes the 
elements which are common to most large interferometers. A wide variety of fringe 
detection techniques are used, and these are described separately. 

For conciseness we do not discuss the many auxiliary optical systems typically 
found in modern interferometers. These are essential and are used for optical align- 
ment, calibration, etc. 


“In astronomy pupil-plane and image-plane interferometers are sometimes called Michelson and 
Fizeau interferometers, respectively. This can cause confusion, since in optical science Michelson 
and Fizeau interferometers refer to specific systems widely used for optical testing, etc. 
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5.1. Input Optics 


The layout of the input stations is chosen to give a range of baselines. If the interfer- 
ometer is designed to operate simultaneously with three or more inputs the stations 
have to be located so the baselines are “nonredundant”; i.e. each baseline corre- 
sponds to a unique spatial component in Fourier space. The final choice will be 
influenced by cost factors and local topography. 

The input optics, generally housed in appropriately sized domes, are usually 
telescopes, although at least two interferometers (NPOI and SUSI) use siderostats. 
Transfer optics direct the light from the primary mirror to the input of the beam 
transport system. All interferometers will use some form of adaptive optics’ to 
stabilize the beam and possibly provide additional wavefront correction. 

The aperture diameters range from 15 cm (SUSI) to the 8 m Unit Telescopes of 
the VLTI. 


5.2. Beam Transport and Path Equalization 


For simplicity throughout this section, we consider a two-aperture interferometer. 
If three or more inputs are used the optics become somewhat more complicated but 
the basic principles are the same as for a two-aperture instrument. 


5.2.1. Beam Transport 


The light from each station must be brought to a central laboratory without 
degrading the optical quality of the light beams. Most interferometers use evacuated pipes 
for this purpose, in order to eliminate any local seeing effects. The VLTI instead uses 
air-conditioned underground tunnels. 

The diameter of the light beams must be chosen to minimize diffraction effects, 
which cause signal loss due to light diffracted out of the main beam and also intro- 
duce wavefront distortion due to near-field and Fresnel diffraction.?4 ?° 


5.2.2. Path Equalization 


Assuming that the interferometer lies in a horizontal plane, it is easy to show that 
there is an external optical path difference (OPD) between the two arms of the 
interferometer equal to $\Delta = \mathbf{b}\cdot\hat{\mathbf{s}}$, where $\mathbf{b}$ is the baseline vector and $\hat{\mathbf{s}}$ is the unit 
vector pointing at the source. For an East-West oriented baseline, the OPD is zero 
when the source transits the meridian; for a North-South baseline, the rate of 
change $\dot{\Delta}$ is zero at transit. 
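
The external OPD and its rate of change can be explored numerically for any baseline orientation. The sketch below is written under stated assumptions: a local east-north-up frame, hour angle measured positive west of the meridian, and illustrative function names and site parameters that do not come from the chapter.

```python
import math


def source_unit_vector(hour_angle, dec, lat):
    """Unit vector to the source in local (east, north, up) axes; radians."""
    east = -math.cos(dec) * math.sin(hour_angle)
    north = (math.sin(dec) * math.cos(lat)
             - math.cos(dec) * math.cos(hour_angle) * math.sin(lat))
    up = (math.sin(dec) * math.sin(lat)
          + math.cos(dec) * math.cos(hour_angle) * math.cos(lat))
    return (east, north, up)


def external_opd(baseline_enu, hour_angle, dec, lat):
    """External OPD as the dot product of baseline and source direction."""
    s_hat = source_unit_vector(hour_angle, dec, lat)
    return sum(b * s for b, s in zip(baseline_enu, s_hat))


def opd_rate(baseline_enu, hour_angle, dec, lat, dh=1e-6):
    """Central-difference d(OPD)/d(hour angle), in metres per radian."""
    ahead = external_opd(baseline_enu, hour_angle + dh, dec, lat)
    behind = external_opd(baseline_enu, hour_angle - dh, dec, lat)
    return (ahead - behind) / (2.0 * dh)


lat = math.radians(-30.0)      # illustrative site latitude
dec = math.radians(-20.0)      # illustrative source declination
for name, baseline in (("east-west", (100.0, 0.0, 0.0)),
                       ("north-south", (0.0, 100.0, 0.0))):
    for hours in (-2.0, 0.0, 2.0):
        h = math.radians(15.0 * hours)
        print(name, hours, external_opd(baseline, h, dec, lat),
              opd_rate(baseline, h, dec, lat))
```

Multiplying the printed rate by the sidereal rotation rate (about $7.3 \times 10^{-5}$ rad/s) gives the speed at which the path compensator has to move.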

The Large Binocular Telescope Interferometer (Ref. 27) is an example of an instrument 
where the two input telescopes are mounted on a common azimuth platform, which 
rotates to keep $\Delta = 0$. Most other interferometers utilize input optics fixed to the 
ground and in general $\Delta \neq 0$, and to observe interference the optical paths in the 


See Volume 2, Part 9 “Adaptive Optics.” 
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interferometer must be matched to within the coherence length of the interferome- 
ter. The external OPD must therefore be matched by a compensating OPD in the 
instrument, and this is typically done at the central beam-combining laboratory. 
Since the external OPD can range from zero to several hundred meters the path 
compensator is usually the largest optical system in the instrument. Compensator 
design varies among interferometers but typically there will be one or more mov- 
ing carriages, controlled by laser interferometers, that carry retroreflecting optics. 
Because the compensator includes moving optical systems it is a potential source 
of phase noise due to vibration from motors, irregularities in the tracks used by the 
carriages, etc. A common solution is to use two or more stages of servo control, with 
a fine control system used to reduce jitter in optical path to an acceptable level. 

One design of retroreflector is a parabolic mirror with a small flat located at 
the focus of the paraboloid. This has the advantage that the flat can be mounted on 
an actuator to provide fine control of the OPD. Additional “pop-up” mirrors can 
be used to introduce fixed amounts of optical path. 

The external OPD can be computed to high precision (see, for example, Ref. 28). 
The optical paths in the OPD compensator can be monitored with a metrology laser 
system, but high precision is also associated with high cost. The fringe detector (see 
Sec. 6) can be integrated with the compensator control system to implement a fringe 
search-and-track algorithm. 


5.2.3. Dispersion Compensation 


If the interferometer is horizontal, the air paths above each station are equal (apart 
from small fluctuations due to turbulence) and it follows that the external OPD is 
in vacuo. Consequently, the OPD compensator in the interferometer should also be 
in an evacuated system. This introduces considerable expense and complication in 
the design of the OPD compensator and an alternative is to put the compensator 
in a stable air-conditioned environment. 

However, if the compensator is in air it results in a differential air path between 
the two arms of the interferometer and this introduces longitudinal dispersion. This 
dispersion will drastically reduce the fringe visibility for OPDs more than a few 
meters. A dispersion compensator consists of a variable glass path that can be 
introduced to match the dispersion of the air in the OPD compensator. The choice 
of glass (or glasses) will depend on the bandwidth and the maximum working OPD 
difference (Refs. 29 and 30). 


6. Fringe Detection 


The fringe detector is undoubtedly the one part of a modern interferometer that 
has seen the most variety and change, largely due to the rapid developments in 
optical technology in the late 20th and early 21st centuries. Reference 31 observed 
fringes visually and estimated the angular diameter of the source from the baseline 

Fig. 3. The squared visibility $|V|^2$ for a uniform-disk (solid line) and for a limb-darkened (broken 
line) star. The quantity $x = \pi b\theta/\lambda$, where $\theta$ is either the uniform-disk angular diameter or the 
equivalent for the limb-darkened star. 


at which the fringes disappeared (the first null in the visibility function; see Fig. 3). 
This in turn allowed the estimation of the “uniform-disk” angular diameter of the 
star being observed.° 
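
The link between the first null and the uniform-disk diameter can be made concrete. For a uniformly illuminated disk the visibility is $V(x) = 2J_1(x)/x$ with $x = \pi b\theta/\lambda$ (a standard result, stated here as background), so the first null of $J_1$ fixes $\theta$ once the null baseline is known. The sketch evaluates $J_1$ from its integral representation to stay within the standard library; the baseline and wavelength are hypothetical values, not those of any historical measurement.

```python
import math


def bessel_j1(x, n=400):
    """J1(x) = (1/pi) * integral_0^pi cos(t - x sin t) dt, trapezoidal rule."""
    h = math.pi / n
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for k in range(1, n):
        t = k * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi


def uniform_disk_v2(x):
    """Squared visibility of a uniform disk at x = pi * b * theta / lambda."""
    if x == 0.0:
        return 1.0
    return (2.0 * bessel_j1(x) / x) ** 2


def first_null(lo=3.0, hi=4.5, tol=1e-10):
    """First zero of J1 (beyond x = 0), found by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bessel_j1(lo) * bessel_j1(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def diameter_from_null_mas(null_baseline_m, wavelength_m):
    """Uniform-disk diameter (mas) from the baseline where fringes vanish."""
    theta_rad = first_null() * wavelength_m / (math.pi * null_baseline_m)
    return math.degrees(theta_rad) * 3600.0e3


print(first_null())
print(diameter_from_null_mas(3.0, 550e-9))   # hypothetical 3 m null at 550 nm
```

The null lies at $x \approx 3.83$, which is equivalent to the familiar relation $\theta \approx 1.22\,\lambda/b$.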

An important constraint for fringe detection is atmospheric turbulence. Apart 
from introducing wavefront distortion, it causes the phase of the fringe pattern to 
fluctuate on the timescale $t_0$, where $t_0$ is typically of the order of a few milliseconds (Ref. 32), 
and one needs to sample the “fringe pattern” more rapidly than this. The earliest 
interferometers using photoelectric detection used photomultiplier tubes because of 
their fast response time (see, for example, Refs. 12 and 33). The Classic/CLIMB 
beam combiner at the CHARA Array is a modern example (Ref. 34). 

The performance of beam combiners that use bulk optics (beam splitters, etc.) 
to combine the light from two or more telescopes is affected by atmospheric seeing, 
even when adaptive optics are used. A major development was the introduction 
of beam combiners using single-mode optical fibers (e.g. FLUOR, Ref. 35). Only a 
single mode is transmitted; other modes, corresponding to aberrations introduced by 
atmospheric seeing, are suppressed. The irradiance in each channel will still fluc- 
tuate due to scintillation and coupling into the fibers, but this is monitored by 
photometric taps. An upgraded version of the original FLUOR beam combiner has 
been described by Ref. 36. 


°It is interesting to note that Ref. 31 discussed the effects of limb-darkening, and how to correct the 
measured uniform-disk angular diameter to determine the true or limb-darkened angular diameter. 
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Another instrument incorporating single-mode fibers is PIONIER, operating at 
the VLTI (Ref. 37). This is an infrared (H band) instrument that combines light from the 
four 1.8 m Auxiliary Telescopes. AMBER is another instrument at the VLTI which 
determines visibilities and closure phases in the H- and K-bands and uses three 
input beams (Ref. 38). 

Fringe detectors utilizing electron-multiplying charge-coupled devices 
(EMCCDs) have also been developed. These combine the high-speed photon-counting 
properties of photomultiplier tubes or avalanche diodes with the imaging 
capabilities of CCD arrays. The PAVO instruments, developed for SUSI and the CHARA 
Array, are an example (Ref. 39). Fringes are spatially dispersed along one axis and spectrally 
dispersed along the other. Although the fringes are distorted because of seeing, the 
instrument is designed to allow full recovery of the visibility information. Another 
instrument utilizing photon-counting cameras is VEGA, also in use at the CHARA 
Array (Ref. 40). 


6.1. Sensitivity 


The sensitivity or limiting magnitude of an interferometer depends on a number 
of factors including the aperture diameter, the transmission characteristics of the 
optics (a typical interferometer has a large number of reflecting surfaces), the 
detectors used and the atmospheric seeing conditions. As noted in Sec. 6, the sample time 
must be shorter than $t_0$, and this is an important limiting factor. 

A caveat is that the quoted limiting magnitude of a particular interferometer 
or detector system is typically that for an unresolved source, i.e. one for which 
$|\gamma| = 1$. Often, however, the most interesting data are obtained using baselines for 
which $|\gamma| < 1$. In Fig. 3, we have plotted $|V(x)|^2$ for two different models. Here 
$x = \pi b\theta/\lambda$, where $\theta$ is the angular diameter of the star. The solid line is the visibility 
function for a uniformly illuminated disk and the dashed line is for a star that has 
a solar center-to-limb intensity distribution at 550 nm (taken from Ref. 41). The 
differences between the two models are only apparent in the “second lobe” of the 
visibility function. Figure 4 shows this region in more detail. The signal, $|V|^2$, is 
always less than 0.02 in this region. For this reason, the signal-to-noise ratio (SNR) 
is often a better figure of merit than the limiting magnitude. 

Fig. 4. The region 3 < x < 7 in Fig. 3 is shown here in more detail. 


7. Further Reading 


A useful general reference is the monograph by Ref. 42. The course notes from the 
1999 Michelson Summer School (Ref. 43) provide detailed discussion of many aspects of 
stellar interferometry. “Selected Papers on Long Baseline Interferometry” (Ref. 44) contains 
reprints of many of the key papers in the field, covering the period 1868-1996. 
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Intensity interferometry exploits the second-order coherence of light — how 
intrinsic intensity fluctuations correlate between simultaneous measurements in 
separated telescopes. The optical telescopes connect only electronically (rather 
than optically), and the noise budget relates to electronic timescales of nanosec- 
onds; light-travel distances of centimeters or meters. This makes measurements 
practically insensitive to either atmospheric turbulence or to telescopic optical 
imperfections, allowing very long baselines, as well as observing at short optical 
wavelengths. Kilometer-scale optical arrays of air Cherenkov telescopes will enable 
optical aperture synthesis with image resolutions in the tens of microarcseconds. 


1. Highest Resolution in Optical Astronomy 


Tantalizing results from current amplitude/phase interferometers begin to show 
stars as widely diverse objects, and a great leap forward will be enabled by improv- 
ing angular resolution by just another order of magnitude. Bright stars with typical 
diameters of a few milliarcseconds require optical interferometry over hundreds of 
meters or a kilometer to enable surface imaging. However, phase interferometers 
(or amplitude interferometers*) need stability to within a fraction of an optical 
wavelength, while atmospheric turbulence and dispersion make their operation chal- 
lenging for very long baselines. 


2. Intensity Interferometry 


Intensity interferometry exploits second-order optical coherence: that of inten- 
sity, not of amplitude nor phase. It measures how random (quantum) intensity 


*See Chapter 2 of this Volume. 
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fluctuations correlate in time between simultaneous measurements in two or more 
separated telescopes. The method was pioneered by Robert Hanbury Brown and 
Richard Q. Twiss many years ago, for the original purpose of measuring stellar 
sizes.! The name “intensity interferometer” is actually somewhat misleading since 
nothing is interfering; the name originated from its analogy to amplitude interferom- 
eters, which at that time had similar scientific aims. Seen in a quantum context, it is 
a two-photon process, and is today seen as the first quantum-optical experiment. It 
laid the foundation for experiments of photon correlations and for the development 
of the quantum theory of optical coherence. 

The great observational advantage (compared to amplitude interferometry) 
is that it is practically insensitive to either atmospheric turbulence or to tele- 
scopic optical imperfections, enabling very long baselines as well as observing at 
short optical wavelengths, even through large airmasses. Telescopes connect only 
electronically (rather than optically), and the noise budget relates to electronic 
timescales of nanoseconds (light-travel distances of tens of centimeters or meters) 
rather than those of the light wave itself. A realistic time resolution of perhaps 
3 ns corresponds to 1 m light-travel distance, and the control of atmospheric path- 
lengths and telescopic imperfections then only needs to correspond to some fraction 
of that. 
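
The arithmetic behind this tolerance is simple enough to spell out. The sketch converts an electronic time resolution into a light-travel distance and, for comparison, into a number of optical wavelengths; the 500 nm wavelength is an illustrative assumption.

```python
SPEED_OF_LIGHT = 299_792_458.0   # metres per second (exact by definition)


def light_travel_distance(time_resolution_s):
    """Distance light travels during one electronic resolution element."""
    return SPEED_OF_LIGHT * time_resolution_s


def distance_in_wavelengths(time_resolution_s, wavelength_m):
    """The same path tolerance expressed in units of the optical wavelength."""
    return light_travel_distance(time_resolution_s) / wavelength_m


print(light_travel_distance(3e-9))              # metres for a 3 ns resolution
print(distance_in_wavelengths(3e-9, 500e-9))    # assumed 500 nm wavelength
```

A 3 ns resolution corresponds to about 0.9 m, which is more than a million optical wavelengths; this is the sense in which the path-length requirements are relaxed compared with amplitude interferometry.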

Details of the original intensity interferometer and its observing program have 
been well documented.!° The principles are also explained in various monographs 
and reference publications.* !* Following these early efforts, the method has not 
been used in astronomy since (but has found wide applications in particle physics 
since the same correlation properties apply to not only photons but to all bosons, 
i.e. particles with integer values of their quantum spin). Currently, there are consid- 
erable efforts to revive intensity interferometry in astronomy, applying high-speed 
electronics in arrays of large air Cherenkov telescopes. The following descriptions 
are based upon such ongoing studies, simulations and experiments. 


3. Principles of Operation 


The basic concept of an intensity interferometer is sketched in Fig. 1. In its simplest 
form, it consists of two telescopes, each with a photon detector feeding one channel 
of a signal processor for temporally cross-correlating the measured intensities from 
the two telescopes with the highest practical time resolution, probably around 1-10 ns. 
With telescopes sufficiently close to one another, the intensity fluctuations measured 
in both telescopes are more or less simultaneous, and thus correlated in time, but 
when moving them apart, the fluctuations gradually become decorrelated. How 
rapidly this occurs for increasing telescope separations gives a measure of the spatial 
coherence, and thus the spatial properties of the source. 

Fig. 1. Components of an intensity interferometer array. Spatially separated telescopes observe 
the same source, and the measured time-variable intensities, $I_n(t)$, are electronically 
cross-correlated between different telescopes. Two-telescope correlations are measured as $\langle I_1(t)I_2(t)\rangle$, 
$\langle I_2(t)I_3(t)\rangle$, etc.; three-telescope quantities as $\langle I_1(t)I_2(t)I_3(t)\rangle$, $\langle I_2(t)I_3(t)I_4(t)\rangle$, etc. With no 
optical connection between the telescopes, the operation resembles that of radio interferometers. 


3.1. Two-telescope Observations 


The measurement provided by a two-telescope system is 


$$\langle I_1(t) I_2(t)\rangle = \langle I_1(t)\rangle\langle I_2(t)\rangle\left(1 + |\gamma_{12}|^2\right), \qquad (1)$$


where $\gamma_{12}$ is the mutual coherence function of light between locations 1 and 2, the 
quantity commonly measured in amplitude/phase interferometers; $\langle\ \rangle$ denotes 
averaging over time. Compared to independently fluctuating intensities, the correlation 
between the intensities $I_1$ and $I_2$ is “enhanced” by the coherence parameter. This 
relation is valid for a classical concept of light as a wave, but fundamentally this is 
a quantum-optical two-photon effect, which presupposes that the light is in a state 
of thermodynamic equilibrium, obeying the Bose-Einstein statistics. Such ordinary 
“chaotic” (also called “thermal”, “maximum-entropy” or “Gaussian” ) light under- 
goes random phase jumps on timescales of its coherence time, but Eq. (1) does not 
necessarily hold for light with different photon statistics (e.g. an ideal laser emits 
light that is both first- and second-order coherent, without any phase jumps, and 
thus would not generate any sensible signal in an intensity interferometer). 
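
Eq. (1) can be checked with a small Monte Carlo experiment for chaotic light. The sketch models the fields at the two telescopes as circular complex Gaussian variables with a prescribed degree of coherence; this field model, the sample size and the names are assumptions of the example, and the time averages are replaced by averages over independent samples.

```python
import math
import random


def simulate_g2(gamma_abs, n_samples=100_000, seed=1):
    """Estimate <I1 I2> / (<I1><I2>) for chaotic light with |gamma_12| given."""
    rng = random.Random(seed)
    mix = math.sqrt(1.0 - gamma_abs ** 2)
    sum1 = sum2 = sum12 = 0.0
    for _ in range(n_samples):
        # Circular complex Gaussian field at telescope 1.
        e1 = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        # Field at telescope 2: partly shared with e1, partly independent.
        noise = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        e2 = gamma_abs * e1 + mix * noise
        i1 = abs(e1) ** 2
        i2 = abs(e2) ** 2
        sum1 += i1
        sum2 += i2
        sum12 += i1 * i2
    mean1 = sum1 / n_samples
    mean2 = sum2 / n_samples
    return (sum12 / n_samples) / (mean1 * mean2)


for gamma in (0.0, 0.6, 1.0):
    print(gamma, simulate_g2(gamma), 1.0 + gamma ** 2)
```

Within the statistical scatter, the estimated ratio follows $1 + |\gamma_{12}|^2$; this simulation samples the fields on the coherence timescale, so it does not include the dilution by slow electronics discussed next.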

Since the measured quantity is the square of the ordinary first-order visibil- 
ity, it always remains positive, only diminishing in magnitude when smeared over 
time intervals longer than the optical coherence time of starlight. However, for 
realistic time resolutions (much longer than an optical coherence time in 
broadband light of perhaps $10^{-14}$ s), any measured signal is tiny, requiring very good 
photon statistics for its reliable determination. Large photon fluxes (thus large 
telescopes) are therefore required. For a given electronic time resolution, this 
dilution is smaller for lower-frequency electromagnetic radiation (with longer coherence 
time), and for long-wavelength infrared and radio, this additional variability due 
to fluctuations in the signal itself is more easily measured, and is there known as 
“wave noise”. 


3.2. Three or More Telescopes 


Intensity interferometry lends itself to many-telescope operations. Since the signal 
is electronic only, it may be freely copied, transmitted, combined or saved, quite 
analogous to radio interferometry. In any larger array, the possible number of base- 
lines between telescope pairs grows rapidly, without any additional logistic effort. 
With N telescopes, N(N — 1)/2 baselines can be formed between different pairs of 
telescopes. For triplets of telescopes, one may construct 


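$$\langle I_1(t) I_2(t) I_3(t)\rangle = \langle I_1(t)\rangle\langle I_2(t)\rangle\langle I_3(t)\rangle\left(1 + |\gamma_{12}|^2 + |\gamma_{23}|^2 + |\gamma_{31}|^2 + 2\,\mathrm{Re}[\gamma_{12}\gamma_{23}\gamma_{31}]\right). \qquad (2)$$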


The phase of the triple product in the last term is the “closure phase” widely used 
in amplitude interferometry to eliminate effects of differential atmospheric phase 
errors between telescopes, since the baselines 1-2, 2-3, and 3-1 form a closed loop. 
Of course, intensity interferometry is not sensitive to phase errors, but three-point 
intensity correlations permit calculation of the real (cosine) part of this closure- 
phase function, which provides additional constraints that may enhance image 


reconstruction. !* 17 
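
As a sketch of how the cosine of the closure phase follows from Eqs. (1) and (2), the function below combines normalised two- and three-telescope correlations. The normalisation (each correlation divided by the product of the mean intensities) and all names are assumptions of this example; noise-free inputs are used purely for illustration.

```python
import cmath
import math


def g3_from_gammas(g12, g23, g31):
    """Normalised triple correlation of Eq. (2) for complex coherences."""
    return (1.0 + abs(g12) ** 2 + abs(g23) ** 2 + abs(g31) ** 2
            + 2.0 * (g12 * g23 * g31).real)


def cos_closure_phase(g2_12, g2_23, g2_31, g3):
    """Cosine of the closure phase from normalised correlations.

    g2_ij = <Ii Ij> / (<Ii><Ij>) = 1 + |gamma_ij|**2, as in Eq. (1).
    """
    m12, m23, m31 = g2_12 - 1.0, g2_23 - 1.0, g2_31 - 1.0
    triple_real = 0.5 * (g3 - 1.0 - m12 - m23 - m31)
    return triple_real / math.sqrt(m12 * m23 * m31)


def n_baselines(n_telescopes):
    """Number of telescope pairs, N(N - 1)/2."""
    return n_telescopes * (n_telescopes - 1) // 2


g12 = cmath.rect(0.8, 0.3)
g23 = cmath.rect(0.5, 0.5)
g31 = cmath.rect(0.6, -0.1)
g3 = g3_from_gammas(g12, g23, g31)
print(cos_closure_phase(1 + abs(g12) ** 2, 1 + abs(g23) ** 2,
                        1 + abs(g31) ** 2, g3), math.cos(0.7))
print(n_baselines(100))
```

Only the cosine is recovered, so the sign of the closure phase remains ambiguous; with real data the division by small coherence moduli would also amplify noise.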


4. Optical Aperture Synthesis 


To enable true two-dimensional imaging, a multi-telescope grid is required to provide 
numerous baselines that cover the interferometric Fourier-transform (u,v)-plane, 
analogous to radio arrays. However, compared to amplitude interferometers, a cer- 
tain complication lies in the fact that the correlation function for the electric field, 
$\gamma_{12}$, is not directly measured, but only the square of its modulus, $|\gamma_{12}|^2$. Since this 
does not preserve phase information, the direct and immediate inversion of the 
measured coherence patterns into images is not possible. 

Various techniques exist to recover the Fourier phases. Already intuitively, it is 
clear that the information contained in the coherence map (equivalent to the source’s 
diffraction pattern) must place stringent constraints on the source image. Viewing 
the familiar Airy diffraction pattern in Fig. 2, one can immediately recognize it as 
originating in a circular aperture, although only intensities are seen. Most imaging 
methods have been developed for other disciplines (e.g. coherent diffraction imaging 
in X-rays) but also for astronomical interferometry,?° 2° demonstrating how rather 
complex images can be reconstructed. One remaining limitation is the nonunique- 
ness between the image and its mirrored reflection. Figure 2 shows an example of 
imaging an artificial binary star with an array of small optical telescopes in the 
laboratory, operated as an intensity interferometer with 180 baselines (Refs. 18 and 19). 
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Fig. 2. Fourier-plane vs. image-plane information. The left column shows coherence maps 
(“diffraction patterns”) corresponding to the direct images at right. At top left, the Airy diffrac- 
tion pattern originates from a circular aperture. At bottom left is an experimentally measured 
coherence pattern from an artificial asymmetric binary star, built up from intensity-correlation 
measurements over 180 baselines in an array of optical telescopes in the laboratory. At right is the 
image reconstructed from these intensity-interferometric measurements (Refs. 18 and 19). The circle shows the 
diffraction-limited resolution, thus realized by an array of optical telescopes connected through 
electronic software only, with no optical links between them. 


5. Signal-to-Noise in Intensity Interferometry 


No other current instrument in astronomy measures the second-order coherence of 
light. Since the noise properties in such measurements differ from those of other 
instruments, it is essential to understand the noise parameters and error budgets 
for defining realistic observing programs. 

For one pair of telescopes, the signal-to-noise ratio (Ref. 27) for polarized light is 
given in a first approximation by 

$$(S/N)_{\mathrm{RMS}} = A \cdot \alpha \cdot n \cdot |\gamma_{12}(\mathbf{r})|^2 \cdot \Delta f^{1/2} \cdot (T/2)^{1/2}, \qquad (3)$$

where $A$ is the geometric mean of the areas (not diameters) of the two telescopes; 
$\alpha$ is the quantum efficiency of the optics plus detector system; $n$ is the flux of the 
source in photons per unit optical bandwidth per unit area per unit time; $|\gamma_{12}(\mathbf{r})|^2$ is 
the second-order coherence of the source for the baseline vector $\mathbf{r}$, with $\gamma_{12}(\mathbf{r})$ being 
the mutual degree of coherence; $\Delta f$ is the electronic bandwidth of the detector plus 
signal-handling system, and $T$ is the integration time. 
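
Eq. (3) is straightforward to evaluate. In the sketch below every numerical input is a placeholder chosen only to exercise the formula; none is a measured or recommended value, and the function names are illustrative.

```python
import math


def snr_rms(area_m2, efficiency, flux, gamma_sq, bandwidth_hz, t_int_s):
    """Eq. (3): A * alpha * n * |gamma|^2 * sqrt(delta_f) * sqrt(T / 2).

    flux is in photons per second per square metre per hertz of optical
    bandwidth; bandwidth_hz is the electronic (not optical) bandwidth.
    """
    return (area_m2 * efficiency * flux * gamma_sq
            * math.sqrt(bandwidth_hz) * math.sqrt(t_int_s / 2.0))


def integration_time_for(target_snr, area_m2, efficiency, flux, gamma_sq,
                         bandwidth_hz):
    """Integration time needed to reach target_snr, inverting Eq. (3)."""
    snr_at_two_seconds = snr_rms(area_m2, efficiency, flux, gamma_sq,
                                 bandwidth_hz, 2.0)   # sqrt(T / 2) = 1
    return 2.0 * (target_snr / snr_at_two_seconds) ** 2


# Placeholder inputs (assumptions, for illustration only).
params = dict(area_m2=100.0, efficiency=0.3, flux=5e-5, gamma_sq=0.5,
              bandwidth_hz=100e6)
print(snr_rms(t_int_s=3600.0, **params))
print(integration_time_for(10.0, **params))
```

Because the signal-to-noise ratio grows only as the square root of the integration time, halving $|\gamma_{12}|^2$ requires four times the observing time for the same result.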
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5.1. Independence of Optical Passband 


Most of the parameters in Eq. (3) depend on the instrumentation, but $n$ depends on 
the source itself, being a function of its brightness temperature. For a given number 
of photons detected per unit area and unit time, the signal-to-noise ratio is better 
for sources where those photons are squeezed into a narrower optical band. 

Indeed (for a flat-spectrum source), the S/N is independent of the width of 
the optical passband, whether measuring only the limited light inside a narrow 
spectral feature or a much greater broadband flux. The explanation is that realistic 
electronic resolutions of nanoseconds are much slower than the temporal coherence 
time of optical light. While narrowing the spectral passband decreases the pho- 
ton flux, it also increases the temporal coherence by the same factor, canceling 
the effects of increased photon noise. This property was exploited already in the 
Narrabri interferometer (Ref. 28) to identify the extended emission-line volume from the 
stellar wind around the Wolf-Rayet star $\gamma^2$ Vel. This could also be exploited for 
increasing the signal-to-noise by simultaneous observing in multiple spectral chan- 
nels, a concept foreseen for the once proposed successor to the original Narrabri 
interferometer.°’?93° A major breakthrough in sensitivity can be expected once 
energy-resolving detectors become practical to use for high photon count rates. 
This would enable straightforward parallel observing in multiple spectral channels, 
increasing the signal by a factor equal to the spectral resolution. 
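
The cancellation described in this section can be illustrated with an order-of-magnitude model. It assumes that the coherence time is the reciprocal of the optical bandwidth and that each electronic resolution element contributes a signal-to-noise ratio equal to the count rate times the coherence time times $|\gamma_{12}|^2$; both are heuristic assumptions of this sketch, valid only up to factors of order unity and only while the resolution element is much longer than the coherence time.

```python
import math


def heuristic_snr(area_m2, efficiency, flux_per_hz, optical_bandwidth_hz,
                  time_resolution_s, t_int_s, gamma_sq=1.0):
    """Order-of-magnitude S/N for a flat-spectrum source (heuristic model)."""
    # Detected photon rate grows in proportion to the optical bandwidth ...
    count_rate = area_m2 * efficiency * flux_per_hz * optical_bandwidth_hz
    # ... while the coherence time shrinks by the same factor.
    coherence_time = 1.0 / optical_bandwidth_hz
    snr_per_element = count_rate * coherence_time * gamma_sq
    n_elements = t_int_s / time_resolution_s
    return snr_per_element * math.sqrt(n_elements)


# Placeholder inputs; only the comparison between the two bandwidths matters.
for optical_bandwidth_hz in (1e12, 1e14):
    print(optical_bandwidth_hz,
          heuristic_snr(100.0, 0.3, 5e-5, optical_bandwidth_hz, 5e-9, 3600.0))
```

The two printed values coincide: the optical bandwidth drops out, leaving the same dependence on area, efficiency, spectral flux, electronic resolution and integration time as in Eq. (3).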


5.2. Dependence on Source Temperature 


Another S/N property is that high-temperature sources can be measured but, in 
practice, cool objects cannot, no matter what their apparent brightness. To be a 
feasible target for long baseline interferometry, any source must both provide a 
significant photon flux, and have structures small enough to produce visibility over 
such baselines. This implies small sources of high surface brightness. Cool sources 
would have to be large in extent to give a sizable flux, but then will become spatially 
resolved already over short baselines. Seen alternatively, for stars with a given angu- 
lar diameter but decreasing temperature (thus decreasing fluxes), telescope diameter 
must be increased in order to maintain the same S/N. When resolved by a single 
telescope, the S/N begins to drop (the spatial coherence decreases), and no gain 
results from larger telescopes. For currently foreseen instrumentation, practically 
observable sources would have to be hotter than about solar temperature. 


6. Air Cherenkov Telescopes 


A long-baseline intensity interferometer requires large telescopes spread over a 
square kilometer or more. Precisely such complexes are being erected for a different 
primary purpose: the study of gamma-ray sources through the observation of visual 
flashes of Cherenkov light emitted in air from the particle cascades triggered by 
gamma rays.ᵇ Since these flashes are faint, the telescopes must be large but do 
not need to be more precise than the spatial equivalent of a few nanoseconds of 
light-travel time, that being the typical duration of these Cherenkov flashes. For good 
stereoscopic source localization, the telescopes need to be spread out over hundreds 
of meters, that being the typical extent of the light pool on the ground. These 
parameters are remarkably similar to the requirements for intensity interferometry 
and several authors have realized the potential for this application.³¹⁻³³ The largest 
current such project is CTA, the Cherenkov Telescope Array.³⁴,³⁵ 

CTA is planned to have up to 100 telescopes spread over several square 
kilometers, with a combined light-collecting area of some 10,000 m². Of course, it will 
mainly be devoted to its task of observing Cherenkov light in air, but other 
applications are also envisioned: besides intensity interferometry,³⁶ these include searches 
for rapid astrophysical events, observations of stellar occultations by distant 
Kuiper-belt objects, or use as a terrestrial ground station for optical communication with 
distant spacecraft. 

The impact on other Cherenkov array operations would probably be limited 
since interferometry can be carried out during nights with bright moonlight, 
which, due to the faintness of the Cherenkov light flashes, precludes their 
efficient observation. If baselines of 2 or 3 km could be utilized at short optical 
wavelengths, resolutions would approach 30 µas, an unprecedented spatial 
resolution in optical astronomy. 
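
The quoted resolution can be checked with the usual order-of-magnitude estimate λ/B. The sketch below assumes a wavelength of 400 nm as a stand-in for "short optical wavelengths"; that value and the function name are illustrative choices, not taken from the text.

```python
import math

# Microarcseconds per radian
RAD_TO_MICROARCSEC = math.degrees(1.0) * 3600.0 * 1e6


def angular_resolution_uas(wavelength_m, baseline_m):
    """Approximate angular resolution lambda/B, in microarcseconds."""
    return wavelength_m / baseline_m * RAD_TO_MICROARCSEC


if __name__ == "__main__":
    for baseline_km in (2.0, 3.0):
        res = angular_resolution_uas(400e-9, baseline_km * 1e3)
        print(f"B = {baseline_km:.0f} km: about {res:.0f} microarcseconds")
```

With these assumed numbers the estimate falls from roughly 40 µas at 2 km to somewhat below 30 µas at 3 km, in line with the figure quoted above.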


6.1. (An)isochronous Telescopes 


For Cherenkov telescopes, a large field of view is desired while, in most telescopes, 
the image quality deteriorates away from the optical axis. Several telescopes have a 
Davies–Cotton design,⁴⁰ whose spherical prime mirror gives smaller aberrations off 
the optical axis compared to parabolic systems. These particular telescopes are not 
isochronous, i.e. light striking different parts of the entrance aperture may not arrive 
at the focus at exactly the same time. The signal-to-noise improves with electronic 
bandwidth Δf, i.e. with better time resolution for recording intensity fluctuations. 
The time spread in anisochronous telescopes acts like an “instrumental profile” 
in the time domain, filtering away the most rapid fluctuations. Fortunately, the 
gamma-ray induced Cherenkov light flashes last only a few nanoseconds, and thus 
the performance of Cherenkov telescopes cannot be made much worse, lest they 
lose sensitivity to their primary task. 


6.2. Telescopic Image Quality 


The technique does not demand good optical quality, permitting use of flux 
collectors with point-spread functions of several arcminutes. Still, issues arise from 
unsharp images: in particular, contamination by the background light from the 
night sky. Such light does not contribute any net intensity-correlation signal but 
it increases the photon-counting noise, especially when observing under moonlight 
conditions. 

ᵇ See Chapter 6 of Volume 5 of this Handbook. 


6.3. Focusing at “Infinity” 


The foci of Cherenkov telescopes correspond to those atmospheric heights where 
most of the Cherenkov light originates, and the image of a cosmic object will be 
slightly out of focus. For a focal length of f = 10 m, the focus shifts 1 cm between 
imaging at 10 km distance and at infinity. In order to minimize the night-sky 
background, it could thus be desirable to refocus the telescopes at “infinity”. 
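
The 1 cm figure follows from the thin-lens (or mirror) equation 1/f = 1/s + 1/s'. The short sketch below reproduces it. The function names are illustrative, and a single ideal focusing element is assumed.

```python
def image_distance(focal_length_m, object_distance_m):
    """Image distance s' from 1/f = 1/s + 1/s' for an ideal focusing element."""
    return focal_length_m * object_distance_m / (object_distance_m - focal_length_m)


def focus_shift_from_infinity(focal_length_m, object_distance_m):
    """How far the focus for a finite object distance lies behind the infinity focus."""
    return image_distance(focal_length_m, object_distance_m) - focal_length_m


if __name__ == "__main__":
    shift = focus_shift_from_infinity(10.0, 10e3)  # f = 10 m, object at 10 km
    print(f"focus shift: {shift * 100:.2f} cm")
```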


6.4. Telescope Positions in an Array 


The choice of telescopes within a larger array can be optimized for best coverage 
of the interferometric (u, v)-Fourier plane. As the source moves across the sky, 
projected baseline lengths and orientations between telescope pairs change, depending 
on the angle under which the object is observed. Telescopes in a repetitive geometric 
pattern cause the baselines to be similar for many pairs of telescopes. Since celestial 
objects move from east towards west, baselines between pairs of telescopes that are 
not oriented exactly east-west will cover more of the (u, v)-plane. 

During the night, stationary telescope pairs trace out ellipses in the Fourier 
plane as a function of the observatory latitude, the celestial coordinates of the 
source, and the relative placement of the telescopes.⁴¹ The rotation of the Earth 
enables aperture synthesis in software and, of course, is the very principle used in 
much of radio interferometry. 
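
The elliptical tracks can be illustrated with the standard projection used in radio interferometry. The sketch below is a minimal illustration, not a planning tool: it assumes a baseline given in local east/north/up meters, a source at fixed declination, and the usual equatorial convention (X toward the meridian on the celestial equator, Y toward east, Z toward the pole). All names and numbers are assumptions made for the example.

```python
import math


def equatorial_baseline(east_m, north_m, up_m, latitude_deg):
    """Convert a local east/north/up baseline to equatorial (X, Y, Z) components."""
    lat = math.radians(latitude_deg)
    bx = -north_m * math.sin(lat) + up_m * math.cos(lat)
    by = east_m
    bz = north_m * math.cos(lat) + up_m * math.sin(lat)
    return bx, by, bz


def uv_track(east_m, north_m, up_m, latitude_deg, dec_deg, wavelength_m, hour_angles_h):
    """Projected baseline (u, v), in wavelengths, at each hour angle (in hours)."""
    bx, by, bz = equatorial_baseline(east_m, north_m, up_m, latitude_deg)
    dec = math.radians(dec_deg)
    track = []
    for hours in hour_angles_h:
        h = math.radians(15.0 * hours)
        u = (bx * math.sin(h) + by * math.cos(h)) / wavelength_m
        v = (-bx * math.sin(dec) * math.cos(h)
             + by * math.sin(dec) * math.sin(h)
             + bz * math.cos(dec)) / wavelength_m
        track.append((u, v))
    return track


if __name__ == "__main__":
    # Illustrative pair: 120 m east and 80 m north of each other, latitude 30 deg,
    # source at declination +40 deg, observed at 500 nm over six hours.
    hours = [-3.0 + 0.5 * k for k in range(13)]
    for u, v in uv_track(120.0, 80.0, 0.0, 30.0, 40.0, 500e-9, hours):
        print(f"u = {u: .3e}   v = {v: .3e}")
```

Every point of such a track satisfies u² + [(v − B_z cos δ/λ)/sin δ]² = (B_x² + B_y²)/λ², which is the ellipse referred to above. For an exactly east-west pair B_x and B_z vanish, so the ellipse is centered on the origin.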


6.5. Interferometry in Space? 


Some ideas for space-based instruments have been proposed. Amplitude 
interferometers would avoid atmospheric turbulence, but still require extreme optical stability. 
To relax such requirements, concepts for intensity interferometry between free-flying 
telescopes have been proposed.³⁷⁻³⁹ If the signals are stored for later analysis, there 
is no need to keep the telescopes precisely positioned, but only to know their 
positions at the time of observation. Such a system could enable the imaging of stellar 
surfaces in ultraviolet emission lines, delineating magnetically active regions. 


6.6. Detectors 


Typical Cherenkov telescopes have point-spread functions in the focal plane on the 
order of 1 cm. Matching photon-counting detectors include solid-state avalanche 
diode arrays and vacuum-tube photomultipliers. In principle, only one detector 
pixel per telescope is required (although some provision for measuring the signal 
at zero baseline may be needed). In some telescopes, a pixel at the center of the 
large Cherenkov camera is electronically directly accessible, possibly suitable also 
for interferometry. An alternative concept is a dedicated detector mounted onto the 
outside of the mechanical cover of the ordinary camera. 

Bright sources, observed in broadband light with a large telescope, may generate 
count rates that are too great to handle practically. However, since the S/N is 
independent of the optical passband, the signal can be retained with a lower photon 
flux, if some color filter is used to reduce the flux to a suitable level, or perhaps a 
narrowband filter tuned to some spectral feature of astrophysical significance. 
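
The independence of S/N from the optical passband can be made concrete with the standard expression for an intensity interferometer, (S/N) = A α n |γ|² (Δf T/2)^(1/2). It is quoted here as an assumption, since this section does not restate it. In it, n is the photon flux per unit area, time and optical bandwidth. The sketch below uses purely illustrative numbers to show that a narrower filter lowers the count rate while leaving the S/N unchanged.

```python
import math


def count_rate(area_m2, quantum_eff, n_spectral, optical_bw_hz):
    """Detected photon rate (per second) through a filter of the given optical bandwidth."""
    return area_m2 * quantum_eff * n_spectral * optical_bw_hz


def snr_rms(area_m2, quantum_eff, n_spectral, gamma_sq, electronic_bw_hz, time_s):
    """Assumed S/N expression; note that the optical bandwidth does not enter."""
    return (area_m2 * quantum_eff * n_spectral * gamma_sq
            * math.sqrt(electronic_bw_hz * time_s / 2.0))


if __name__ == "__main__":
    area, alpha, n = 100.0, 0.3, 1e-5     # m^2, quantum efficiency, photons m^-2 s^-1 Hz^-1
    for label, bw in (("broadband", 1e14), ("narrowband", 1e11)):
        print(f"{label}: {count_rate(area, alpha, n, bw):.1e} counts/s")
    print(f"S/N in one hour at 1 GHz: {snr_rms(area, alpha, n, 1.0, 1e9, 3600.0):.0f}")
```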


6.7. Correlators 


An electronic correlator provides the averaged product of the intensity fluctuations, 
normalized by the average intensities. Current techniques permit electronic firmware 
units to be programmed into high-speed digital correlators with time resolutions of a 
few ns or better. Equipment with such performance is also commercially available 
for various (nonastronomical) applications in photon correlation spectroscopy. 

Firmware correlators process data in real time, and eliminate the need for their 
further handling and storage. However, if something needs to be checked afterwards, 
the original data are no longer available, and alternative signal processing cannot 
be applied. An alternative is to time-tag each photon count and store all data for 
later analysis off-line. However, this may require a massive computational effort and 
possible problems may not get detected while observations are still in progress. 
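
As a minimal sketch of the off-line alternative, the following code bins two lists of photon time tags and forms the zero-lag product of the intensity fluctuations, normalized by the mean intensities. The function names, the bin width and the toy data are assumptions made for illustration. A real analysis would also scan over time lags and apply the delay corrections discussed in Sec. 6.8.

```python
from collections import Counter


def bin_counts(time_tags_s, bin_width_s, n_bins):
    """Histogram non-negative photon arrival times into n_bins equal time bins."""
    hist = Counter(int(t / bin_width_s) for t in time_tags_s)
    return [hist.get(i, 0) for i in range(n_bins)]


def normalized_correlation(counts_a, counts_b):
    """Zero-lag <dI_a dI_b> / (<I_a> <I_b>) for two equally binned count series."""
    n = len(counts_a)
    mean_a = sum(counts_a) / n
    mean_b = sum(counts_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(counts_a, counts_b)) / n
    return cov / (mean_a * mean_b)


if __name__ == "__main__":
    tags_a = [0.1e-9, 1.2e-9, 1.7e-9, 3.5e-9]   # toy time tags in seconds
    tags_b = [0.4e-9, 1.1e-9, 1.9e-9, 3.2e-9]
    a = bin_counts(tags_a, 1e-9, 4)
    b = bin_counts(tags_b, 1e-9, 4)
    print(a, b, normalized_correlation(a, b))
```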

Correlator requirements are much more modest than for correlators at radio 
telescope arrays. Those amplitude interferometers measure not only the spatial but 
also the temporal coherence, achieving radio imaging with high spectral resolu- 
tion. Optical spectra cannot realistically be obtained from intensity interferometry 
measurements: the spectral resolution comes from optical filters or the wavelength 
sensitivity of the detector. While correlators for radio arrays may be supercomputers 
to handle spatial and temporal correlations with many bits of resolution, a correlator 
for intensity interferometry can be a small table-top device controlled by a personal 
computer, merely correlating binary data streams of photon counts. 


6.8. Delay Units 


In the original Narrabri instrument, the telescopes were continuously moved during 
observations to maintain their projected baseline. Electronic time delays can now 
be used instead to assign successive measures of the spatial coherence to the 
appropriate baseline length and orientation. To follow a source across the sky, a variable 
time delay (in either hard- or software) has to compensate for the change of timing 
of the wavefront at the different telescopes, by up to a few µs, corresponding to 
differential light travel distances of maybe 0.5 km. 
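
The size of the required delay follows from projecting the baseline onto the source direction and dividing by the speed of light. The sketch below assumes a baseline in local east/north/up meters and azimuth measured from north through east. The names and the example geometry are illustrative.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def geometric_delay_s(east_m, north_m, up_m, azimuth_deg, elevation_deg):
    """Arrival-time difference between two telescopes for a plane wavefront."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector toward the source in local east/north/up coordinates
    s_east = math.cos(el) * math.sin(az)
    s_north = math.cos(el) * math.cos(az)
    s_up = math.sin(el)
    path_difference_m = east_m * s_east + north_m * s_north + up_m * s_up
    return path_difference_m / SPEED_OF_LIGHT


if __name__ == "__main__":
    # Extreme case: 500 m east-west baseline, source near the eastern horizon
    delay = geometric_delay_s(500.0, 0.0, 0.0, 90.0, 0.0)
    print(f"delay: {delay * 1e6:.2f} microseconds")
```

A 0.5 km path difference thus corresponds to about 1.7 µs, and the delay passes through zero when the source direction is perpendicular to the baseline.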


6.9. Experimental Work 


As steps towards full-scale stellar intensity interferometry, laboratory and field 
experiments are being pursued. A dedicated test facility (“StarBase”) has been 
developed in Utah, while Cherenkov telescopes of the VERITAS array in Arizona 
have been used to verify telescope connections for interferometry. Laboratory 
experiments and simulations are being carried out by various groups to understand 
signal handling, correlator performance, and efficiency of image reconstruction 
algorithms.⁴²⁻⁵³ 


7. Intensity Interferometry on Extremely Large Telescopes 


Extremely large optical telescopes (ELTs), with apertures in the 30–40 m range, aim 
at diffraction-limited imaging using adaptive optics in the near-infrared. Although 
achievable resolution is coarser than with long baselines, ELTs will be attractive for 
intensity interferometry, once they are outfitted with an array of high-speed 
photon-counting detectors.⁵⁴,⁵⁵ The telescope aperture would be optically sliced into many 
segments (analogous to a wavefront sensor across the entrance pupil), each feeding 
a separate detector. Cross-correlations then realize intensity interferometry between 
pairs of telescope subapertures. This would be practical also when seeing conditions 
are inadequate for adaptive optics or when the segmented main mirror is only 
partially filled. Achievable resolution in the blue will be superior to that feasible 
with infrared adaptive optics by a factor of 3 or 5. Steps toward such a facility have 
been taken in recent instrument constructions.⁵⁶⁻⁶⁰ 

Although mirror segments on ELTs are smaller than Cherenkov telescopes, they 
offer certain advantages: image quality is arcseconds or better, which essentially 
eliminates background skylight, and permits the use of small detectors of very high 
quantum efficiency, such as single-photon-counting avalanche diodes. Precise optical 
collimation permits interference filters with very narrow bandpass to isolate spectral 
lines, and since the optical paths are isochronous, very fast electronics can be used 
to improve the signal-to-noise ratio. Although the finite size of the ELT aperture 
limits the extent of the interferometric (u,v)-plane covered, this can be sampled very 
densely, and an enormous number of baseline pairs can be synthesized, assuring a 
complete sampling of the source image, and its stable reconstruction. 


8. Astronomical Targets 


With optical imaging approaching resolutions of tens of microarcseconds (and also 
with a certain spectral resolution), we will be moving into novel and previously 
unexplored parameter domains, enabling new frontiers in astrophysics. With a 
foreseen brightness limit of perhaps m_V = 7 or 8, for sources of a sufficiently high 
brightness temperature, initial observing programs have to focus on bright stars 
or stellar-like objects.³³,³⁶,⁶¹ Promising targets for early intensity interferometry 
observations thus appear to be relatively bright, hot, single or binary O-, B-, and 
Wolf–Rayet-type stars, perhaps with deformed shapes due to rapid rotation, or with 
dense stellar winds visible in emission lines. To reach fainter extragalactic sources 
appears to require multi-wavelength observations, probably achievable with some 
spectrally resolving detector array or, ultimately, with energy-resolving detectors. 


9. Outlook 


The scientific potential of long-baseline optical interferometry and of stellar surface 
imaging was realized a long time ago, and concepts for long-baseline optical 
amplitude/phase interferometers have been worked out for construction at 
ground-based observatories, in Antarctica, as free-flying telescopes in space, or even placed 
on the Moon. All those proposals are based upon sound physical principles, yet do 
not appear likely to be realized in any immediate future due to their complexity 
and likely cost. However, progress in instrumentation and computing technology, 
building upon years of experience in radio interferometry, together with the growing 
availability of large flux collectors in the form of air Cherenkov telescopes, now 
appears to enable electronic long-baseline interferometry by software in the optical 
band, several decades after corresponding aperture synthesis techniques were first 
established in the radio domain. 


References 


1. R. Hanbury Brown, The Intensity Interferometer: Its Applications to Astronomy (Taylor & Francis, 1974). 
2. R. Hanbury Brown, J. Davis and L. R. Allen, Mon. Not. Roy. Astron. Soc. 137, 375 (1967). 
3. R. Hanbury Brown, J. Davis, L. R. Allen and J. M. Rome, Mon. Not. Roy. Astron. Soc. 137, 393 (1967). 
4. R. Hanbury Brown, Photons, Galaxies and Stars (Indian Academy of Sciences, 1985). 
5. R. Hanbury Brown, Boffin: A Personal Story of the Early Days of Radar, Radio Astronomy and Quantum Optics (Adam Hilger, 1991). 
6. A. Glindemann, Principles of Stellar Interferometry (Springer, 2011). 
7. J. W. Goodman, Statistical Optics (Wiley-Interscience, 1985). 
8. R. Loudon, The Quantum Theory of Light, 3rd edn. (Oxford University Press, 2000). 
9. L. Mandel and E. Wolf, Optical Coherence and Quantum Optics (Cambridge University Press, 1995). 
10. A. Labeyrie, S. G. Lipson and P. Nisenson, An Introduction to Optical Stellar Interferometry (Cambridge University Press, 2006). 
11. S. K. Saha, Aperture Synthesis. Methods and Applications to Optical Astronomy (Springer, 2011). 
12. Y. Shih, An Introduction to Quantum Optics: Photon and Biphoton Physics (CRC Press, 2011). 
13. V. Malvimat, O. Wucknitz and P. Saha, Mon. Not. Roy. Astron. Soc. 437, 798 (2014). 
14. P. D. Nuñez and A. Domiciano de Souza, Mon. Not. Roy. Astron. Soc. 453, 1999 (2015). 
15. A. Ofir and E. N. Ribak, Mon. Not. Roy. Astron. Soc. 368, 1646 (2006). 
16. A. Ofir and E. N. Ribak, Mon. Not. Roy. Astron. Soc. 368, 1652 (2006). 
17. T. Wentz and P. Saha, Mon. Not. Roy. Astron. Soc. 446, 2065 (2015). 
18. D. Dravins, T. Lagadec and P. D. Nuñez, Nature Commun. 6, 6852 (2015). 
19. D. Dravins, T. Lagadec and P. D. Nuñez, Astron. Astrophys. 580, A99 (2015). 
20. R. B. Holmes and M. S. Belen’kii, J. Opt. Soc. Am. A 21, 697 (2004). 
21. R. B. Holmes, S. LeBohec and P. D. Nuñez, Proc. SPIE 7818, 781800 (2010). 
22. R. Holmes, B. Calef, D. Gerwe and P. Crabtree, Appl. Opt. 52, 5235 (2013). 
23. P. D. Nuñez, R. Holmes, D. Kieda and S. LeBohec, Mon. Not. Roy. Astron. Soc. 419, 172 (2012). 
24. P. D. Nuñez, R. Holmes, D. Kieda, J. Rou and S. LeBohec, Mon. Not. Roy. Astron. Soc. 424, 1006 (2012). 
25. J. J. Dolne, D. R. Gerwe and P. N. Crabtree, Proc. SPIE 9146, 914636 (2014). 
26. D. Gerwe, P. Crabtree, R. Holmes and J. Dolne, Proc. SPIE 8877, 88770F (2013). 
27. R. Q. Twiss, Opt. Acta 16, 423 (1969). 
28. R. Hanbury Brown, J. Davis, D. Herbison-Evans and L. R. Allen, Mon. Not. Roy. Astron. Soc. 148, 103 (1970). 
29. J. Davis, Dudley Obs. Rep. 9, 199 (1975). 
30. R. Hanbury Brown, Proc. IAU Coll. 50, 11-1 (1979). 
31. S. LeBohec and J. Holder, Astrophys. J. 649, 399 (2006). 
32. S. LeBohec, M. Daniel, W. J. de Wit, J. A. Hinton, E. Jose, J. A. Holder, J. Smith and R. J. White, AIP Conf. Proc. 984, 205 (2008). 
33. D. Dravins, S. LeBohec, H. Jensen and P. D. Nuñez, New Astron. Rev. 56, 143 (2012). 
34. B. S. Acharya, M. Actis, T. Aghajani et al., Astropart. Phys. 43, 3 (2013). 
35. Cherenkov Telescope Array (CTA), http://www.cta-observatory.org/ (2018). 
36. D. Dravins, S. LeBohec, H. Jensen and P. D. Nuñez, Astropart. Phys. 43, 331 (2013). 
37. I. Klein, M. Guelman and S. G. Lipson, Appl. Opt. 46, 4237 (2007). 
38. E. N. Ribak, P. Gurfil and C. Moreno, Proc. SPIE 8445, 844509 (2012). 
39. E. N. Ribak and Y. Shulamy, Exp. Astron. 41, 145 (2016). 
40. J. M. Davies and E. S. Cotton, Solar Energy 1, 16 (1957). 
41. D. Ségransan, New Astron. Rev. 51, 597 (2007). 
42. D. Dravins and S. LeBohec, Proc. SPIE 6986, 698609 (2008). 
43. C. Foellmi, Astron. Astrophys. 507, 1719 (2009). 
44. W. Guerin, A. Dussaux, M. Fouché et al., Mon. Not. Roy. Astron. Soc. 472, 4126 (2017). 
45. W. Guerin, J.-P. Rivet, M. Fouché et al., Mon. Not. Roy. Astron. Soc. 480, 245 (2018). 
46. E. Horch and M. A. Camarata, Proc. SPIE 8445, 84452L (2012). 
47. D. Kieda and N. Matthews, Proc. SPIE 9907, 990723 (2016). 
48. S. LeBohec, B. Adams, I. Bond et al., Proc. SPIE 7734, 77341D (2010). 
49. N. Matthews, D. Kieda and S. LeBohec, J. Mod. Opt. 65, 1336 (2018). 
50. C. Pellizzari, R. Holmes and K. Knox, Proc. SPIE 8520, 85200J (2012). 
51. J. Rou, P. D. Nuñez, D. Kieda and S. LeBohec, Mon. Not. Roy. Astron. Soc. 430, 3187 (2013). 
52. G. Shoulga and E. N. Ribak, Appl. Opt. 56, A23 (2017). 
53. P. K. Tan, A. H. Chan and C. Kurtsiefer, Mon. Not. Roy. Astron. Soc. 457, 4291 (2016). 
54. C. Barbieri, D. Dravins, T. Occhipinti et al., J. Mod. Opt. 54, 191 (2007). 
55. D. Dravins, C. Barbieri, R. A. E. Fosbury et al., in Instrumentation for Extremely Large Telescopes, MPIA Spec. Publ., Vol. 106 (2006), pp. 85-91. 
56. I. Capraro, C. Barbieri, G. Naletto et al., Proc. SPIE 7702, 77020M (2010). 
57. G. Naletto, C. Barbieri, T. Occhipinti et al., Proc. SPIE 6583, 65830B (2007). 
58. G. Naletto, C. Barbieri, T. Occhipinti et al., Astron. Astrophys. 508, 531 (2009). 
59. G. Naletto, C. Barbieri, E. Verroi et al., Proc. SPIE 7735, 773545 (2010). 
60. L. Zampieri, G. Naletto, C. Barbieri et al., Proc. SPIE 9907, 990722 (2016). 
61. S. Trippe, J.-Y. Kim, B. Lee et al., J. Korean Astron. Soc. 47, 235 (2014). 




Chapter 4 


Dispersed Interferometers 


David J. Erskine 


Lawrence Livermore National Laboratory 
7000 East Ave, Livermore, CA 94550, USA 


erskine1@llnl.gov 


We describe dispersed interferometers, including dispersed Fourier Transform 
spectroscopy (dFTS), externally dispersed interferometers (EDI) also known as 
dispersed fixed delay interferometers (DFDI), spatial heterodyning spectroscopy 
(SHS) and the similar heterodyned holographic spectroscopy (HHS). 


1. Overview 


Spectroscopy via dispersed interferometry is a hybrid between two well-established 
techniques of pure interferometry (such as Fourier Transform Spectroscopy) and 
dispersive spectroscopy using a prism or grating. Figure 1 is a notional diagram of 
various spectrograph techniques, displayed in Fourier space, with purely dispersive 
and purely interferometric at the top and bottom, and the hybrids in between. The 
horizontal axis is the spatial frequency along the dispersion axis, or features per 
wavenumber. (It is convenient to use wavenumber (σ = 1/λ) instead of wavelength 
for dispersion in interferometry.) Since the unit of σ is cm⁻¹, this frequency has 
units of delay or cm. The horizontal axis can be considered the distribution of optical 
pathlengths needed to create the resolution. Imagine these pathlength differences, 
for example, between the different grooves of a diffraction grating, or between two 
arms of an interferometer. 

In the top panel (a), a high resolution spectrograph having a narrow instrument 
response peak in wavenumber space requires a wide peak in delay or Fourier space. 
From the uncertainty principle, the product of the full width at half maximum (FWHM) 
peak widths in frequency and wavenumber spaces is about 0.88. The resolution is 
R = σ/Δσ; hence R = 2ρ_hwhm σ/0.88, where ρ_hwhm is the Gaussian half width at 
half height, because we are only plotting the positive frequencies for convenience. 
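
For a Gaussian response the constant 0.88 is 4 ln 2/π. The sketch below checks this numerically and evaluates the resulting resolution formula. It assumes a Gaussian instrument response and the Fourier kernel exp(−2πiρσ). The function names are illustrative, not part of any published analysis code.

```python
import math

# FWHM(wavenumber) x FWHM(delay) for a Gaussian Fourier pair, about 0.8825
FWHM_PRODUCT = 4.0 * math.log(2.0) / math.pi


def gaussian_ft_ratio(delta_sigma, rho, n=20001, span=8.0):
    """Fourier transform of a Gaussian of FWHM delta_sigma at frequency rho,
    relative to its value at rho = 0 (simple Riemann sum)."""
    a = 4.0 * math.log(2.0) / delta_sigma ** 2
    half = span * delta_sigma
    step = 2.0 * half / (n - 1)
    num = den = 0.0
    for k in range(n):
        s = -half + k * step
        g = math.exp(-a * s * s)
        num += g * math.cos(2.0 * math.pi * rho * s)
        den += g
    return num / den


def resolving_power(rho_hwhm_cm, sigma_cm1):
    """R = 2 rho_hwhm sigma / 0.88 for a response peak of half width rho_hwhm."""
    return 2.0 * rho_hwhm_cm * sigma_cm1 / FWHM_PRODUCT


if __name__ == "__main__":
    delta_sigma = 0.4                                # cm^-1, illustrative
    rho_half = FWHM_PRODUCT / (2.0 * delta_sigma)    # predicted half width in delay
    print(f"constant: {FWHM_PRODUCT:.4f}")
    print(f"transform at predicted half width: {gaussian_ft_ratio(delta_sigma, rho_half):.4f}")
```

The transform falls to one half at the predicted half width, which confirms the constant.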

Fig. 1. Notional arrangement of astronomical spectrograph systems, from purely dispersive (a) 
to purely interferometric (e), in Fourier space (features per wavenumber, or delay space). For all 
methods, one needs to cover maximal delay space to achieve high resolution. (b) The Externally 
Dispersed Interferometer (EDI) measures delay space in chunks (single or multiple peaks at user 
chosen positions) on an integrating detector; (c) dispersed Fourier Transform Spectroscopy (FTS) 
scans delay space with a single medium-wide peak using a time-responsive detector. (d) Spatial 
Heterodyning Spectroscopy (SHS) splays a range of delays spatially along a detector, recording at 
once the interferogram on an integrating detector; (e) purely interferometric FTS maps out delay 
space directly with a narrow peak, measuring the interferogram. Graphics reproduced from Ref. 1. 
(See electronic edition for a color version of this figure.) 


The high resolution peak is depicted as having a smaller height than the lower 
resolution peak to suggest that as we reduce the slit width to achieve higher 
resolution in a classical dispersive spectrograph, the amount of light from an extended 
source passing through the narrower slits decreases. (This would not apply to a 
diffraction-limited source from an adaptive optics-corrected telescope.) 


2. Dispersed Fourier Transform Spectrometer 


2.1. Classic Undispersed FTS 


The bottom panel (e) of Fig. 1 depicts behavior of the classic undispersed inter- 
ferometer or Fourier Transform Spectrometer (FTS). The delay is sampled by a 
narrow peak and scanned contiguously vs. time over the wide delay range needed 
to record high resolution information.ᵃ The peak is very narrow because there is 
no disperser or bandpass filter, so the essentially single-pixel detector responds to a 
wide wavenumber range: the peak width is the reciprocal of the wide bandwidth. 

Scanning over the delay range measures the interferogram of the spectrum. 
A Fourier transform is then applied to convert the interferogram into a spectrum. 
It only takes about a couple of cm of delay range to develop, say, 50,000 resolution 
at 0.5 µm wavelength. So interferometers do not need to be very large to achieve 
high resolution. 
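
The "couple of cm" can be estimated two ways: from the reciprocal of the resolution element, 1/Δσ = R/σ, and from the Gaussian relation R = 2ρ_hwhm σ/0.88 of Sec. 1. The sketch below evaluates both. The function names are illustrative, and the Gaussian-response assumption is carried over from Sec. 1.

```python
import math

FWHM_PRODUCT = 4.0 * math.log(2.0) / math.pi  # about 0.88 for a Gaussian response


def wavenumber_cm1(wavelength_um):
    """Wavenumber sigma = 1/lambda in cm^-1 for a wavelength in micrometers."""
    return 1.0e4 / wavelength_um


def reciprocal_delay_cm(resolving_power, wavelength_um):
    """Crude estimate: delay range of order 1/(delta sigma) = R / sigma."""
    return resolving_power / wavenumber_cm1(wavelength_um)


def gaussian_half_width_cm(resolving_power, wavelength_um):
    """Half width rho_hwhm from R = 2 rho_hwhm sigma / 0.88."""
    return FWHM_PRODUCT * resolving_power / (2.0 * wavenumber_cm1(wavelength_um))


if __name__ == "__main__":
    print(f"1/delta_sigma: {reciprocal_delay_cm(50_000, 0.5):.2f} cm")
    print(f"rho_hwhm     : {gaussian_half_width_cm(50_000, 0.5):.2f} cm")
```

Both estimates come out at one to a few centimeters for R = 50,000 at 0.5 µm.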

The FTS has the advantage over conventional dispersive spectrographs of being 
very compact, with an output lineshape that is very robust to changes in the input 
beam pupil. (All dispersed interferometers tend to share this robustness.) 
Disadvantages include the inability to measure time-dependent or pulsed phenomena, 
because of the time needed to scan over the delay. Furthermore, its detector must 
record a high frequency time-dependent signal (as it scans over fringes), so it cannot 
use time-integrating detectors optimized for sensitivity. 

Because fringes from all the wavenumbers combine on a single pixel, the net 
fringe signal for the classic FTS is washed out (diminished by the square root of the 
number of independent wavenumbers being measured), so the photon signal-to-noise 
ratio is usually too poor for many astronomical sources. However, in the infrared 
there are many more photons per unit of power, so the FTS has found useful remote 
sensing applications in long wavelengths, especially since its compactness makes 
airborne and spaceborne platforms practical. In comparison, an infrared dispersive 
spectrograph of comparable resolution could be prohibitively large because its size 
scales with the long wavelength. 


2.2. Adding Dispersion to Increase Fringe Visibility 


A dispersed FTS (dFTS) is created by adding a disperser in series with the 
interferometer,²⁻⁴ depicted as Fig. 1(c). Figure 2 shows how decreasing the bandwidth, 
by insertion of a filter or a dispersive element in series with the FTS, reciprocally 
increases the width of the interferogram in delay space (hence the wider peak in 
Fig. 1(c) compared with Fig. 1(e)). This increases the fringe visibility, and hence 
photon signal-to-noise ratio for a given delay, making the dFTS much more practical 
for astronomical use than the classical (undispersed) FTS. 
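
The reciprocal relation between bandwidth and interferogram width can be illustrated with a toy calculation. The sketch below assumes a flat (top-hat) band and sums unit-amplitude fringes across it. The band center, bandwidths and delay are illustrative numbers, not instrument parameters.

```python
import math


def fringe_visibility(delay_cm, center_cm1, bandwidth_cm1, n_samples=2001):
    """Fringe visibility at a given delay for a flat band of the given width.

    Fringes from n_samples wavenumbers across the band are summed as complex
    phasors. The magnitude of the mean phasor is the visibility.
    """
    re = im = 0.0
    for k in range(n_samples):
        sigma = center_cm1 + bandwidth_cm1 * (k / (n_samples - 1) - 0.5)
        phase = 2.0 * math.pi * sigma * delay_cm
        re += math.cos(phase)
        im += math.sin(phase)
    return math.hypot(re, im) / n_samples


if __name__ == "__main__":
    delay = 0.00105   # cm, i.e. about 10 micrometers of path difference
    for label, bw in (("undispersed, 10000 cm^-1 band", 1.0e4),
                      ("one dispersed channel, 100 cm^-1", 1.0e2)):
        print(f"{label}: visibility {fringe_visibility(delay, 2.0e4, bw):.3f}")
```

At this delay the broad band has already washed the fringes out to a few percent, while the narrow channel still shows nearly full visibility.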

The disperser organizes the data into narrow bandwidth channels. Each is an 
independent FTS interferogram similar to an undispersed FTS, but having much 
greater fringe visibility. A custom “FROID” algorithm was used to assemble the 
manifold parallel channel information into a single high resolution (R = 50,000) 
broadband spectrum.² 

A prototype version (2007) of the dFTS² was tested on starlight at the Clay 
Center Observatory at the Dexter-Southfield School in Brookline, MA, on a 25-inch 
telescope that regularly achieves 1″ seeing. Figure 3 shows the zero point radial 
velocity reproducibility from night to night, observing a Th–Ne spectral lamp. The 
variability ~10 m/s is believed primarily due to temperature drifts of the 
instrument, which was not housed in a temperature controlled environment but one with 
~5°C changes in ambient air temperature. 

ᵃ For example, the delay might be scanned by changing it in increments of 1/4 wave over a total 
range of perhaps a million wavelengths of delay. 

Fig. 2. When the bandwidth is narrowed by insertion of a filter or use of an external disperser, the 
interferogram fringe pattern becomes reciprocally wider in delay space. Simulated interferograms 
as shown. Graphics reprinted from Ref. 2, Copyright (2007), with permission from AAS. 

The second generation (2009) instrument (Fig. 4) measured³ single-line 
spectroscopic binaries at the Steward Observatory 2.3 m Bok Telescope. This apparatus 
was more compact than the first. Figure 5 shows radial velocities of the F6V star 19 
Draconis with a velocity scatter of 27.5 m/s. Measurements on double-lined 
spectroscopic binaries were also performed there.⁴ 


3. Externally Dispersed Interferometry 


3.1. Interferometer in Series with Dispersive Spectrograph 


Externally dispersed interferometry!:*:7 16 (EDI), also called by others dispersed 
fixed delay interferometry!!!’ (DFDI), is the series combination of a fixed delay 
interferometer with a dispersive spectrograph (Figs. 6 and 7). The interferometer 
creates a uniform sinusoidal grid or comb in its transmission function vs. 
wavenumber σ (units of cm⁻¹), as T(σ) = 0.5[1 + cos(2πτσ)], having a frequency along the 
wavenumber axis proportional to the interferometer delay τ (units of cm). This is 
typically 1–5 cm, depending on the linewidths in the input spectrum (for Doppler), 
or the resolution desired (increases with τ). It is set by choice of a glass etalon 
thickness and mirror positions that control pathlength difference between two 
interferometer arms. 

Fig. 3. Zero point radial velocity reproducibility from night to night, observing a Th–Ne spectral 
lamp. The variability ~10 m/s is believed primarily due to temperature drifts of the instrument, 
which was not housed in a temperature controlled environment but one with ~5°C changes in 
ambient air temperature. Graphics reprinted from Ref. 3, Copyright (2009), with permission from 
AAS. 

3.1.1. Moiré Patterns Reveal Doppler and High Resolution Features 


A heterodyning effect caused by multiplication of the periodic transmission T(σ) 
against the stellar input spectrum creates moiré fringe patterns on the detector that 
sum with the ordinary spectrum. These moiré patterns are high frequencies (narrow 
features) of the input spectrum beaten down (shifted) to lower frequencies, which 
can better survive slit blurring and paucity of detector pixels. 
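
A toy calculation shows the beating. The sketch below assumes an input spectrum with a single sinusoidal component of frequency ρ = 1.05 cm (features per cm⁻¹), a comb T(σ) = 0.5[1 + cos(2πτσ)] with τ = 1 cm, and a boxcar blur 2 cm⁻¹ wide standing in for the native spectrograph response. All of these numbers and names are illustrative assumptions.

```python
import math


def comb(sigma, tau_cm):
    """Interferometer transmission T(sigma) = 0.5 [1 + cos(2 pi tau sigma)]."""
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * tau_cm * sigma))


def detected(sigma, rho_cm, tau_cm, use_comb):
    """Toy spectrum with one fine sinusoidal feature, optionally times the comb."""
    spectrum = 1.0 + 0.5 * math.cos(2.0 * math.pi * rho_cm * sigma)
    return spectrum * comb(sigma, tau_cm) if use_comb else spectrum


def blurred(sigma0, rho_cm, tau_cm, use_comb, blur_cm1, m=200):
    """Boxcar average of the detected spectrum (midpoint rule, m samples)."""
    step = blur_cm1 / m
    start = sigma0 - 0.5 * blur_cm1 + 0.5 * step
    return sum(detected(start + j * step, rho_cm, tau_cm, use_comb) for j in range(m)) / m


def surviving_modulation(rho_cm, tau_cm, use_comb, blur_cm1=2.0, span_cm1=20.0, n=201):
    """Half the peak-to-peak variation of the blurred spectrum over span_cm1."""
    values = [blurred(span_cm1 * k / (n - 1), rho_cm, tau_cm, use_comb, blur_cm1)
              for k in range(n)]
    return 0.5 * (max(values) - min(values))


if __name__ == "__main__":
    print("without comb:", round(surviving_modulation(1.05, 1.0, False), 3))
    print("with comb   :", round(surviving_modulation(1.05, 1.0, True), 3))
```

Without the comb the blur removes nearly all of the fine feature. With the comb, the difference-frequency (moiré) term at ρ − τ = 0.05 cm passes through the blur almost unattenuated.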

This heterodyning creates a new sensitivity peak at a user selectable high 
frequency (Fig. 10 [left]), while preserving the native spectrograph sensitivity peak 
centered at zero frequency. This allows precision radial Doppler velocimetry” !*:!4 
and high resolution spectroscopy!®: 1:16 with a much lower resolution native spec- 
trograph than without the interferometer. Hence EDI imparts a resolution boosting 
ability, and this occurs over the entire native bandwidth. 

Fig. 4. Photograph and schematic of the second version of the dFTS. The key element 
distinguishing it from a classic FTS is the use of a dispersive spectrograph (right side) ahead of the 
detector. Graphics reprinted from Ref. 3, Copyright (2009), with permission from AAS. 

Fig. 5. The radial velocity scatter of this measurement of F6V star 19 Draconis was 27.5 m/s. 
Graphics reprinted from Ref. 3, Copyright (2009), with permission from AAS. 

Fig. 6. Schematic (left) and photo (right) of an externally dispersed interferometer assembled 
from components found in a typical university optics lab and a 20k resolution Jobin-Yvon HR640 
grating spectrograph operating near 500 nm. The interferometer delay was 1.1 cm. Together with 
an iodine cell (not shown), this apparatus measured the 12 m/s amplitude tugging of the Moon 
on the Earth (Fig. 11) from fringing spectra (Fig. 7 [right]) of sunlight fed by fiber from a rooftop 
heliostat. No environmental thermal or mechanical stabilization was used (the short exposure times 
did not require interferometer fringe stabilization). Schematic and photograph reproduced from 
Ref. 5. (See electronic edition for a color version of this figure.) 

The inclusion of the interferometer, by its very uniform spectral fiducial comb, 
can also boost the stability to undesired changes in pupil, focal spot or wavelength 
drifts in the disperser, which often plague a native spectrograph used alone. 


3.1.2. Heterodyning Allows Lower Resolution Native Spectrographs 


By permitting the use of a lower resolution spectrograph, the EDI technique reaps 
advantages of the wider bandwidth and higher throughput of a lower resolution 
spectrograph. (For a fixed number of pixels, bandwidth increases approximately 
reciprocally to native resolving power.) The wider bandwidth uses a larger fraction 
of the stellar flux, which can improve the photon-limited signal-to-noise ratio. The 
wider slits and greater tolerance to focal errors of a lower resolution spectrograph 
allow lenses and gratings to be optimized for high efficiency rather than for a stable 
focal spot. 


3.1.3. Phase Stepping Elucidates Moiré Patterns 


Figure 7 (left) shows the notional diagram of EDI and how absorption lines acting 
against the sinusoidal comb created by the interferometer form moiré fringe pat- 
terns. Figure 7 (right) shows measured moiré patterns for sunlight and the iodine 
Fig. 7. (Left) Schematic of EDI, which is an interferometer of one or several fixed delay values
(1-5 cm) in series with a dispersive spectrograph. The interferometer creates a sinusoidal comb with
a period ~1/τ. The input spectral features combine with the sinusoid to create moiré patterns.
Phase shifts are proportional to the Doppler velocity. The shape of the moiré can be processed to
recover input spectral features at higher resolution than the native spectrograph alone (10× boost
has been demonstrated). The schematic was reproduced from Ref. 6. (Right) Solar and iodine
fringing spectra near 508 nm, when measured separately by the EDI.” A small portion of the total
13 nm bandwidth is shown. Smile-like features in iodine are due to a set of absorption lines whose
spacing varies gradually from slightly less than the comb to slightly more. (See electronic edition
for a color version of this figure.)

Fig. 8. (a-c) Simulation of heterodyning and moiré formation that occurs optically in the
instrument, and then (d, e) its reversal during reconstruction of the resolution-boosted spectrum. Moiré
patterns using a test absorption spectrum of two pairs of black lines (black curve in [e]). (a)
Sinusoidal comb multiplies input spectrum; comb frequency is proportional to delay τ. Only three
delays (0.75, 1.25, 2 cm) of eight shown. (a) Without blur. (b) With blur, the comb is unresolved
but moiré pattern remains. (c) Complex expression of moiré (whirl or W) from Fig. 9. Red (real),
blue (imaginary). (d) Whirls upshifted in frequency; real part taken to form wavelets. (e) Sum
of wavelets forms reconstructed output (red curve). An equalization step weights the wavelets to
eliminate ringing. Native spectrum (dashed green) has insufficient resolution (2 cm⁻¹) to resolve
test pair, but EDI output (red) easily resolves them. Graphs are ~10 cm⁻¹ across at average
wavenumber of 7450 cm⁻¹. Graphics reproduced from Ref. 1. (See electronic edition for a color
version of this figure.)


spectrum, and Figs. 8(a) and 8(b) simulate them for two line pairs of different
spacing with (b), and without (a), slit blurring.

Data are usually taken with several exposures while incrementing the inter- 
ferometer delay in subwavelength amounts, called “phase stepping.” This allows 
the fringing (moiré) and nonfringing (ordinary spectrum) components to be sep- 
arated during analysis, since only the true fringes will vary synchronously to the 
commanded phase dither. Both of these components can provide useful spectral 
information which can be combined to form the net EDI output, or used separately. 
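The separation of fringing and nonfringing components can be sketched as a linear least-squares fit of a sinusoid to the intensity vs. commanded phase in each spectral channel. This is a minimal illustration, not the authors' pipeline: the function name `fit_phase_steps`, the five evenly spaced phase steps, and the synthetic three-channel data are all assumptions made for the example.

```python
import numpy as np


def fit_phase_steps(intensities, phases):
    """Fit I(phi) = B + C*cos(phi) + S*sin(phi) independently per channel.

    intensities: array of shape (n_steps, n_channels)
    phases:      array of shape (n_steps,), commanded phase in radians
    Returns (B, W): the nonfringing (ordinary) spectrum B and the complex
    fringing signal W = C + 1j*S, each of shape (n_channels,).
    """
    phases = np.asarray(phases, dtype=float)
    design = np.column_stack([np.ones_like(phases), np.cos(phases), np.sin(phases)])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(intensities, dtype=float), rcond=None)
    offset, cos_amp, sin_amp = coeffs
    return offset, cos_amp + 1j * sin_amp


# Synthetic data: three channels with known offset, fringe amplitude, and phase.
phases = 2 * np.pi * np.arange(5) / 5
true_offset = np.array([10.0, 12.0, 8.0])
true_amp = np.array([2.0, 1.0, 3.0])
true_phase = np.array([0.3, -1.0, 2.0])
exposures = true_offset + true_amp * np.cos(phases[:, None] - true_phase)

ordinary, whirl = fit_phase_steps(exposures, phases)
print(ordinary, np.abs(whirl), np.angle(whirl))
```

At least three distinct phase steps are needed, since three parameters (offset, cosine amplitude, sine amplitude) are fitted per channel.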

Because the fringing component involves differences between exposures, and 
only responds to signals synchronous to the applied interferometer dither, the EDI 
is naturally immune to additive fixed-pattern errors that can plague the ordinary 
Fig. 9. How to convert a moiré pattern to complex data. (a) Vertical lineout across multiphase
stack for a given wavenumber produces an intensity vs. phase plot (b), which is fitted to a sinusoid
(black curve). (c) The sine and cosine amplitudes (red and blue curves) are the whirl’s (W)
imaginary and real complex values, for a given wavenumber. The vertical offset of the fit is the
ordinary spectrum (green, B_ord) at that wavenumber. Graphics reproduced from Ref. 1. (See
electronic edition for a color version of this figure.)


spectrum. Hence, in many cases the EDI can produce a cleaner output spectrum 
than the native spectrograph, even at the same resolution as the native. 

The temporal phase stepping allows use of echelle spectrographs, which may 
have a narrow beam as small as 1 pixel high. These are useful because of their wide 
bandwidth. Alternatively, linear spectrographs having a tall slit can splay the phase 
spatially along the slit, or apply both spatial and temporal variations. 

An interferometer has two complementary phased outputs whose intensities, 
when summed, equal the input (assuming ideal optics). The complementary output 
can also be used to produce an EDI signal if it is detected on separate pixels. 


3.1.4. EDI Instrument Lineshape as a Wavelet 


Figure 10 (right) compares the instrument lineshape vs. wavenumber between a 
conventional spectrograph and the EDI. Whereas the conventional lineshape is a 
peak (b or c), the EDI lineshape is a wavelet (a) which has an envelope set by the 
native spectrograph (b), and an interior sinusoid whose frequency is proportional 
to delay τ, which can be set by the user to be almost arbitrarily high. Note that
the slope (red line) of a fringe inside the wavelet can be much higher than the slope 
(green) of the native spectrograph — it can easily be as high as the high resolution 
spectrograph lineshape (c, blue). 
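A short numerical sketch can make the slope comparison concrete. It assumes, purely for illustration, a Gaussian native lineshape of 2.8 cm⁻¹ FWHM and a 2 cm delay; both lineshapes are normalized to unit height, so the 50% fringe amplitude of a real instrument is ignored here.

```python
import numpy as np


def gaussian_lineshape(x, fwhm):
    """Unit-height Gaussian peak vs. wavenumber offset x (cm^-1)."""
    std = fwhm / (2 * np.sqrt(2 * np.log(2)))
    return np.exp(-0.5 * (x / std) ** 2)


def edi_wavelet(x, native_fwhm, delay_cm):
    """Wavelet: native envelope times a sinusoid whose frequency is the delay."""
    return gaussian_lineshape(x, native_fwhm) * np.cos(2 * np.pi * delay_cm * x)


x = np.linspace(-5.0, 5.0, 20001)          # wavenumber offset, cm^-1
native = gaussian_lineshape(x, 2.8)        # assumed native FWHM
wavelet = edi_wavelet(x, 2.8, 2.0)         # assumed 2 cm delay

max_slope_native = np.max(np.abs(np.gradient(native, x)))
max_slope_wavelet = np.max(np.abs(np.gradient(wavelet, x)))
print(max_slope_native, max_slope_wavelet)
```

With these assumed values the steepest part of the wavelet is more than ten times steeper than the steepest part of the native peak, and the advantage grows in proportion to the delay.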


3.1.5. Moiré Phase Yields Doppler Velocity 


The moiré phase shift between two observations is proportional to the Doppler 
velocity shift of the input source. To make the measurement independent of drifts 
in τ, such as those due to thermal changes in the optical mounts, the moiré patterns of
a reference spectrum such as iodine are also simultaneously observed on the same 
detector. These are separated from the stellar component mathematically. Then 
Fig. 10. Two manifestations of the EDI instrument response: left graphic is in frequency space,
and right graphic is along dispersion axis (pixel, wavenumber σ, or wavelength λ = 1/σ space).
(Left) Heterodyning shifts the native spectrograph sensitivity peak (green) from zero to a higher
frequency (red) where science frequencies typically reside, by delay τ, and with an amplitude of 50%.
Frequency, in units of features per wavenumber (cm⁻¹), conveniently has the same units as delay
(cm). (Right) The EDI instrument response is a corrugated peak or wavelet (bold curve
(a)), which has an inner portion having much higher slope (and thus higher Doppler sensitivity)
than the low resolution native spectrograph alone (dashes, (b)), which defines the envelope. The
slope of the sinusoid is proportional to τ. For use in high resolution spectroscopy, the central
area (cross-hatched) is made unambiguous by combining several wavelets of different periodicities
(delays), as demonstrated in Fig. 14. Graphics reproduced from Ref. 16. (See electronic edition for
a color version of this figure.)


the change in the above two velocity shifts yields the corrected Doppler velocity. 
This is independent of small (<λ/4) drifts in τ since these affect the stellar and
iodine moiré phase by the same amount. For example, the open air EDI shown in 
Fig. 6, not having thermal or environmental controls, was used to make the few m/s 
precision Doppler measurements shown in Fig. 11. 
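The conversion from moiré phase to velocity can be sketched as follows. It assumes the comb phase, in cycles, is τσ (consistent with a comb period of ~1/τ), so that a Doppler shift v changes the phase by τσv/c cycles. The function names, the 1.1 cm delay, and the example phase values are hypothetical, and sign conventions are ignored.

```python
C_LIGHT_M_S = 299_792_458.0


def velocity_per_fringe(delay_cm: float, wavenumber_cm: float) -> float:
    """Doppler velocity (m/s) that shifts the moire phase by one full cycle."""
    return C_LIGHT_M_S / (delay_cm * wavenumber_cm)


def corrected_velocity(star_shift_cycles: float, reference_shift_cycles: float,
                       delay_cm: float, wavenumber_cm: float) -> float:
    """Stellar velocity change with the reference (e.g. iodine) shift removed.

    A small drift in the delay adds the same phase to both components,
    so it cancels in the difference.
    """
    net_cycles = star_shift_cycles - reference_shift_cycles
    return net_cycles * velocity_per_fringe(delay_cm, wavenumber_cm)


sigma_508nm = 1.0 / 508e-7                   # wavenumber in cm^-1 at 508 nm
vpf = velocity_per_fringe(1.1, sigma_508nm)  # hypothetical 1.1 cm delay
# A common drift of 0.2 cycle appears in both the stellar and reference phases.
dv = corrected_velocity(0.001 + 0.2, 0.2, 1.1, sigma_508nm)
print(vpf, dv)
```

For these assumed values one fringe corresponds to roughly 14 km/s, so a measurement at the m/s level requires determining the phase to better than about 10⁻⁴ of a cycle.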


3.1.6. Moiré Shape Yields High Resolution Spectrum 


The same EDI apparatus used for Doppler velocimetry can be used to perform high
resolution spectroscopy,⁵,¹⁶ also referred to as resolution boosting (as in Figs. 14-18),
by using a different kind of analysis. The moiré patterns are processed mathematically
to reverse the heterodyning frequency shift that originally occurred optically.
Whereas the original optical heterodyning lowered the frequencies to form the moiré
patterns recorded on the detector, a multiplication by $e^{i2\pi\tau\sigma}$ increases the frequency
by amount τ, restoring the signal to its original high frequency. After appropriate
filtering (called equalization) to reshape the net frequency response into a Gaussian,
a reconstructed spectrum is produced having its resolution boosted over the native
resolution. (Examples shown in Section 3.3 below.)
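The frequency upshift can be checked numerically with a toy whirl. In this sketch the grid, the Gaussian envelope, the 2 cm delay, and the 2.3 cm science frequency are all assumed values chosen for illustration, and the moiré is modeled simply as a complex sinusoid at the difference frequency.

```python
import numpy as np

n_samples = 4096
sigma = np.linspace(0.0, 40.0, n_samples, endpoint=False)  # wavenumber offset, cm^-1
rho = np.fft.fftfreq(n_samples, d=sigma[1] - sigma[0])      # frequency axis, cm

tau = 2.0          # assumed interferometer delay, cm
f_science = 2.3    # assumed science frequency (features per cm^-1), cm

# Toy whirl: the optical heterodyning has left the signal at f_science - tau.
envelope = np.exp(-0.5 * ((sigma - 20.0) / 6.0) ** 2)
whirl = envelope * np.exp(2j * np.pi * (f_science - tau) * sigma)

# Numerical reversal: multiply by exp(i 2 pi tau sigma), then take the real part.
restored = whirl * np.exp(2j * np.pi * tau * sigma)
wavelet = restored.real


def peak_frequency(signal):
    """Magnitude of the frequency (cm) at which the Fourier amplitude peaks."""
    return abs(rho[np.argmax(np.abs(np.fft.fft(signal)))])


f_moire = peak_frequency(whirl)
f_restored = peak_frequency(restored)
print(f_moire, f_restored)
```

The moiré peak sits at 0.3 cm before the multiplication and at 2.3 cm after it, that is, shifted up by exactly the delay.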

3.2. Demonstrations of EDI Doppler Velocimetry 

3.2.1. Initial Testing on Sunlight 


Figure 6 (left) shows a photo of (essentially) the EDI apparatus used to make solar 
measurements for initial testing of the EDI concept (the iodine cell was not in the 
Fig. 11. (Upper) Doppler velocity of sunlight” over a month’s time detects the 12 m/s amplitude
tugging of the Moon on Earth. (Lower) A bromine cell mimics the spectra of a zero velocity
source (which then passes through the iodine cell), demonstrating m/s scale repeatability on an
11-day timescale. The EDI shown in Fig. 6 (Ref. 8) was used as depicted (but with an iodine
cell at the fiber end) without any environmental thermal or mechanical stabilization. The 20k
resolution native spectrograph used without the interferometer would have insufficient resolution
and prohibitively large drift of its lineshape to allow such precision. Note that the symbol σ in
this figure refers to the standard deviation of the data values, not to wavenumber. (See electronic
edition for a color version of this figure.)


beam path in this photo). A fiber from a roof-mounted heliostat brought sunlight 
to a laboratory optical table on which the interferometer resided. Figure 11 (upper) 
shows the 12 m/s signature of the Moon tugging the Earth in the measurement of
the solar radial velocity over a month’s time. Figure 11 (lower) shows the zero- 
velocity test by measuring vs. time the velocity of a bromine absorption cell (which 
is a faux-star that is, of course, stationary), relative to the iodine cell, which is 
also stationary. We see only a few m/s drift over 11 days, which is excellent for an 
unstabilized cavity and environmentally unprotected system. 
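The size of the lunar signature can be checked with a back-of-envelope estimate of the Earth's speed around the Earth-Moon barycenter. The constants below are standard approximate values rather than numbers from the measurement, and a circular orbit is assumed.

```python
import math

EARTH_MOON_DISTANCE_M = 3.844e8       # mean Earth-Moon separation
EARTH_TO_MOON_MASS_RATIO = 81.3       # M_Earth / M_Moon
SIDEREAL_MONTH_S = 27.32 * 86400.0    # orbital period in seconds


def earth_reflex_speed() -> float:
    """Speed (m/s) of the Earth around the Earth-Moon barycenter (circular orbit)."""
    barycenter_radius = EARTH_MOON_DISTANCE_M / (1.0 + EARTH_TO_MOON_MASS_RATIO)
    return 2.0 * math.pi * barycenter_radius / SIDEREAL_MONTH_S


reflex_speed = earth_reflex_speed()
print(f"{reflex_speed:.1f} m/s")
```

The result is about 12 m/s, in line with the amplitude seen in Fig. 11 (upper).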

This is remarkable because no environmental thermal or mechanical stabiliza- 
tion was used (the short exposure times did not require interferometer fringe stabi- 
lization), whereas a conventional spectrograph typically requires such environmental 
control. This is possible because the EDI technique transfers the responsibility of
the high resolution measurement from the native spectrograph to the interferome- 
ter. The interferometer is more robust to drifts in beam profile because a sinusoidal 
fringe has only three significant degrees of freedom (phase, magnitude, and intensity 
offset) whereas the point spread function of a grating spectrograph can have many 
more degrees (of order one for each diffraction grating groove involved). 


3.2.2. New Exoplanet Detected 


A new exoplanet HD 102195b was discovered¹² by Jian Ge’s team using the EDI
technique with the ET instrument (Fig. 12) at the moderately small aperture 0.9 m
Kitt Peak Nat. Obs. telescope. The small aperture discovery was possible because 
of the high throughput!! (49% from fiber output to detector) enabled by the lower 
native spectrograph resolution (R ~ 5000) allowed by the EDI technique. This native 
resolution is too low to perform precision radial velocimetry without the benefit of 
an interferometer. 
Fig. 12. Discovery¹² of a new exoplanet HD 102195b, performed in spring 2005 by Jian Ge’s team
using the ET instrument?! at the Kitt Peak Nat. Obs. (KPNO) 0.9 m coudé feed, and later by ET
at the 2.1 m KPNO telescope. Velocity errors are ~10 and ~20 m/s for the 2.1 m and coudé feed,
respectively. The solid curve is the orbital fit. The native spectrograph has only 5000 resolution,
which is too low for precision radial velocimetry without an interferometer. Planetary discovery
with a 0.9 m telescope of a V = 8.05 magnitude star is possible due to the high throughput of
the instrument (49% from fiber output to detector), allowed by the lower native spectrograph
resolution requirement of the EDI technique. Graphics reprinted from Ref. 12, Copyright (2006),
with permission from AAS.

Fig. 13. (Left panels) Radial velocity measurements of star GJ 699 in the near infrared with the
TEDI instrument at the Hale telescope. (Upper left) Solid curve shows expected behavior. The data
clearly measure the effect of the Earth’s motion (30 km/s component from solar orbit, 300 m/s
component from rotation) with residuals in bottom left panel. The 2700 native spectrograph
resolution is insufficient to perform precision radial velocimetry without the interferometer. The
Cassegrain mounting produces a changing gravity vector. Yet the dominant error is believed to be
insufficient calibration of telluric lines since removing them in a simulation (bottom right) reduced
error from 43 to 13 m/s, which matches the expected photon noise. Graphics reproduced from
Ref. 14 with permission from PASP.


3.2.3. Mt. Palomar Hale Telescope: Doppler 


An EDI version called “TEDI” was mounted on the mirror of the 5-meter Hale telescope
at Mt. Palomar Observatory inside the Cassegrain output hole, in series with
the TripleSpec!” near-infrared echelle spectrograph, of bandwidth 0.95-2.45 µm
(4100-10500 cm⁻¹), to test on M-stars. The native resolution of 2700 is insufficient
by itself to perform precision Doppler velocimetry, and the Cassegrain mounting
produced a changing gravity vector which would cause problems for a conventional
instrument. Yet Fig. 13 shows that accurate radial velocity data on an M-star in the
near infrared were obtained over several months. Insufficient calibration of telluric
lines was believed to be the dominant error source. 


3.3. Demonstrations of EDI High Resolution Spectroscopy 


The primary purpose of TEDI was to demonstrate Doppler measurements, and eight 
delays (0.1, 0.3, 0.7, 1, 1.3, 1.7, 2.9, 4.6cm) were provided to optimize for different 
rotational linewidths of different stars. However, we realized that the multiple delays 
also presented an opportunity to demonstrate high resolution spectroscopy with a 
boost factor much higher than the ~2x boost demonstrated earlier with a single 
delay EDI.1° These TEDI high resolution demonstrations (Figs. 14 and 16) were 
very successful! !® and a resolution boost as high as 10x was achieved. 

Figure 14(a) shows that with multiple delays the high resolution EDI output
spectrum is a sum over many wavelets, one per delay. Figure 14(b) shows that a
ThAr doublet can be easily resolved by the EDI, even though this doublet separation 
is smaller than the native spectrograph resolution element. 
Fig. 14. (a) Reconstruction of an otherwise unresolved ThAr doublet (7556, 7557.6 cm⁻¹) from
many wavelets measured at multiple delays. (b) The native spectrograph (green curve) cannot
resolve the doublet. The EDI reconstructed spectrum (red curve), which is the sum of wavelets
equalized to R = 16,000, easily resolves the doublet. The data were measured by the TEDI
interferometer at the Hale 200 inch telescope in series with the TripleSpec NIR spectrograph
(0.95-2.45 µm), using six delays of 0.1-1.7 cm. Graphics reproduced from Ref. 16. (See electronic
edition for a color version of this figure.)


Figure 15 shows the instrument response in Fourier space, which is called a 
Modulation Transfer Function (MTF). The peaks demark positions in delay space 
of the installed glass etalons. A contiguous coverage of delay space is desired to 
form a net Gaussian response (after equalization removes the dips), otherwise gaps 
cause ringing in the effective instrument lineshape. For the early data of Fig. 14, the 
largest contiguous range was up to 1.7cm using E1—E6, because the gap at 2.4cm 
prevented use of E7. 
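The role of gaps in delay space can be illustrated with a toy model of the MTF. The sketch assumes a Gaussian native lineshape with R = 2700 at 7450 cm⁻¹, a 50% amplitude for each shifted peak, and an arbitrary sensitivity floor of 0.02 below which coverage is treated as broken. None of these modeling choices, nor the function names, come from the actual TEDI analysis.

```python
import numpy as np


def mtf_peak_std(wavenumber_cm: float, resolving_power: float) -> float:
    """Std (cm) of the Gaussian MTF peak of a Gaussian native lineshape."""
    fwhm = wavenumber_cm / resolving_power
    std_sigma = fwhm / (2 * np.sqrt(2 * np.log(2)))
    return 1.0 / (2 * np.pi * std_sigma)


def net_mtf(rho, delays_cm, width):
    """Native peak at zero plus one half-height copy centered at each delay."""
    native = np.exp(-0.5 * (rho / width) ** 2)
    shifted = sum(0.5 * np.exp(-0.5 * ((rho - d) / width) ** 2) for d in delays_cm)
    return native + shifted


def contiguous_limit(delays_cm, width, floor=0.02):
    """Lowest frequency (cm) at which the net MTF first drops below `floor`."""
    rho = np.linspace(0.0, 6.0, 6001)
    below = np.nonzero(net_mtf(rho, delays_cm, width) < floor)[0]
    return rho[below[0]] if below.size else rho[-1]


width = mtf_peak_std(7450.0, 2700.0)
early_delays = [0.1, 0.3, 0.7, 1.0, 1.3, 1.7, 2.9, 4.6]   # original eight delays
later_delays = [0.3, 0.7, 1.0, 1.3, 1.7, 2.4, 2.9, 4.6]   # 2.4 cm delay replaces the 0.1 cm one

limit_early = contiguous_limit(early_delays, width)
limit_later = contiguous_limit(later_delays, width)
print(limit_early, limit_later)
```

In this toy model the contiguous coverage of the original delay set ends shortly after 1.7 cm, while adding a delay at 2.4 cm carries it past 2.9 cm, mirroring the behavior described for Figs. 14 and 16.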

Later data such as Fig. 16 was taken with benefit of a new delay E6.5 at 2.4cm, 
which filled this gap and extended the contiguous delay range to 2.9cm. The wider 
Fig. 15. Modulation Transfer Function (MTF) for the EDI evaluated at wavenumber 7450 cm⁻¹
shows instrument sensitivity to a given frequency along the dispersion axis in features per
wavenumber, as variable rho, which also has units of delay in cm. The MTF here describes later
data, including that of Fig. 16, where the gap at 2.4 cm between E6 and E7 was filled by purchase
of a new delay E6.5, which was swapped for E1 (our apparatus could only hold eight delays). In
this MTF, each delay (E2–E8, 0.3-4.6 cm) produces a peak in sensitivity (red curve), which has
the same width as the native spectrograph peak (green curve) positioned at zero, but centered at
its delay value. The goal of using multiple delays of different values is to create a net sensitivity
curve which has no gaps. Later during analysis, multiplication by an equalization curve removes
the dips to create an ideal Gaussian shaped MTF (black dashed curves of various resolution). The
wider range of delays allowed a higher reconstructed resolution. The highest delay E8 at 4.6 cm
could not be included in spectral reconstruction due to gap between E7 and E8, but was useful
for measuring Doppler shifts because it was the most sensitive due to its large delay. Graphics
reproduced from Ref. 16. (See electronic edition for a color version of this figure.)

Fig. 16. Demonstration of 10-fold resolution boost observing star & CrB. The native spectrum
(green dashes), with resolution R = 2700, cannot resolve the telluric features. The TEDI spectrum
(red curve) used seven contiguous delays, up to 2.9 cm, to produce resolution 27,000. The model of
tellurict® and ThAr!® lines (gray curve), blurred to 27,000 resolution, shows excellent agreement.
Resolution boosting occurs simultaneously across the full bandwidth (0.9-2.45 µm) of the native
TripleSpec spectrograph, with a resolution improvement of (maximum delay/wavelength). Graphic
reproduced from Ref. 20. (See electronic edition for a color version of this figure.)

delay range allowed a higher reconstructed resolution in Fig. 16, a 10-fold boost.
Since the “filter” wheel held only eight glass etalons, we installed E6.5 in the E1
position.

An even higher boosted resolution of 45,000 could be achieved by including two more
delays at 3.5 and 4 cm to fill the gap between E7 and E8 and form a contiguous
range up to 4.6 cm, supposing we obtain a filter wheel that holds more than eight
delays.


3.4. Other Advantages of EDI 
3.4.1. Extremely Wide Bandwidth 


Figure 17 demonstrates the very wide bandwidth of EDI, unlimited by the interfer- 
ometer component and only limited by the bandwidth of the native spectrograph. 
In contrast, the conventional method of reducing slitwidth to increase resolution 
requires increasing the number of detector pixels. (Or if the number of pixels is 
kept fixed, the bandwidth decreases reciprocally to the resolution increase.) This 
wide bandwidth is possible because the EDI interferometer comb has a period that 
is essentially uniform across the band (only slightly changing with glass disper- 
sion). Figure 21 of Sec. 4 compares interferometer combs of EDI with the strongly 
wavenumber-dependent comb of SHS. 


3.4.2. Immunity to Additive Fixed-Pattern Noise 


Because data are collected in multiple exposures in a phase-stepping manner, only 
fringes that vary synchronously to the commanded interferometer phase increment 
appear in the processed moiré signal. Hence EDI is immune to additive offsets or 
fixed-pattern errors that often plague the ordinary spectrum. In TEDI such fixed-pattern
errors were due to bad pixels and background emissions. The ThAr lamp features in
Fig. 18 at 9174 and 9214 cm⁻¹ are examples of how the EDI signal (red curve)
is immune to bad pixels that pollute the native spectrum (green curve).
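This immunity can be demonstrated on a single synthetic pixel. In the sketch below a constant offset, standing in for a hot pixel or background emission, is added to every phase-stepped exposure. The four-step sequence and all numerical values are assumptions for illustration only.

```python
import numpy as np

phases = 2 * np.pi * np.arange(4) / 4   # assumed four evenly spaced phase steps
design = np.column_stack([np.ones(4), np.cos(phases), np.sin(phases)])


def whirl_and_ordinary(exposures):
    """Return (complex fringing signal, nonfringing offset) for one pixel."""
    offset, cos_amp, sin_amp = np.linalg.lstsq(design, exposures, rcond=None)[0]
    return cos_amp + 1j * sin_amp, offset


clean_pixel = 5.0 + 1.5 * np.cos(phases - 0.7)
bad_pixel = clean_pixel + 40.0          # same additive error in every exposure

whirl_clean, ordinary_clean = whirl_and_ordinary(clean_pixel)
whirl_bad, ordinary_bad = whirl_and_ordinary(bad_pixel)
print(whirl_clean, whirl_bad, ordinary_bad - ordinary_clean)
```

The additive error lands entirely in the nonfringing (ordinary) term, while the complex fringing signal is unchanged.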


3.4.3. Robustness to Wavelength Drift of Native Spectrograph 


Besides boosting the resolution of a spectrograph beyond the limits imposed by clas- 
sical optics (slitwidth, focal blur, detector pixel spacing, etc.), the other important 
advantage of EDI is that it is extremely robust to errors in the native spectrograph 
point spread function shape and position. Figure 19 (top) shows severe and irreg- 
ular wavelength drifts suffered by the TEDI native spectrograph while observing 
starlight. Yet we still obtained high resolution spectra under these drifts. 

Figure 19 (bottom three panels) is a simulation using TEDI data of a ThAr 
line that is deliberately shifted across the detector, to show that the output spectrum
reacts by only a small shift, 1/20th of the applied shift. A further reduction of an
Fig. 17. An externally dispersed interferometer has an extremely wide bandwidth limited only
by the native spectrograph, because the sinusoidal comb of the external interferometer has a
uniform period, not dependent on the angle of diffraction from an internal grating. (a) This TEDI
reconstructed spectrum spans four orders of the TripleSpec echelle spectrograph in the NIR
(4100–10500 cm⁻¹), observing HD219134 at resolution R = 11,000 (4× boost). Panels (b, c, d) zoom in
on a telluric feature near ~4980 cm⁻¹ due to the CO₂ molecule. Graphics reproduced from Ref. 16. (See
electronic edition for a color version of this figure.)


order of magnitude or more can be obtained by strategically shaping the instrument 
response so that a cross-fading occurs between overlapped lineshapes of neighboring 
delays.?4 A numerical simulation in Sec. 10 of Ref. 16 shows a reaction 350x smaller 
than the applied wavenumber shift. 

The robustness of EDI to native spectrograph lineshape distortions and drift 
is an important practical advantage, since such distortions often form the limiting 
floor to the achievable radial velocity noise. 
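A toy model illustrates why the reconstructed peak reacts so weakly to a translation of the native lineshape. It assumes the drift moves only the Gaussian envelope of each wavelet while the fringe phase, set by the interferometer, stays put. The delay set, the envelope width, and the 0.5 cm⁻¹ applied shift are illustrative choices, and no equalization or cross-fading is modeled.

```python
import numpy as np

DELAYS_CM = (0.1, 0.3, 0.7, 1.0, 1.3, 1.7)   # assumed contiguous delay set
ENVELOPE_STD = 1.17                           # cm^-1, roughly R = 2700 at 7450 cm^-1


def reconstructed_peak(envelope_shift, x):
    """Peak position of a sum of wavelets whose envelope is shifted.

    The fringe phase is held fixed; only the native envelope is translated
    by `envelope_shift` (cm^-1).
    """
    envelope = np.exp(-0.5 * ((x - envelope_shift) / ENVELOPE_STD) ** 2)
    fringes = sum(np.cos(2 * np.pi * tau * x) for tau in DELAYS_CM)
    return x[np.argmax(envelope * fringes)]


x = np.linspace(-3.0, 3.0, 60001)             # wavenumber offset, cm^-1
applied_shift = 0.5
reaction = reconstructed_peak(applied_shift, x) - reconstructed_peak(0.0, x)
print(reaction, applied_shift / reaction)
```

In this simplified model the peak moves by well under a tenth of the applied shift. The exact reduction factor depends on the delay set and the processing, so it should not be expected to reproduce the measured factor of 20.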
Fig. 18. Measurement (red curve) of TEDI’s ThAr lamp to a 6× boosted resolution of 19,000,
compared to a National Institute of Standards and Technology (NIST) measurement?® of a different
ThAr lamp blurred to the same resolution (black dashes), D-order. Purple text is the NIST
assignment of species. The native spectrograph (green curve) has a resolution of ~3300. The lower
heights of Th lines relative to Ar lines for our lamp are consistent (Fig. 7 of Ref. 22) with the smaller
current (~10 mA) we used vs. that used by NIST (~20 mA). Note the extremely high dynamic
range of the measurement: ~0.1% lines are easily observed (heights are fraction of 9548 cm⁻¹
line). The EDI curve is robust to fixed-pattern noise such as false peaks in the native spectrum
at 9174 and 9214 cm⁻¹ due to bad pixels at X = 1033 and 1066 (inset shows detector there).
Bad pixels are constant and thus do not affect whirls, which look at changes between exposures.
Graphics reproduced from Ref. 16. (See electronic edition for a color version of this figure.)


4. Spatial Heterodyning Spectroscopy 


Internally dispersed interferometers such as spatial heterodyning spectroscopy2* 79
(SHS) and the similar heterodyned holographic spectroscopy? (HHS) have the
great practical advantage of not requiring any moving parts to measure a spectrum.
Instead of sampling delay space vs. time by scanning a delay, they splay delay space
spatially across an integrating detector and measure the many delay values
simultaneously, as depicted in Fig. 1(d).

Figure 20(a) is a schematic of a basic SHS apparatus,?+:°° and (b) shows the
formation of sinusoidal fringes whose spatial frequency is proportional to the
distance δσ = σ − σ₀ of a wavenumber σ from a base wavenumber σ₀, which is set
Fig. 19. Top panel: Large and irregular PSF drifts in the native spectrograph, to which EDI is
robust. Rows are native spectra vs. etalon #, and hence vs. time (~10 minutes apart). Lower
three panels: Calculated EDI reaction to a native spectrograph PSF translation using actual
ThAr moiré data and processing pipeline, but translating every input by 0.5 cm⁻¹ to the left.
(a) Red and black curves are unshifted and shifted output spectra respectively, formed from sums
over (b) unshifted, and (c) shifted wavelet stacks. The spectrograph drifts largely only affect the
envelope of the wavelets, rather than the phase, and the phase is most critical for determining
the peak location. The peak centroid shifts 0.025 cm⁻¹, hence a factor 20× reduction. A further
reduction of an order of magnitude or more can be obtained by strategically shaping the instrument
response.?! Graphics reproduced from Ref. 16. (See electronic edition for a color version of this
figure.)


by the Littrow angle of the gratings (which is when a diffracted beam retroreflects). 
The gratings at the end of each interferometer arm are configured to retroreflect 
light at wavenumber σ₀, and for this wavenumber the wavefronts from the two
arms are aligned parallel and hence produce a fringe that has a uniform intensity
across the detector.

For other wavenumbers, the diffraction from the gratings causes the beams
arriving at the beamsplitter to have an angle different from that at σ₀, creating a sinusoidal
Fig. 20. (Left) Schematic of a basic Spatial Heterodyning Spectrometer (SHS). The single initial
wavefront, shown as a vertical line just to the right of lens L1, is converted by the gratings
G1 and G2 into two tilted wavefronts (labeled 1 and 2 just before lens L2), where the tilt angle
depends on the wavenumber, as shown in the right-hand panel. (Right) Explanation of how different
wavenumbers σ will intersect at different angles for the two interferometer arms, due to the gratings
at the end of each arm which disperse the light. This causes the sinusoidal fringe pattern at the
detector to have different spatial frequencies, proportional to the change δσ = σ − σ₀ in wavenumber
from σ₀. The base wavenumber, σ₀, is the wavenumber at which the wavefronts combine parallel
to each other, being retroreflected from the grating at the Littrow condition for this wavenumber.
Since the detected pattern spatial frequency is proportional to δσ, and since multiple spatial
frequencies corresponding to different wavenumber features can be recorded simultaneously, in
an additive fashion, then taking the Fourier transform of the detected fringe pattern reveals the
distribution of δσ in the source spectrum. Graphics reprinted from Ref. 24, Copyright (1992), with
permission from AAS.


fringe pattern. This angle increases with δσ. Hence the spatial frequency of the fringe
across the detector is proportional to δσ. A Fourier transform is then applied to
the interferogram to convert it into wavenumber-space, one that is offset from σ₀.
Effectively the SHS creates many pathlength differences that are detected simul- 
taneously. Hence in the zoology of instrument types shown in Fig. 1, in (d) we 
notionally show the SHS by a rectangle, to suggest the many delay values being 
sensed simultaneously. 
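The Fourier recovery step can be sketched with a synthetic interferogram of two emission lines. The proportionality constant between fringe frequency and δσ is taken here as 4 tan θ, with θ the Littrow angle, the small-angle relation commonly quoted for a basic SHS. That relation, the detector geometry, and the line offsets are assumptions introduced for this illustration rather than values from the text.

```python
import numpy as np

N_PIXELS = 1024
DETECTOR_WIDTH_CM = 2.0
TAN_LITTROW = 0.25            # hypothetical; makes 4*tan(theta) equal to 1


def shs_interferogram(delta_sigmas, strengths):
    """Sum of fringe patterns, one spatial frequency per spectral line."""
    x = np.arange(N_PIXELS) * DETECTOR_WIDTH_CM / N_PIXELS
    fringe_freqs = 4 * TAN_LITTROW * np.asarray(delta_sigmas)  # cycles per cm
    return sum(b * (1 + np.cos(2 * np.pi * f * x))
               for b, f in zip(strengths, fringe_freqs))


def recover_delta_sigmas(interferogram, n_lines):
    """Locate the strongest Fourier peaks and convert them back to |delta sigma|."""
    amplitude = np.abs(np.fft.rfft(interferogram - interferogram.mean()))
    freqs = np.fft.rfftfreq(N_PIXELS, d=DETECTOR_WIDTH_CM / N_PIXELS)
    strongest = np.sort(np.argsort(amplitude)[-n_lines:])
    return freqs[strongest] / (4 * TAN_LITTROW)


fringes = shs_interferogram([10.0, 25.0], [1.0, 0.5])   # offsets in cm^-1
recovered = recover_delta_sigmas(fringes, n_lines=2)
print(recovered)
```

Because the fringes are real cosines, only the magnitude of δσ is recovered in this basic form; lines equally spaced above and below σ₀ would produce the same fringe frequency.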

Figure 21 compares how EDI (middle) and SHS or HHS (lower) interact with 
a simulated absorption spectrum having a continuum. This is useful for comparison,
even if there is actually no external cross-dispersion along the horizontal for the 
SHS in panel (c). (In that case, one mentally collapses the horizontal axis of (c).) 
In some versions of SHS,?° an external cross-disperser is used. 
Fig. 21. A graphic useful for considering the behavioral difference between the externally
dispersed (b, EDI) and basic internally dispersed (c, SHS or HHS) behaviors, even if there is
actually no external cross-dispersion in panel (c). (One can mentally sum all the external dispersion
channels together.) For the hypothetical absorption lines (a), the EDI (b) has an essentially
uniform period across the whole band, while the SHS has a period that varies reciprocally with
δσ = σ − σ₀, creating a diamond-like shape at σ₀. This depicts absorption spectroscopy where
there is a continuum to illuminate the bulk of the comb, rather than emission spectroscopy with a
dark background. Graphic reprinted from Ref. 7, Copyright (2003), with permission from PASP.


The key point is that for a single narrow spectral feature in the SHS, which
makes a vertical line, a sinusoidal pattern is made that has a vertical spatial frequency
that increases linearly with distance δσ = σ − σ₀ from the center of the
diamond-like structure at σ₀. Hence the vertical period is reciprocally related to δσ
and it creates the diamond-like shape. For wavenumbers σ that are close to σ₀ the
spectral resolution can be extremely high.

However, for large δσ the spatial frequency can eventually increase to beyond
the Nyquist frequency in the vertical that the finite detector pixel spacing can 
resolve. This sets the edge of the bandwidth for the SHS. However, in some versions 
of SHS the use of an echelle grating?‘ creates different diffraction orders that are 
separately detected and have separate bandwidth average positions. This allows the 
aggregate bandwidth to be larger. 

The photon noise for absorption spectroscopy (which has a continuum back- 
ground to contribute noise from other wavenumbers) is worse than a dispersive 
spectrograph by the square root of number of overlapping signals that fall on the 
same detector pixel (Eq. A40 of Ref. 30). By the Nyquist theorem, the number of 
distinct spatial frequencies that can be resolved by a row of pixels could be as much 
as half the pixels along the delay dimension. 
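The two statements in the paragraph above combine into a simple worst-case estimate. The function names and the 1024-pixel example are hypothetical; the square-root scaling is the one quoted from Ref. 30.

```python
import math


def max_distinct_fringe_frequencies(n_pixels_along_delay: int) -> int:
    """Nyquist limit: at most half as many resolvable frequencies as pixels."""
    return n_pixels_along_delay // 2


def photon_noise_penalty(n_overlapping_signals: int) -> float:
    """Noise increase relative to a dispersive spectrograph (continuum source)."""
    return math.sqrt(n_overlapping_signals)


n_frequencies = max_distinct_fringe_frequencies(1024)
worst_case_penalty = photon_noise_penalty(n_frequencies)
print(n_frequencies, round(worst_case_penalty, 1))
```

For a 1024-pixel delay dimension fully populated with continuum, the noise penalty could therefore approach a factor of about 23 in this worst case.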

Figure 22 (left) shows an interferogram of a Hg source. The spatial frequency 
of each column yields the wavenumber of the associated Hg line. Figure 22 (right) 
shows the spectra obtained by Fourier transform of the interferogram columns. 
Fig. 22. (Left) Interferogram data of a Hg source. The spatial frequency of each column yields
the wavenumber of the associated Hg line. (Right) Spectra obtained by Fourier transform of the 
interferogram columns. Graphics reproduced from Ref. 25 with permission from SPIE and author. 


Fig. 23. Monolithic field-widened SHS interferometer for the DASH project that measures atmo- 
spheric wind velocities. The triangular edges of the Koster’s prism beamsplitter are 100 mm long. 
Photo reprinted from Ref. 27, Copyright (2010), with permission from OSA. 


Figure 23 shows a monolithic SHS interferometer for the Doppler asymmetric
spatial heterodyne (DASH) project,²⁷,²⁸ which measures atmospheric wind velocities.
The triangular edges of the Koster’s prism beamsplitter are 100 mm long.
Figure 24 shows a comparison of the SHS (DASH) wind velocities to the more 
conventional Fabry—Pérot interferometer (FPI) instrument results, both measuring 
winds from the ground on the same night. The dashed line represents zero wind 
velocity. The HWM07 is a Horizontal Wind Model based on historical data. 
Fig. 24. Comparison of the SHS (DASH) wind velocities to the more conventional Fabry–Pérot
interferometer (FPI) instrument results, both measuring winds from the ground on the same night.
The dashed line represents zero wind velocity. The HWM07 is a Horizontal Wind Model based on
historical data. Photo reprinted from Ref. 28, Copyright (2012), with permission from Elsevier.
(See electronic edition for a color version of this figure.)


Because SHS (and EDI) do not have moving optical parts to scan a delay, 
they can employ additional optical elements to dramatically increase their etendue 
(product of beam area times solid angle), called field-widening, which makes their 
spectral properties independent of the entering beam angle for a wider range of 
angles. Conventional dispersive spectrographs typically do not employ field widen- 
ing, so their etendue is much smaller. This makes the SHS particularly well suited 
for measuring extended diffuse sources, such as air glow and atmospheric winds 
(Figs. 23 and 24). 

In field-widening, an element such as an etalon, prism, or lens is added to 
superimpose the virtual or real image of the grating or end mirror of one arm of the 
interferometer onto the grating or end mirror of the other arm. Therefore, it appears, 
according to ray angle, that the interferometer has zero delay. But in reality, by time- 
of-flight of a hypothetical pulse of light there is a nonzero delay between the arms due 
to the slower speed of light through glass and a related positional offset. The time-of- 
flight difference between the arms accomplishes the desired interferometric spectral 
behavior, while acting as a zero delay device regarding beam angle. A zero delay 
interferometer has zero delay for all angles. Hence the field-widened interferometer 
imposes the same nonzero time delay for all rays, independent of ray angle (except 
where the small angle approximation of Snell’s law breaks down). 
Nulling interferometry uses destructive interference to suppress starlight in order
to enhance the contrast of faint emission sources near the target star. Nulling 
can probe closer to stars than coronagraphy can, thus enabling unique observa- 
tions of exozodiacal dust and faint companions inside the coronagraphic regime. 
Nulling has undergone numerous advances recently, both in optical implementa- 
tion schemes, and in data analysis and calibration approaches, and this chapter 
provides an overview of the theory, techniques and requirements unique to nulling 
interferometry. It concludes with a mention of future possibilities. 


1. Nulling Interferometry vs. Standard Astronomical 
Interferometry 


The goal of nulling interferometry! (or, more simply, “nulling”) is to suppress 
starlight by destructively interfering the light collected by separate telescope aper- 
tures or subapertures. In general, nulling can differ from standard long-baseline 
optical/infrared interferometry in several regards, including the beam combination 
approaches, the fringe measurement, tracking and stabilization methods, and the 
data processing and calibration techniques. The most basic difference arises from 
the fact that in normal astronomical interferometry, the determination of the fringe 
parameters (visibility and phase) is paramount, while the null fringe is typically 
used primarily to suppress starlight, so as to enhance the observability of much 
fainter off-axis emission. However, once the star is nulled, interferometry may or 
may not be used to determine the off-axis source parameters. 

For the rejection of starlight to be deep and stable, an achromatic “null fringe”
must be kept fixed on the center of the star, which rules out the scanning of the
fringe pattern. This means that rather than measuring the fringe visibility, $V$, i.e.

$$V = \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}, \tag{1}$$


where $I_{\max}$ and $I_{\min}$ are the fringe maximum and minimum, respectively, nullers
usually measure a different but related fringe quantity, the null depth, $N$, given by?

$$N = \frac{I_{\min}}{I_{\max}}. \tag{2}$$


The two are related via

$$V = \frac{1 - N}{1 + N}, \quad \text{or conversely,} \quad N = \frac{1 - V}{1 + V}. \tag{3}$$

The advantage of $N$ is that it directly measures the small quantity desired, i.e.
the residual light leakage, whereas visibilities close to unity become difficult to
distinguish from unity. In the small null depth limit, we have

$$V \approx 1 - 2N, \quad \text{or conversely,} \quad N \approx \frac{1 - V}{2}. \tag{4}$$

As any light leakage will degrade the depth of the fringe minimum, the measured 
null depth will be astrophysically meaningful only if instrumental leakage terms are 
minimized and/or removed by calibration. 
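
The relations between visibility and null depth are easy to check numerically. The following is a minimal Python sketch (the function names are illustrative and do not come from the text) that converts between the two quantities and compares the exact relation with the small-null approximation.

```python
def null_from_visibility(v):
    """Null depth N = (1 - V) / (1 + V) for a fringe visibility V."""
    return (1.0 - v) / (1.0 + v)


def visibility_from_null(n):
    """Fringe visibility V = (1 - N) / (1 + N) for a null depth N."""
    return (1.0 - n) / (1.0 + n)


if __name__ == "__main__":
    for n in (1e-1, 1e-2, 1e-3, 1e-4):
        v_exact = visibility_from_null(n)
        v_approx = 1.0 - 2.0 * n  # small-null limit
        print(f"N = {n:.0e}: V = {v_exact:.6f}, 1 - 2N = {v_approx:.6f}")
```

For deep nulls the visibility crowds toward unity while the null depth still spans orders of magnitude, which is the practical reason for quoting $N$ rather than $V$.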

Finally, we note that, as the nulling of astrophysical sources requires the can- 
cellation of the incident fields to be relatively stable, nulling tends to be easier at 
longer wavelengths. Indeed, the first nulling observations were carried out in the 
microwave regime,® with the first such observations being of the Sun, as was the 
case with coronagraphic observations. The nulling of other stars was first proposed 
for mid-infrared wavelengths,'! because exoplanets and exozodiacal light are both 
expected to be bright in the thermal infrared. A number of ground-based nullers 
aimed at the detection of thermal dust emission at mid-infrared wavelengths have 
now been deployed,* “ and it has also proven possible to extend nulling observations 
to the near-infrared,® !° by taking advantage of extreme adaptive optics systems for 
wavefront stabilization, and of the very much lower thermal background emission 
at those wavelengths. 


2. Two-Beam Nulling 


The simplest case is two-beam nulling. To enable a fringe minimum that is both
deep and broadband, the two incoming beams must be combined so as to simultaneously
cancel the fields at all wavelengths in the observing passband, and in both
polarization states. For a high degree of cancellation after propagation down the
respective optical beam trains, the fields at the beam combiner must be extremely
well matched, implying both a high degree of symmetry (for matching, e.g. amplitudes
and polarization states) and stability (to keep the relative phase between the
beams fixed).

Regardless of the specific beam-combination technique used (see next section),
a two-beam nuller combines a pair of beams in anti-phase, yielding linear on-sky
fringes that are spaced in angle, as usual, by $\lambda/b$, where $\lambda$ is the observation
wavelength and $b$ is the baseline length between the apertures, but with a central
achromatic destructive fringe centered on the star (Fig. 1). The on-sky fringe transmission
at any wavelength is then

$$t(\theta_\perp) = \frac{1}{2}\left[1 - \cos\left(\frac{2\pi b\theta_\perp}{\lambda}\right)\right] = \sin^2\left(\frac{\pi b\theta_\perp}{\lambda}\right), \tag{5}$$

where $\theta_\perp$ is the angle on the sky from the central null fringe in the direction
perpendicular to the fringes. At large angles, the response is usually reduced (as illustrated
in Fig. 1) because of coherence or transmission losses arising from a number of
factors such as passband averaging,!! focal-plane beam combination,9,10 and/or
spatial filtering.’ 18

With a nulling baseline that can be rotated around the line of sight, the response
to a point source located at a radial angular offset from the central star of $\theta_p$ and
an azimuth angle of $\alpha_p$ is then

$$t(\theta_p, \alpha) = \sin^2\left(\frac{\pi b\theta_p\cos(\alpha_p - \alpha)}{\lambda}\right), \tag{6a}$$

where $\alpha$ is the azimuth angle of the baseline. Point source response curves for a full 180°
of baseline rotation for sources at angular radii of 0.25, 0.5, 0.75 and 1.0 times the
fringe spacing are shown in Fig. 1, where it can be seen that point sources at larger
radial offsets cause responses containing higher harmonics of the rotation frequency,
because more fringes are crossed during the rotation. Frequency and phase analysis
of the signal resulting from baseline rotation can thus be used to determine source
locations.14~!® However, at small radial offsets, Eq. (6a) reduces to

$$t(\theta_p, \alpha) \approx \left(\frac{\pi b\theta_p\cos(\alpha_p - \alpha)}{\lambda}\right)^2 = \frac{1}{2}\left(\frac{\pi b\theta_p}{\lambda}\right)^2\left[1 + \cos\left(2(\alpha_p - \alpha)\right)\right], \tag{6b}$$


which has only a single-frequency component at twice the baseline rotation frequency.
Moreover, as the signal due to a point source is proportional to the product
of its flux, $F_p$, with $t$, the small-angle signal is proportional to $F_p\theta_p^2$, implying a
degeneracy at small angles between the source flux and angular offset.

The response to extended sources is given by the convolution of the “nulled
brightness distribution” !? (i.e. the product of the source brightness distribution
with the fringe transmission pattern) with the single-aperture point-source response,
which reduces to the integral of the product of the nulled brightness distribution
with the centered single-aperture response in the case that spatial filtering is
applied.*:>: 12:13:17 Relating measured null depths to circumstellar disk parameters
then requires modeling both the source structure and its coupling to the nuller’s
response pattern.!> 18-2

Fig. 1. Second row: A cross-cut through the fringe transmission pattern of a monochromatic
single-baseline nuller (dotted line), and a more localized response (solid line) due to, e.g. reduced
off-axis coherence or transmission. The null fringe is at the center of the fringe packet. Top row,
left to right: Response vs. baseline rotation angle to point sources at radial offsets from the central
star of $\lambda/4b$, $\lambda/2b$, $3\lambda/4b$ and $\lambda/b$. (The dash-dotted arrows originate roughly at the largest fringe
phases reached during the rotation.) Third row: Simulated null-depth data sequences for an
astrophysical null depth, $N_a$, of 0.01, a root-mean-square phase error of 0.1 radians, and mean phases
corresponding to null offset leakages of $N_o$ = 0.00 (left panel) and 0.04 (right panel). (The two
downward-pointing dashed arrows originate roughly at the fringe setpoints.) Bottom: null-depth
probability density functions for an astrophysical null of 0.01, a root-mean-square phase error of
0.1 radians, and null offsets (due to phase setpoint errors) of 0.00, 0.01, 0.02, 0.03 and 0.04. The
first and last of these curves correspond to the two simulated data sequences shown directly above,
as indicated by the dashed arrows.
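
The rotation response of a two-beam nuller to a point source, and its small-offset limit, can be sketched numerically as follows. This is an illustrative Python sketch only: the function names are invented here, and angles are expressed in units in which the fringe spacing $\lambda/b$ equals 1.

```python
import math


def transmission(theta_p, alpha_p, alpha, baseline, wavelength):
    """Fringe transmission sin^2(pi * b * theta_p * cos(alpha_p - alpha) / lambda)."""
    phase = math.pi * baseline * theta_p * math.cos(alpha_p - alpha) / wavelength
    return math.sin(phase) ** 2


def transmission_small_angle(theta_p, alpha_p, alpha, baseline, wavelength):
    """Small-offset limit: a single harmonic at twice the rotation frequency."""
    amplitude = 0.5 * (math.pi * baseline * theta_p / wavelength) ** 2
    return amplitude * (1.0 + math.cos(2.0 * (alpha_p - alpha)))


if __name__ == "__main__":
    b, lam = 1.0, 1.0  # units in which the fringe spacing lambda/b is 1
    for offset in (0.01, 0.25, 0.5, 1.0):  # radial offset in fringe spacings
        for deg in (0, 45, 90):
            a = math.radians(deg)
            exact = transmission(offset, 0.0, a, b, lam)
            approx = transmission_small_angle(offset, 0.0, a, b, lam)
            print(f"offset = {offset:4.2f}, rotation = {deg:2d} deg: "
                  f"exact = {exact:.4e}, small-angle = {approx:.4e}")
```

At small offsets the two expressions agree closely, while at offsets comparable to the fringe spacing the exact response departs from the single-harmonic form, consistent with the higher harmonics described for larger radial offsets.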

The simplest case of an extended source is a stellar disk. With a peak transmission
of unity in the monochromatic case, the ideal null depth for a point source
separated from the star by a small angle is given by Eq. (6b), and integration over
a uniform stellar disk of small diameter $\theta_s$ yields a stellar null of?°

$$N_s = \frac{\pi^2}{16}\left(\frac{b\theta_s}{\lambda}\right)^2. \tag{7}$$

Including limb darkening, one gets!?

$$N_s = \frac{\pi^2}{16}\left(\frac{b\theta_s}{\lambda}\right)^2\frac{1 - 7A/15}{1 - A/3}, \tag{8}$$

where $A(\lambda)$ is the limb darkening coefficient. In either case, it can be seen that
a small stellar leakage requires a central null fringe much broader than the stellar
diameter. Moreover, because starlight leakage through the null fringe increases
quadratically toward the stellar rim (Eq. (6b)), null depth measurements are sensitive
to “edge effects”, and so are well suited to measurements of stellar diameters
and limb darkening, and in principle can enhance limb spectra relative to the overall
stellar disk.

As an example, with a baseline of 85 m, the erstwhile Keck Interferometer
Nuller®: 17:18 (KIN) had a fringe spacing of 24 mas at $\lambda$ = 10 μm and a theoretical
stellar null depth of $\approx 10^{-3}$ on a 1 mas diameter star (the approximate diameter of
a G star at 10 pc). On the other hand, with a baseline six times shorter (14.4 m),
the Large Binocular Telescope Interferometer” 1”?! (LBTI) provides a 10 μm fringe
spacing of roughly 140 mas and a stellar null depth of $\approx 3 \times 10^{-5}$ on the same type
of star.
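
The numbers in this example can be reproduced from the fringe spacing $\lambda/b$ and the uniform-disk stellar null, $(\pi^2/16)(b\theta_s/\lambda)^2$. The Python sketch below is illustrative only (the function and constant names are not from the text).

```python
import math

MAS = math.radians(1.0 / 3600.0) / 1000.0  # one milliarcsecond in radians


def fringe_spacing_mas(wavelength_m, baseline_m):
    """Fringe spacing lambda / b, converted to milliarcseconds."""
    return wavelength_m / baseline_m / MAS


def stellar_null(wavelength_m, baseline_m, diameter_mas):
    """Uniform-disk stellar leakage (pi^2 / 16) * (b * theta_s / lambda)^2."""
    x = baseline_m * diameter_mas * MAS / wavelength_m
    return (math.pi ** 2 / 16.0) * x ** 2


if __name__ == "__main__":
    wavelength = 10e-6  # metres
    for name, baseline in (("KIN", 85.0), ("LBTI", 14.4)):
        spacing = fringe_spacing_mas(wavelength, baseline)
        leak = stellar_null(wavelength, baseline, 1.0)  # 1 mas star
        print(f"{name}: fringe spacing = {spacing:.0f} mas, stellar null = {leak:.1e}")
```

Because the leakage scales as $b^2$, shortening the baseline by a factor of about six lowers the stellar null by a factor of about 35, at the cost of a proportionally coarser fringe spacing.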

Applying the same definition of inner working angle (IWA) to a nuller as is
applied to coronagraphs (i.e. the point at which $t(\theta) = 1/2$) gives

$$\mathrm{IWA} = \frac{\lambda}{4b}. \tag{9}$$

Combining Eqs. (7) and (9) then yields

$$\frac{\mathrm{IWA}}{\theta_s} = \frac{\pi}{16\sqrt{N_s}}, \tag{10}$$

showing that the IWA moves outward with improving stellar rejection. With a
single-baseline nuller, there is thus necessarily a trade-off between the inner working
angle and angular resolution on the one hand (both proportional to $b^{-1}$) and the
stellar leakage (proportional to $b^2$), due to the baseline length.
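
The trade-off between inner working angle and stellar leakage can be tabulated with a few lines of Python. This is a sketch under stated assumptions: the half-transmission point of $\sin^2(\pi b\theta/\lambda)$ is taken as the IWA, the stellar null is that of a uniform disk, and the baselines and stellar diameter are illustrative values.

```python
import math


def inner_working_angle(wavelength, baseline):
    """Offset at which the transmission sin^2(pi * b * theta / lambda) reaches 1/2."""
    return wavelength / (4.0 * baseline)


def stellar_null(wavelength, baseline, theta_s):
    """Uniform-disk stellar leakage for an angular diameter theta_s in radians."""
    return (math.pi ** 2 / 16.0) * (baseline * theta_s / wavelength) ** 2


if __name__ == "__main__":
    wavelength = 10e-6   # metres
    theta_s = 4.848e-9   # about 1 mas in radians (illustrative value)
    for baseline in (14.4, 40.0, 85.0):
        iwa = inner_working_angle(wavelength, baseline)
        n_s = stellar_null(wavelength, baseline, theta_s)
        print(f"b = {baseline:5.1f} m: IWA/theta_s = {iwa / theta_s:6.1f}, "
              f"pi/(16 sqrt(N_s)) = {math.pi / (16.0 * math.sqrt(n_s)):6.1f}, "
              f"N_s = {n_s:.1e}")
```

The two ratio columns coincide for every baseline, and a longer baseline pulls the IWA inward only at the price of a larger stellar leakage.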

One advantage of nulling is that with sufficient accuracy it can enable measurements
to be carried out with baselines much shorter than the baselines typically
employed for long-baseline visibility measurements. Stellar diameter
measurements provide a case in point. When using nulling to measure a star’s
diameter, the accuracy of the diameter measurement is given by!®

$$\delta\theta_s = \frac{8\lambda^2}{\pi^2\theta_s b^2}\,\delta N, \tag{11}$$

where $\delta N$ is the null-depth measurement accuracy. As the accuracy of a given star’s
diameter measurement is proportional to $\delta N/b^2$, a more accurate diameter measurement
can be obtained either by increasing the baseline length, or by improving
the null-depth measurement accuracy. In fact, for a given $\delta\theta_s$, we have $b \propto \sqrt{\delta N}$, i.e.
the needed baseline length decreases as the square root of the null depth accuracy.
This comparison is a bit oversimplified, as it leaves out factors such as the ability
to vary baseline lengths, but nevertheless, the estimated baseline reduction can
be quite sizable. For example, improving visibility or null depth accuracies from
the $10^{-2}$ level to $10^{-4}$ implies that baselines can be reduced by roughly an order
of magnitude, a factor large enough to take the needed baseline lengths from the
separated-aperture regime (i.e. ~100 m) to lengths that can fit within the diameters
of large existing and planned single-aperture telescopes (~5-40 m). Indeed,
high-accuracy near-infrared nulling between a pair of subapertures within the pupil
of Palomar’s Hale telescope!®:?4 has enabled stellar diameter measurements with a
baseline of only 3.4 m, which is shorter than the length of Michelson’s original stellar
interferometer.?° With longer nulling baselines across larger telescope pupils, it is
even possible to measure nearby main-sequence stellar diameters (e.g. a G2 star at
10 pc would have a diameter-limited null depth of $3 \times 10^{-3}$ at $\lambda$ = 2 μm on a 28 m
baseline).
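
The baseline scaling can be checked by propagating a null-depth error through the uniform-disk stellar null, $N_s = (\pi^2/16)(b\theta_s/\lambda)^2$, which gives a diameter error of $8\lambda^2\,\delta N/(\pi^2\theta_s b^2)$. The following Python sketch uses hypothetical input values (a 3 mas star, a 0.1 mas target accuracy, and K-band observations) chosen only for illustration.

```python
import math

MAS = math.radians(1.0 / 3600.0) / 1000.0  # one milliarcsecond in radians


def diameter_error_mas(wavelength_m, baseline_m, diameter_mas, delta_null):
    """Diameter error 8 lambda^2 dN / (pi^2 theta_s b^2), in milliarcseconds."""
    theta = diameter_mas * MAS
    return 8.0 * wavelength_m ** 2 * delta_null / (math.pi ** 2 * theta * baseline_m ** 2) / MAS


def baseline_for_accuracy(wavelength_m, diameter_mas, delta_theta_mas, delta_null):
    """Baseline (metres) needed to reach a given diameter accuracy."""
    theta = diameter_mas * MAS
    delta_theta = delta_theta_mas * MAS
    return math.sqrt(8.0 * wavelength_m ** 2 * delta_null / (math.pi ** 2 * theta * delta_theta))


if __name__ == "__main__":
    wavelength = 2.2e-6  # metres (hypothetical K-band case)
    for delta_null in (1e-2, 1e-3, 1e-4):
        b = baseline_for_accuracy(wavelength, 3.0, 0.1, delta_null)
        print(f"null accuracy {delta_null:.0e}: baseline needed = {b:6.1f} m")
```

Each factor of 100 gained in null-depth accuracy shortens the required baseline by a factor of 10, which is the order-of-magnitude reduction described above.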


2.1. Beam Combination 


Considering the completely symmetric case of combining two identical beams with 
zero optical path difference between them at an ideal 50/50 beamsplitter, symmetry 
and energy conservation demand that half the incident light should emerge from 
each side of the beamsplitter. However, as the reflected and transmitted fields that 
are superposed at each of the beamsplitter outputs have equal amplitudes (as a 
result of the assumed 50/50 split ratio), each of the two net output fields can only 
equal the single-beam input amplitudes, as is required by energy conservation, if 
there is a $\pi/2$ phase shift between the reflected and transmitted beams. Thus, symmetry
and energy conservation together demand!!>?° 2? that ideal 50/50 beamsplitters
must introduce $\pi/2$ phase shifts between their reflected and transmitted beams.
As such, fringe minima occur only at nonzero optical path differences, resulting in
minima that are chromatic. Conversion to an achromatic nuller thus requires the
addition of an extra $\pi/2$ radian phase shift between the combining beams, since
broadband cancellation requires an achromatic phase difference of $\pi$ radians. The
extra $\pi/2$ of phase can be supplied most simply by passage through unbalanced
dielectric media*:® (i.e. a longitudinal phase shift), or by making use of a second
beamsplitter pass.?° The first of these is the method used by both the Bracewell
Infrared Nulling Cryostat® (BLINC) and the LBTI.’ Standard interferometric pupil-plane
beam combination at a single beam splitter also typically includes an extra
unbalanced reflection in one of the two beam trains to allow both polarization states
to be co-phased simultaneously.?”
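
The energy-conservation argument can be illustrated with complex field amplitudes. The Python sketch below models an ideal lossless 50/50 beamsplitter with an amplitude reflection coefficient of $1/\sqrt{2}$ and a transmission coefficient of $i/\sqrt{2}$; this particular phase convention and the names used are assumptions made for illustration.

```python
import math

R = 1.0 / math.sqrt(2.0)            # amplitude reflection coefficient
T_SHIFTED = 1j / math.sqrt(2.0)     # transmission with a pi/2 phase shift
T_UNSHIFTED = 1.0 / math.sqrt(2.0)  # transmission without it (unphysical)


def combine(e1, e2, r, t):
    """Fields at the two beamsplitter outputs for input fields e1 and e2."""
    return r * e1 + t * e2, t * e1 + r * e2


def power(field):
    """Intensity of a complex field amplitude."""
    return abs(field) ** 2


if __name__ == "__main__":
    cases = (
        ("equal inputs, pi/2 beamsplitter", 1.0, 1.0, T_SHIFTED),
        ("equal inputs, no beamsplitter phase", 1.0, 1.0, T_UNSHIFTED),
        ("extra pi/2 on beam 2, pi/2 beamsplitter", 1.0, 1j, T_SHIFTED),
    )
    for label, e1, e2, t in cases:
        out_a, out_b = combine(e1, e2, R, t)
        print(f"{label}: outputs = {power(out_a):.3f}, {power(out_b):.3f} "
              f"(input total = {power(e1) + power(e2):.3f})")
```

Only the second case fails to conserve energy, and the third case shows one output nulled while the other carries all of the light once a total phase difference of $\pi$ is reached.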

On the other hand, it is possible to have complete symmetry between the two
beam trains if a relative phase shift of $\pi$ radians is supplied upstream of either
a dual-pass beamsplitter configuration,?® or a focal-plane combiner (see next section).
Several methods of directly introducing an achromatic $\pi$-radian phase shift
exist,?’ including, e.g. anti-symmetric periscopes, the Gouy through-focus phase, a
half-period lateral grating translation, orthogonally oriented half wave plates, and
geometric phase. A succession of polarizers can also be used to rotate polarization
states into opposition, but at the cost of lower efficiency.

The beam intensity ratio is also affected by passage through a beamsplitter, 
as beamsplitter reflection and transmission coefficients are in general not equal. 
The fully symmetric nuller employed by the KIN thus produced matched output
intensities with a symmetric pair of beamsplitter passes in which both beams see the
common product of the amplitude reflection and transmission coefficients.?* Any of the intrinsic
$\pi$ phase shifters listed above can then be inserted ahead of a fully symmetric beam
combiner to turn it into a nuller by providing the phase shift needed to turn the
central constructive fringe into a destructive fringe.

However, all phase shifters have limitations: some are limited to providing
only a single fixed value of the phase shift, and many provide chromatic phase
shifts. Working under a fluctuating atmosphere or behind imperfect optical beam
trains, the relative phase between two incoming beams will fluctuate randomly, and 
will also very likely have a chromatic character. The former implies that a fixed 
phase shift will be inadequate. Moreover, beam amplitudes are likely to differ after 
propagation down a pair of different beam trains. Active control of amplitude and 
phase (the latter being carried out much more rapidly than the former), as well as 
dispersion correction must then be applied for deep, broadband nulling. Dielectric 
plates or wedges with net thicknesses adjustable by means of rotation or translation, 
respectively, can supply a variable phase shift, and together with an adjustable air 
path, can provide broadband dispersion correction.?! Residual dispersion could in 
principle be addressed by spectrally dispersed nulling — i.e. nulling a number of 
narrow spectral channels individually, as the residual dispersion in each resolution 
element will be lower than across the entire band of interest. The phase offsets from 
null likely to be present in the individual channels can then be dealt with either by 
an adaptive nuller®? that disperses the nulled light onto a deformable mirror that 
corrects each channel’s mean phase, or by the “null self-calibration” data analysis 
technique”4 (Sec. 3.2), which can extract each channel’s true astrophysical null even 
in the presence of phase offsets. 
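
The benefit of nulling narrow spectral channels individually can be illustrated with a deliberately simple chromatic phase shifter. In the Python sketch below, the $\pi$ phase shift is assumed to come from a pure path delay of half a wavelength at 10 μm (an assumption made only for this illustration, as are the band limits and channel count), and the leakage of two equal beams offset from null by a phase $\epsilon$ is taken as $\sin^2(\epsilon/2)$.

```python
import math


def phase_error(wavelength, null_wavelength):
    """Offset from pi when the shift is a pure path delay of null_wavelength / 2."""
    return math.pi * (null_wavelength / wavelength - 1.0)


def null_depth(phase_offset):
    """Leakage of two equal-amplitude beams offset by phase_offset from null."""
    return math.sin(0.5 * phase_offset) ** 2


def residual_null(lam_lo, lam_hi, null_wavelength, samples=200):
    """Mean leakage across a channel after its mean phase offset is removed."""
    step = (lam_hi - lam_lo) / samples
    phases = [phase_error(lam_lo + (i + 0.5) * step, null_wavelength)
              for i in range(samples)]
    mean = sum(phases) / samples
    return sum(null_depth(p - mean) for p in phases) / samples


if __name__ == "__main__":
    lam0 = 10e-6                      # wavelength at which the delay gives exactly pi
    lo, hi, n_chan = 9e-6, 11e-6, 10  # hypothetical band and channel count
    print(f"leakage at band edge, no correction: {null_depth(phase_error(lo, lam0)):.2e}")
    print(f"full band, mean phase removed:       {residual_null(lo, hi, lam0):.2e}")
    width = (hi - lo) / n_chan
    worst = max(residual_null(lo + k * width, lo + (k + 1) * width, lam0)
                for k in range(n_chan))
    print(f"worst of {n_chan} channels:                {worst:.2e}")
```

Once each channel's mean phase offset has been corrected or calibrated out, the leakage left by the phase variation within a channel is far smaller than the leakage averaged over the whole band.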
2.2. Pupil-Plane vs. Focal-Plane Beam Combination and the Role
of Optical Fibers


The previous section was concerned primarily with classical beam combination 
involving free-space optics, in which the pupils of the two beams are superposed 
at a beamsplitter (Michelson interferometry). This approach is also referred to as 
coaxial beam combination,” because the two beam axes coincide after beam com- 
bination. When nulling in the Michelson configuration, energy conservation implies 
that when one of the beamsplitter outputs is nulled, the other is bright. The bulk 
of the starlight is thus well-separated from the nulled output. On the other hand, 
two beams can instead be combined in the focal plane (Fizeau combination). This 
case is also referred to as multi-axial combination,?? because the beam axes don’t 
coincide. In this case, all of the starlight reaches the focal plane, with the null 
fringe in the center of the fringe pattern,?° and there are no separate nulling and 
constructive outputs. The bulk of the starlight thus remains in close proximity to 
the dark region. 

Single-mode optical fibers can play important roles in both beam combination
schemes, in the high Strehl ratio* regime. In coaxial combiners, an optical fiber
located in a focal plane after beam combination can be used to filter out pupil-plane 
wavefront irregularities, and thus improve null depths.*° In particular, the presence 
of differing spatial aberrations in the two beams produces differently-aberrated stel- 
lar point spread functions, leading to off-axis light leakage in the combined beam. 
Very close to the star, most important are pointing mismatches and low order wave- 
front errors. However, the only phase term that can propagate within a single-mode 
fiber is the piston phase difference between beams; all other pupil-phase errors are 
filtered out. Even tip—tilt-related phase errors don’t propagate in the fiber; they are 
instead converted to amplitude errors by the dependence of fiber coupling on angle 
of arrival. A single-mode fiber coupled to the core of the point spread function (PSF) 
can thus improve stellar rejection considerably,?? but over a field of view restricted 
to the PSF core. In contrast, without a fiber, the available field of view is larger, 
but the null depths are likely to be more modest. 

In the Fizeau configuration, the fringes across the focal-plane Airy disk imply 
that fine sampling would be needed to isolate the deepest part of the central null 
fringe, if no spatial filtering were applied. A single-mode fiber coupled to the focal- 
plane Airy disk can thus play a critical role in the Fizeau case as well. First, a 
single-mode fiber can itself function as a beam-combiner, as a pair of focused beams 
can be coupled into the same fiber mode if both arrive within the fiber’s acceptance 
cone.?”:3° Moreover, by including an upstream relative phase shift of $\pi$ between the
beams to be combined, the fiber-combiner becomes a nuller, because the resultant
anti-phased pair of stellar electric fields produces an anti-symmetric focal-plane field
distribution on the fiber input plane that cannot couple to the single-mode fiber’s
symmetric propagation mode.?*:3° However, while the on-axis starlight cannot couple
to the fiber, off-axis emission arriving with a different phase shift can. Because
of its relative simplicity, this type of “fiber nuller” was employed by the Palomar
Fiber Nuller (PFN).? 1°16


*See Chapter 13 of Volume 2 of this Handbook.


3. Instrumental Limitations 


Thus far, only the case of an ideal, error-free, two-beam nuller has been discussed. 
However, in any real interferometer, a number of different types of imperfection can 
lead to an increased level of light leakage. Since the electric fields at the outputs of
the two beam trains must cancel to high accuracy, i.e. $E_1 + E_2 e^{i\phi} = 0$, the fields
must be matched in amplitude, phase, polarization rotation angle, retardance, and
dispersion. For the case of monochromatic, single-polarization light, it can be shown 
that leakages linear in amplitude and phase errors vanish at null, leaving only smaller 
quadratic leakage terms.'® This minimum-noise condition is the reason for operating 
at null, although it does not apply if thermal background noise dominates. To avoid 
increased noise from other fringe phases, only intensities from the null phase can 
then be used. As a result, the null is best calibrated using individual beam intensities 
rather than data from other fringe phases. 

In the Michelson case, the null depth integrated over the combined beam pupil
is given by

$$N = N_a(b) + N_s(b) + N_i, \tag{12}$$


where the first term is the desired astrophysical signal due to nonstellar off-axis
emission, the second is due to stellar leakage, and the third is due to any and all
instrumental leakage terms, given in short by!®

$$N_i = \frac{1}{4}\sum_k \sigma_k^2, \tag{13}$$

which is one quarter of the sum of the variances due to each of the possible leakage
sources. In more detail, in an integration time, $t$, and in the absence of beam shear,
the instrumental null is given by! 2%

$$N_i = \frac{1}{4}\left(\sigma_{\rm wf}^2 + \sigma_\phi^2 + \sigma_\lambda^2 + \sigma_{\rm ret}^2 + \sigma_{\rm rot}^2 + \sigma_{\rm amp}^2\right), \tag{14}$$


where the different terms give, in order, the spatial variance of the wavefront phase 
difference across the beam apertures, the temporal variance of the average phase 
difference within an integration time, the spectral variance of the dispersion across 
the passband, the variance of the retardance between the two polarization states, 
the variance of the residual polarization rotation angle between the beams, and the 
variance of the amplitude imbalance. Note that each of these variances is taken over 
a different variable, and the timescales involved can also be quite different, as some 
terms are relatively stable, while others can vary rapidly. 
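
A leakage budget of this kind is simple to evaluate. The Python sketch below takes the instrumental null as one quarter of the sum of the variances of the individual error terms; the rms values in the budget are hypothetical and were chosen only to show how the terms combine.

```python
def instrumental_null(rms_terms):
    """One quarter of the sum of the variances of the leakage terms."""
    return 0.25 * sum(rms ** 2 for rms in rms_terms.values())


# Hypothetical rms values, for illustration only (not taken from any instrument).
budget = {
    "wavefront phase difference (rad)": 0.02,
    "mean phase over an integration (rad)": 0.03,
    "dispersion across the passband (rad)": 0.01,
    "retardance (rad)": 0.01,
    "polarization rotation (rad)": 0.005,
    "fractional amplitude imbalance": 0.01,
}

if __name__ == "__main__":
    total = instrumental_null(budget)
    for name, rms in budget.items():
        share = 0.25 * rms ** 2 / total
        print(f"{name:40s} {0.25 * rms ** 2:.2e}  ({100.0 * share:4.1f}% of total)")
    print(f"{'instrumental null':40s} {total:.2e}")
```

Because the terms add in quadrature, the largest rms error dominates the budget, so effort is best spent on that term first.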
3.1. Phase Errors


Due to fluctuations in any of the quantities contributing to the instrumental null 
depth (Eq. (14)), measured null depths will also fluctuate. And as the dominant error 
term is often phase fluctuations, accurate astrophysical null depth measurements 
will require stabilization of the relative phase between beams so as to stay at or near 
the bottom of the null fringe. However, because all fluctuations from the bottom of 
the null fringe lead to positive-definite increases in the null depth (Eqs. (12) and 
(14)), any time-average of a single-baseline null-depth measurement sequence will 
necessarily be biased upward, and thus would provide an inaccurate estimate of
the astrophysical null depth, $N_a$. Indeed, in the simplest case of only phase errors
about the fringe minimum, a better estimate for $N_a$ would instead simply be the
minimum null depth present in a sequence, as can be seen in the left-hand simulated
null-depth measurement sequence in Fig. 1.

This can be seen in more detail by considering Gaussian phase noise, and
neglecting all other error sources. In this case, the measured null, $N_m$, at any time
is given by

$$N_m = N_a + N_i = N_a + \left(\frac{\phi}{2}\right)^2, \tag{15}$$

where $N_i = (\phi/2)^2$ is the quadratic instrumental phase-error contribution to the
measured null depth. In this case, the mean null depth over a measurement sequence
is given by!®

$$\bar{N}_m = N_a + \frac{\bar{\phi}^2 + \sigma_\phi^2}{4}, \tag{16}$$

where $\bar{\phi}$ is the mean phase offset from the perfect null phase of $\pi$ radians (due to, e.g.
an experimental setpoint error), and $\sigma_\phi$ is the root-mean-square phase fluctuation
about the mean phase. As Eq. (16) shows, the average measured two-beam null
depth is always larger than the true astrophysical null depth because of two factors:
the mean phase offset from null, and the variance of the phase fluctuations.

If the two positive bias terms in Eq. (16) were known or measured, they could
be subtracted from the measured mean null depth to retrieve the astrophysical
null. However, determining the two bias terms is nontrivial, especially as they are
specified in terms of phase, rather than null depth, which depends on the square of
the phase. After some manipulation,!® though, the last equation can be recast in
terms of the root-mean-square null-depth fluctuation, $\sigma_N$, as

$$\bar{N}_m = N_a + \frac{\sqrt{\bar{\phi}^4 + 8\sigma_N^2}}{4}, \tag{17}$$

using

$$\sigma_N^2 = \frac{\sigma_\phi^4 + 2\bar{\phi}^2\sigma_\phi^2}{8}. \tag{18}$$

Finally, inverting Eq. (17) gives the astrophysical null depth as

$$N_a = \bar{N}_m - \frac{\sqrt{\bar{\phi}^4 + 8\sigma_N^2}}{4}. \tag{19a}$$

In the absence of a mean phase offset error (i.e. for a perfect phase setpoint at the
bottom of the null fringe), Eq. (19a) simplifies to

$$N_a = \bar{N}_m - \frac{\sigma_N}{\sqrt{2}}. \tag{19b}$$

In this case, both of the quantities on the right-hand side are directly calculable from
a given null depth measurement sequence, as a result of which the astrophysical null
can be determined directly from the data. On the other hand, in the opposite case
of a phase offset but no fluctuations, Eq. (19a) simplifies to

$$N_a = \bar{N}_m - \frac{\bar{\phi}^2}{4}. \tag{19c}$$

From Eqs. (19a) and (19c), it is clear that solving for the astrophysical null also 
requires accurate knowledge of the mean phase offset between beams during the 
measurement sequence. In the ideal case, the mean phase offset could be removed 
by observing, under identical conditions, a calibrator star with no astrophysical 
contribution to the null depth. However, in practice the mean phase offset is unlikely 
to remain unaltered from star to star. 
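
The bias of the mean null, and its removal using the null-depth statistics, can be demonstrated with a short simulation. The Python sketch below assumes the quadratic model of Eq. (15) with Gaussian phase noise and applies the estimator of Eq. (19a), i.e. the mean null minus $\sqrt{\bar{\phi}^4 + 8\sigma_N^2}/4$; the parameter values match those quoted for Fig. 1, while the function names and sample count are illustrative.

```python
import math
import random


def simulate_nulls(n_a, mean_phase, sigma_phase, count, seed=1):
    """Null-depth sequence N_a + (phi / 2)^2 for Gaussian phase offsets phi."""
    rng = random.Random(seed)
    return [n_a + (rng.gauss(mean_phase, sigma_phase) / 2.0) ** 2
            for _ in range(count)]


def mean_and_rms(values):
    """Mean and root-mean-square fluctuation of a sequence."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def astrophysical_null(mean_null, rms_null, mean_phase=0.0):
    """Mean null minus sqrt(mean_phase^4 + 8 rms_null^2) / 4."""
    return mean_null - math.sqrt(mean_phase ** 4 + 8.0 * rms_null ** 2) / 4.0


if __name__ == "__main__":
    true_null, sigma = 0.01, 0.1
    for offset in (0.0, 0.4):  # mean phase offsets in radians (N_o = 0.00 and 0.04)
        nulls = simulate_nulls(true_null, offset, sigma, 100000)
        mean, rms = mean_and_rms(nulls)
        estimate = astrophysical_null(mean, rms, offset)
        print(f"offset = {offset:.1f} rad: mean null = {mean:.4f}, "
              f"rms = {rms:.4f}, recovered N_a = {estimate:.4f}")
```

The recovery works here only because the mean phase offset is supplied to the estimator; with an unknown offset, the mean and rms alone do not determine the astrophysical null.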

Before discussing how to determine the mean phase offset between the beams, it 
is important to note that Eq. (19a) (and the simpler Eq. (19b)) links the astrophys- 
ical null depth to the variance of the null-depth fluctuations. In other words, in the 
absence of perfect stabilization at the bottom of the null fringe, the statistics of the 
null-depth fluctuations become integral to the determination of the astrophysical 
null depth. This is because of the underlying nonlinear fringe shape, which makes 
the character of the null depth fluctuations a function of the mean phase offset 
between the two beams. This is illustrated by the pair of simulated null depth data 
sequences shown in Fig. 1, where one can see that all null-depth fluctuations are 
necessarily positive when starting from the bottom of the null fringe, while for other 
mean phase offsets, the null depth can fluctuate in either direction. Indeed, it is the 
mean-phase-dependent character of the null-depth fluctuations that allows one to 
distinguish between a true astrophysical null-leakage signal and the null-depth offset 
caused by a mean-phase error (see Sec. 3.2). 

However, what is the expected level of null depth fluctuations? In the case of a
phase setpoint exactly at null, converting the variance of the phase fluctuations in
Eq. (18) to the variance of the optical path difference, $\sigma_x^2$, gives

$$\sigma_N = \sqrt{2}\left(\frac{\pi\sigma_x}{\lambda}\right)^2. \tag{20}$$

Whether pathlengths between separated apertures are stabilized by a fringe tracker,
or pathlengths between subapertures within a common telescope pupil by an
adaptive optics system, $\sigma_x \approx 100$ nm can be chosen as representative, for which

$$\sigma_N \approx \frac{0.14}{\lambda_{\mu m}^2}, \tag{21}$$

where $\lambda_{\mu m}$ is the observing wavelength in microns. From the point of view of phase,
long wavelengths are thus clearly easier to null, as typical rms null depths (without
further phase-stabilization steps) are predicted to be ~0.03 at K-band (2.2 μm), and
~$10^{-3}$ at N-band (10 μm). Note that the first value is roughly consistent with typical
K-band visibilities of a few percent. Even deeper mean instrumental null depths thus
require even finer phase control, such as phase-averaging over apertures larger than
those of the wavefront sensor, or using shorter-wavelength fringe information to
stabilize longer-wavelength fringes.3+ 3°
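
The wavelength scaling of the rms null depth follows from $\sigma_N = \sqrt{2}\,(\pi\sigma_x/\lambda)^2$ for a setpoint at null. A minimal Python sketch, with an illustrative function name and the representative 100 nm rms pathlength error:

```python
import math


def rms_null(sigma_x, wavelength):
    """rms null-depth fluctuation sqrt(2) * (pi * sigma_x / lambda)^2."""
    return math.sqrt(2.0) * (math.pi * sigma_x / wavelength) ** 2


if __name__ == "__main__":
    sigma_x = 100e-9  # metres
    for band, wavelength in (("K", 2.2e-6), ("N", 10e-6)):
        print(f"{band}-band ({wavelength * 1e6:.1f} um): "
              f"rms null = {rms_null(sigma_x, wavelength):.1e}")
```

The inverse-square dependence on wavelength means that the same pathlength stability yields an rms null roughly 20 times deeper at 10 μm than at 2.2 μm.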

However, the mean null depth and the variance of the null are not the full story,
as one could presumably use only the deepest null depths in a sequence to delimit
faint underlying astrophysical signals (i.e. “lucky” nulling). One must thus ask how
often instrumental nulls of a given depth are expected to occur, and how much
deeper than the mean null it is possible to probe effectively. For Gaussian phase
fluctuations centered at the optimal null phase, the probability, $p$, of the null depth
being below a given level $N_x$ at any time is given by the error function

$$p(N < N_x) = \mathrm{erf}\left(\frac{\sqrt{2N_x}}{\sigma_\phi}\right). \tag{22}$$

For the same 100 nm rms pathlength error as before, K-band nulls of $10^{-1}$, $10^{-2}$, $10^{-3}$
and $10^{-4}$ should thus be seen 97%, 50%, 17% and 5% of the time, respectively,
implying that nulls even two orders of magnitude deeper than the average null are
present on the order of 10% of the time. At N-band, the situation is again much
more favorable, with phase fluctuations alone allowing $10^{-4}$ and $10^{-5}$ nulls 26% and
8% of the time, respectively. Inclusion of other error terms, and of thermal infrared
background noise will of course further limit performance.
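
These occurrence rates can be recomputed directly. For Gaussian phase fluctuations of rms $\sigma_\phi$ about null and a quadratic null depth $(\phi/2)^2$, the fraction of time spent below a level $N_x$ is $\mathrm{erf}(\sqrt{2N_x}/\sigma_\phi)$. The Python sketch below (function name illustrative) converts the 100 nm rms pathlength error to phase at each wavelength.

```python
import math


def fraction_below(null_level, sigma_x, wavelength):
    """Fraction of time the phase-induced null is below null_level."""
    sigma_phase = 2.0 * math.pi * sigma_x / wavelength
    return math.erf(math.sqrt(2.0 * null_level) / sigma_phase)


if __name__ == "__main__":
    sigma_x = 100e-9  # metres
    for band, wavelength, levels in (("K", 2.2e-6, (1e-1, 1e-2, 1e-3, 1e-4)),
                                     ("N", 10e-6, (1e-4, 1e-5))):
        for level in levels:
            p = fraction_below(level, sigma_x, wavelength)
            print(f"{band}-band: nulls below {level:.0e} occur {100.0 * p:.0f}% of the time")
```

The computed fractions are close to the percentages quoted above, and they show how slowly the occurrence rate falls as the requested null gets deeper, which is what makes selecting the deepest nulls worthwhile.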


3.2. The Null-Depth Probability Density Function 


As null depths significantly deeper than the average null should thus appear reg- 
ularly, the final step is to examine the expected frequency distribution of a set of 
null-depth measurements. In the simple case where only phase errors are present, 
the probability that a given null-depth measurement falls within a small range dN 
is given by 


$$P(N)\,dN = \sum p(\phi)\,d\phi, \tag{23}$$


where the summation is over the two phases of opposite sign that yield identical null 
depths. As this equation translates the phase-fluctuation probability density func- 
tion into the null-depth probability density function, this process can be inverted: 
i.e. with a measured null-depth distribution, and either an assumed or a measured 
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phase distribution function, the astrophysical null N, can be extracted by fitting 
the observed null-depth probability distribution.24:°°°" This procedure has been 
referred to as the “null self-calibration” (NSC) technique, because the nulling data 
stream (together with measurements of the individual beam intensities and the 
dark level) is itself used to extract the calibrated astrophysical null depth. For a 
quadratic relationship between phase and null-depth as in Eq. (15), and Gaussian 
phase fluctuations, Eq. (23) implies a null-depth probability density function of 


$$p(N) = \sqrt{\frac{2}{\pi\sigma_\phi^2\,(N-N_a)}}\;\cosh\!\left(\frac{4\sqrt{N_0\,(N-N_a)}}{\sigma_\phi^2}\right)\exp\!\left(\frac{-2(N+N_0-N_a)}{\sigma_\phi^2}\right) \qquad (24)$$

where $N_0$ is the null-depth offset corresponding to the mean-phase offset error. This 
one-sided function (Fig. 1) is nonzero only for $N > N_a$, and initially drops sharply 
from an infinite asymptote at $N = N_a$ toward higher values of $N$. This distinctively 
asymmetric function can be fitted to high accuracy to retrieve $N_a$ as well as the 
mean null-depth offset $N_0$, and the phase variance $\sigma_\phi^2$. In the absence of a 
mean-phase offset, Eq. (24) reduces to 


$$p(N) = \sqrt{\frac{2}{\pi\sigma_\phi^2\,(N-N_a)}}\;\exp\!\left(\frac{-2(N-N_a)}{\sigma_\phi^2}\right), \qquad (25)$$


which, as can be seen in Fig. 1, is a one-sided exponential-like function that 
approaches infinity at $N = N_a$. 
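
As an illustrative check of the shape described by Eq. (24), the following sketch draws Gaussian phase errors, converts them to null depths, and compares the sampled distribution with the analytic density. It assumes the quadratic relation $N = N_a + \phi^2/4$ (our reading of Eq. (15), which is not reproduced in this section), so that $N_0 = \phi_0^2/4$ for a mean phase offset $\phi_0$. The function names and all numerical values are hypothetical, and this is not the full NSC fitting procedure.

```python
import numpy as np


def null_depth_pdf(N, N_a, N_0, sigma_phi):
    """Null-depth density of the Eq. (24) form; zero for N <= N_a."""
    x = np.atleast_1d(np.asarray(N, dtype=float)) - N_a
    pdf = np.zeros_like(x)
    ok = x > 0.0
    s2 = sigma_phi ** 2
    pdf[ok] = (np.sqrt(2.0 / (np.pi * s2 * x[ok]))
               * np.cosh(4.0 * np.sqrt(N_0 * x[ok]) / s2)
               * np.exp(-2.0 * (x[ok] + N_0) / s2))
    return pdf


def simulate_null_depths(n, N_a, phi_0, sigma_phi, seed=0):
    """Sample N = N_a + phi**2 / 4 (assumed relation) for Gaussian phase."""
    rng = np.random.default_rng(seed)
    phi = rng.normal(phi_0, sigma_phi, size=n)
    return N_a + phi ** 2 / 4.0


def bin_probability(lo, hi, N_a, N_0, sigma_phi, n_grid=4001):
    """Trapezoidal integral of the density over [lo, hi], with lo > N_a."""
    grid = np.linspace(lo, hi, n_grid)
    y = null_depth_pdf(grid, N_a, N_0, sigma_phi)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid)))


if __name__ == "__main__":
    N_a, phi_0, sigma_phi = 2e-3, 0.05, 0.1      # hypothetical values
    N_0 = phi_0 ** 2 / 4.0
    samples = simulate_null_depths(200_000, N_a, phi_0, sigma_phi)
    lo, hi = N_a + 1e-3, N_a + 4e-3
    print("sampled fraction :", np.mean((samples > lo) & (samples < hi)))
    print("analytic fraction:", bin_probability(lo, hi, N_a, N_0, sigma_phi))
```

No sample falls below $N_a$, which is the one-sidedness that lets a fit locate the astrophysical null at the sharp lower edge of the histogram.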

The opposite case of small null-depth fluctuations, $\delta N$, about a much larger 
null-depth offset, $N_0$, is revealing. In particular, for $\delta N \ll N_0$, one can show that 


$p(\delta N) =$ (26) 

where $\delta N = N - N_a - N_0$. Moreover, in this limit, Eq. (18) reduces to $\sigma_N^2 = N_0\sigma_\phi^2$, 
so Eq. (26) becomes 

which is the product of a standard Gaussian probability distribution (with a center 
at $N_a + N_0$ and a variance of $\sigma_N^2 = N_0\sigma_\phi^2$), with an exponential factor close to unity 
that introduces skewness: because the exponent in the latter factor changes sign 
with $\delta N$, the resultant probability density function is slightly asymmetric about its 
center, with positive excursions of a given magnitude being slightly more probable 
than negative excursions of the same size. This is because the quadratic dependence 
of null depth on phase near the fringe minimum means that adding a given phase 
fluctuation to the mean phase offset will alter the null depth by a larger amount 
than subtracting the same phase fluctuation would. Indeed, the nonlinear relationship 
between the null depth and fringe phase makes it possible to determine all 
of the necessary parameters (the astrophysical null depth, $N_a$, the variance of 
the null depth fluctuations, $\sigma_N^2$, and the static null depth offset, $N_0$, that arises 
from the mean phase offset) by fitting measured probability density distributions 
to Eq. (27). This would not be possible without the skewness factor present in 
Eq. (27), as a symmetric Gaussian function can be completely described by only 
two parameters (its center and width), and in this case, the center location 
depends only on the sum $N_a + N_0$. However, the skewness factor in Eq. (27) has no 
dependence on $N_a$, and so breaks this degeneracy. 

Interestingly, the underlying fringe nonlinearity allows the retrieval of the astro- 
physical null depth even for null-depth measurement sequences that are offset sig- 
nificantly above the actual astrophysical null (e.g. the right-hand simulated data 
sequence of Fig. 1). In this “off-null” case, the null-depth fluctuations are amplified 
by the increasing fringe slope off the null (note that the second term in Eq. (18) 
multiplies the phase fluctuation variance by the square of the fringe slope at the 
mean phase offset), leading to a broadening of the probability density distribu- 
tion for larger mean-phase offsets (Fig. 1). Of course, once the mean null-depth 
offset is determined by a first observation sequence, it can be removed by apply- 
ing the appropriate phase shift to bring the interferometer to null prior to further 
observations. 

Returning now to the general case of Eq. (24), the shape of the probability 
density distribution can be much more asymmetric (Fig. 1) than the limiting case 
of Eq. (27), but the basic conclusions regarding the extraction of parameters still 
apply. Indeed, the NSC fitting technique is even more robust than suggested by this 
brief discussion, as use of the full NSC technique?*?” allows the retrieval of all of 
the dominant error terms, including, e.g. both amplitude and phase errors. As a 
result, the NSC algorithm has become the standard data reduction technique for 
both the PFN and the LBTI nullers. 

Finally, note that the NSC procedure for finding the true astrophysical null 
is akin to the coronagraphic “dark speckle” technique®® for exoplanet detection, 
in which speckle fluctuation minima are sought. Indeed, as the flux at any image 
point is determined by the interferometric combination of the beamlets from each 
of the AO system’s deformable mirror elements, the dark speckle technique is based 
on multi-beam interferometry rather than two-beam interferometry. The null self- 
calibration technique and the dark speckle technique can thus be viewed as members 
of a family of related flux measurement techniques. 


4. Multi-baseline Nulling 


Of course, a single-baseline nuller has limitations, including a rather slow rise in 
transmission from the central minimum. Moreover, in the thermal infrared regime, 
two bright signals need to be removed, namely the stellar flux and the thermal background 
(including zodiacal emission in the case of space missions), but statically nulling 
a star does not remove the incoherent background. To separate off-axis companions 
and circumstellar dust emission from the background, some type of signal mod- 
ulation is required, such as synchronized spatial chopping (as at the LBTI‘”%"), 
or phase modulation between nullers??4° (as the KIN used!?:!%). However, the 
suppression of starlight requires keeping a given nuller’s phase fixed at null, which 
excludes the use of rapid phase modulation within a single-baseline nuller. These 
issues can all be addressed with interferometer configurations that make use of a 
larger number of input beams and baselines. Adding baselines can bring several 
improvements, including changing the shapes of the central null and the surround- 
ing fringe pattern to provide a more rapid angular transition between the regions 
of deep starlight suppression and high off-axis transmission, decoupling the nulling 
parameters from the angular resolution to allow higher resolution observations of 
the residual light, and enabling rapid phase-modulation capabilities to separate the 
different signal types. 

Because different baseline lengths correspond to different fringe frequencies, 
the incoming fields from multiple nulling baselines can be combined to generate a 
fringe pattern that has a broader central null and a more rapid transition to higher 
off-axis transmission.'4 In particular, higher fringe frequencies can be added with 
amplitudes that lead to cancellation of the lowest-order terms in the expansion of 
the fringe transmission vs. off-axis angle, yielding higher order nulls with transmissions 
at small angles proportional to $\theta^4$ or $\theta^6$ instead of the basic $\theta^2$ null provided 
by a single baseline nuller.!44! More-capable linear nulling arrays based on the 
use of more than two telescopes are thus possible, but for the space-based case, 
more telescopes imply additional complexity and a higher cost. Moreover, higher- 
order nulling still requires fixed phases between the telescopes involved, thus again 
excluding phase modulation. Finally, some configurations also assume different field 
strengths for some of the combining beams, which implies either differently-sized 
collecting apertures, or a more complex, and potentially also inefficient, beam com- 
biner. On the other hand, circular nulling arrays,4? which can be implemented with 
relative phases between telescopes of multiples of $2\pi/n$, where $n$ is the number of 
collecting telescopes, can make use of equal amplitudes, and can also provide equal 
pathlengths to a central beam combiner. The simplest of these is the three-element 
array.43:44 However, for a given number of telescopes, it has been shown that linear 
configurations can provide the highest-order nulls.*° Many of these multi-aperture 
nulling-interferometer configurations were proposed as configurations for potential 
space-based nulling missions such as the Darwin interferometer*® and the Terrestrial 
Planet Finder Interferometer.*7 
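
To illustrate how adding a baseline can cancel the lowest-order term of the transmission, the sketch below compares a two-element nuller with a three-element linear array and estimates the power-law index of the on-axis null from a log-log fit. The field amplitudes (1, −2, 1), the baseline lengths, and the wavelength are illustrative assumptions and are not taken from any of the cited configurations.

```python
import numpy as np


def transmission(theta, positions, amplitudes, wavelength):
    """|sum_k a_k exp(2*pi*i*x_k*theta/lambda)|**2 for a linear array."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    x = np.asarray(positions, dtype=float)
    a = np.asarray(amplitudes, dtype=complex)
    phase = 2.0 * np.pi * np.outer(theta, x) / wavelength
    field = np.exp(1j * phase) @ a
    return np.abs(field) ** 2


def null_order(positions, amplitudes, wavelength=10e-6):
    """Fit T ~ theta**k very close to the axis and return k."""
    theta = np.logspace(-10, -9, 25)          # radians, deep inside the null
    t = transmission(theta, positions, amplitudes, wavelength)
    slope, _ = np.polyfit(np.log(theta), np.log(t), 1)
    return slope


if __name__ == "__main__":
    # Single baseline with a pi phase shift (amplitudes +1 and -1).
    print("two elements  :", null_order([-5.0, 5.0], [1.0, -1.0]))
    # Three elements; the central field has twice the amplitude.
    print("three elements:", null_order([-10.0, 0.0, 10.0], [1.0, -2.0, 1.0]))
```

The three-element case also shows why unequal field strengths can be needed: the doubled central amplitude is what removes the quadratic term.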

Of course, not all of the baselines need to be involved in nulling the star, and dif- 
ferent baselines can in fact play different roles, such as nulling on one set of baselines, 
and imaging the residual light with another set. In particular, shorter baselines can 
be used to provide deep stellar nulls, while longer baselines can be used to provide 
high resolution. Combining long and short baselines in a multi-element nulling array 
can thus provide both desired attributes, and also a phase-modulation capability, 
if an adjustable phase shift between different nulling baselines is provided.?% 4° 4% 
In the case of phase modulation between a pair of nulling baselines, the dominant 
noise terms shift to those due to correlated errors.!*:!8>48:49 A dual-nuller approach 
was used by the four-subaperture KIN,® +713 although with the roles of the long 
and short baselines reversed. 

Although higher order nulling has yet to be implemented on the sky, it should 
be possible in the near future, as a wide variety of nulling configurations can be 
implemented in straightforward fashion within the pupil of a large ground-based 
telescope. Indeed, the upcoming 30 m-class telescopes are large enough that mul- 
tiple subapertures can be laid out within their pupils to create essentially any 
nulling-array configuration desired.'® In particular, different subapertures within a 
given telescope pupil can be arranged to provide different baseline lengths, different 
baseline orientations, differing field amplitudes from differently-sized subapertures, 
simulated baseline rotation, and phase-shifting between different baselines. As only 
one telescope is involved, the complexity then resides entirely in the beam combiner. 
However, multi-axial Fizeau combiners can simultaneously combine more than two 
beams,”? thus potentially keeping beam combiners manageable as well. 


5. Future Possibilities 


To date, nulling interferometers have been implemented using both telescope sub- 
apertures and separate telescopes, and at both near-infrared and mid-infrared wave- 
lengths. Based on the experience gained with these systems, it is worth asking 
what the future may hold for nulling. First, as has already been discussed, nulling 
behind an extreme adaptive optics system is very advantageous, as the AO system 
operates as the fringe tracker, considerably simplifying the nuller’s optical system. 
Fizeau combination can also allow a simple fiber-based beam-combiner for multi- 
aperture systems. However, a further simplification is possible behind large tele- 
scopes: rather than building a specialized nulling beam-combiner, one can instead 
potentially implement a nulling mode within an already existing high-contrast coro- 
nagraph. As coronagraphs typically include internal focal and pupil planes where 
coronagraphic masks are inserted, the coronagraphic masks in these planes could 
simply be replaced by nulling masks. For example, a pair of dielectric phase plates 
with a relative $\pi$-phase shift between them, or a pair of “crossed” half wave plates 
that rotate electric fields into opposition could be inserted into a pupil plane.?” 
On the other hand, a phase grating that combines the +1 and —1 orders from the 
opposite sides of a telescope aperture could be inserted into a focal plane.°° In such a 
scenario, a separate nulling-interferometer optical bench is not required; instead, by 
inserting appropriate nulling masks into the coronagraphic focal and pupil planes, 
nulling can become one of a number of observational modes provided by a combined 
high-contrast coronagraph/nuller. 

This idea is particularly promising with regard to planned 30 m-class telescopes, 
since it can extend the high contrast observational regime of such large telescopes 
further inward, to an IWA approaching $\lambda/4D$, well inside the typical coronagraphic 
IWA of ~1–3 $\lambda/D$. Nullers can thus be used to investigate both exoplanets and 
dust very close to stars, including the nature of the near-infrared interferometric 
visibility deficit seen around several nearby stars that has been attributed to hot 
inner dust,°! and to search for long-term radial velocity trend candidates®” inside 
the coronagraphic regime. Moreover, as an IWA of ~$\lambda/4D$ is on the order of a 
few milli-arcseconds in the near-infrared for 30–40 m telescopes, direct observation 
of hot Jupiters also becomes possible. However, such small IWAs on long baselines 
would imply significant stellar leakage, thus potentially calling for multi-subaperture 
nulling. As mentioned earlier, multi-subaperture nulling configurations should be 
straightforward to implement on very large telescopes. Of course, large apertures 
in a ring-like configuration are a natural match to the large Giant Magellan Tele- 
scope subapertures.°? Longer wavelengths also favor larger subapertures from the 
viewpoint of signal-to-noise ratio. On the other hand, filled-aperture telescopes pro- 
vide more configuration flexibility, even allowing for the use of differently-sized 
subapertures. By taking advantage of planned instrumentation, such as large tele- 
scopes, extreme adaptive optics systems, and high-contrast coronagraphic benches, 
very cost-effective nulling interferometry on large telescopes may thus soon become 
feasible. 
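
The inner working angles quoted above can be checked with a few lines of arithmetic. The sketch below converts $\lambda/4D$ and $\lambda/D$ to milliarcseconds; the wavelengths (roughly the J, H and K bands) and the aperture sizes are illustrative choices, not values from the text.

```python
import math

RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0   # radians -> milliarcseconds


def angle_mas(wavelength_m, diameter_m, factor=1.0):
    """Return factor * lambda / D in milliarcseconds."""
    return factor * wavelength_m / diameter_m * RAD_TO_MAS


if __name__ == "__main__":
    for diameter in (30.0, 40.0):                   # telescope diameter in m
        for wavelength in (1.25e-6, 1.65e-6, 2.2e-6):
            nuller = angle_mas(wavelength, diameter, 0.25)
            corona = angle_mas(wavelength, diameter, 1.0)
            print(f"D={diameter:.0f} m, lambda={wavelength * 1e6:.2f} um: "
                  f"lambda/4D={nuller:.2f} mas, lambda/D={corona:.2f} mas")
```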


6. Summary 


While nulling interferometry is still a relatively new field, new optical techniques and 
data-analysis algorithms have enabled optical simplifications, stability relaxation, 
and substantial improvement in measurement accuracies. 10: 1?) 16.17 22.37 This brief 
chapter could not address all of the issues related to nulling, and so has focused on 
the basics as much as possible. Many other topics and references have thus been 
omitted, such as the potential use of integrated optics,°* as well as the possibility 
of combining nulling with closure phase.®° 

Although implementing a nulling interferometer has heretofore tended to be 
rather involved, integrating nulling optics into existing and/or planned high contrast 
coronagraphs on large telescopes should allow considerable simplification, poten- 
tially converting nulling into an additional coronagraphic observing mode, thereby 
moving it more into the mainstream. Nullers also have significant observational 
potential, as a nuller on a 30 m-class telescope should be able to make observa- 
tions not only of hot and/or warm dust very close to nearby stars, but also of 
close companions, such as massive long-term radial-velocity trend candidates inside 
the coronagraphic regime, and the innermost known hot Jupiters. Nulling on large 
telescopes can also provide stellar observations such as main-sequence stellar diam- 
eters and limb-enhanced spectroscopy. Thus, while space-based nullers capable of 
providing mid-infrared exoplanet spectra remain in the future, ground-based nullers 
continue to progress. 


Occultations of background stars by the Moon provide a method to obtain angular 
resolution at the milliarcsecond level even with telescopes of modest size at wave- 
lengths ranging from the visual to the infrared. Although affected by significant 
limitations in the choice of sources and in flexibility of observation, the simplicity 
of observation and data analysis has made lunar occultations very successful, 
especially in measuring stellar angular diameters, small separation binaries, and 
circumstellar emission. We provide an overview of the method, and review the 
optical properties of the phenomenon, the data analysis, and the required instru- 
mentation. We also discuss another class of occultation phenomena, namely those 
involving distant solar objects. These are primarily used to study not the occulted 
star, but rather the size and geometry of the intervening occulter, such as an aster- 
oid or a trans-Neptunian object. We finally take a brief look at the fascinating 
future possibility of using artificial occulting screens in space. 


1. Rationale of Lunar Occultations 


Occultations loosely encompass a number of celestial phenomena which involve the 
light of a star, planet, solar system body or other background object being blocked 
by a foreground object, typically the Moon or another solar system body. In fact, 
the correct term is syzygy, from the Greek σύζυγος, which is used to describe 
the alignment of three bodies — the third one in this case being the observer on 
Earth. The best known among these phenomena are lunar occultations, in which 
the Moon blocks the light of a background star. When the occulted object is a 
very bright star, or even a planet, then lunar occultations (LO) can be observed 
by the naked eye, and we have reports of such phenomena dating already from the 
times of ancient civilizations such as the Assyrians and the Babylonians. Aristotle 
wrote about an occultation of Mars by the Moon in 357 BC, and concluded that 
Mars was the more distant of the two. Until relatively modern times, however, these 
events were essentially instantaneous phenomena of little consequence other than 
being annotated in the local chronicles. Later, with the development of clocks 
and of celestial mechanics, they became a useful tool for comparing the accuracy 
of the predictions against observations and thus for refining lunar and planetary 
orbital elements. 

The modern LO history begins at the turn of the last century, when McMahon¹ 
first suggested a method to resolve in time the light of the star being occulted. He 
reasoned that if a star had an angular diameter $\phi$ and the Moon had an apparent rate 
of motion $V$, then the occultation would occur over a time $\Delta t = \phi/V$ and measuring 
$\Delta t$ would then lead to a direct estimation of $\phi$. It is noteworthy that at the time no 
stellar diameters had ever been measured directly. McMahon correctly estimated 
that the time resolution required was of order 1 ms, and suggested use of a rapidly 
revolving photographic film, or a fast kinematograph, which was being developed in 
those years. Immediately, McMahon’s call for an experimental attempt was rebuked 
by Eddington.² Correctly but perhaps unfortunately, the latter pointed out how the 
light curves of occulted stars would be dominated by diffraction effects, which if 
interpreted as McMahon had suggested would lead to fictitious angular sizes. Both 
McMahon and Eddington failed to see that there was a way to correctly derive 
the angular diameter even in the diffraction regime. Had someone attempted the 
suggested measurement, some progress would undoubtedly have been made. In 
practice, the authoritative comments by Eddington, and not least the occurrence 
of a World War, put a serious damper on further efforts. The credit for the first direct 
stellar angular diameter was then seized by Michelson with his stellar interferometer. 

It was in 1939 that LO received a new boost: again in the same journal volume, 
Williams³ and Whitford⁴ published two independent articles, the first of which 
showed how the amplitude of the diffraction fringes was the key to determine the 
stellar angular size, and the second reported the very first two occultations ever 
recorded, using a cathodic tube and a fast rotating photographic film. Fast-forward 
another World War, and the first modern measurements started with the occultation 
of α Sco by Evans⁵ in 1950. About a decade later, the first radio occultations were 
recorded, including an occultation of 3C 273⁶ that pinpointed with great accuracy 
the position of the radio source and led to the discovery of quasars. The improvement 
and the availability of fast photometers led to a boom in LO measurements at visual 
wavelengths in the 1970s. Soon after, the introduction of fast InSb detectors brought 
on a wealth of measurements in the near-infrared as well. By 1987, 348 angular 
diameter measurements obtained by LO were listed⁷ for 124 stars, a volume of data 
which established LO as the most prolific method in this area at the time. 

At the same time, LO were being used extensively for routine observations of 
field stars, leading to serendipitous discovery of close binary components. The most 
intense LO program was that started by D.S. Evans and collaborators, mainly at the 
McDonald observatory: over 6000 LO events were recorded, with hundreds of new 
binary stars discovered. Another direction of research which greatly benefitted from 
LO was that of multiplicity among young stellar objects, which led to the somewhat 
surprising conclusion that at least in the Taurus and Ophiuchus star-forming regions 
the binary fraction was much higher than expected.⁸ It is finally worth mentioning 
that in addition to simple angular diameters and binary stars, the LO technique has 
been increasingly applied to the study of objects with complex morphologies (such 
as IRC+10216 and the Galactic Center), thanks to the possibility to reconstruct 
arbitrary brightness profiles. 

In the last two decades or so, other techniques such as adaptive optics and long- 
baseline interferometry have become progressively more mature and productive in 
areas where LO used to be the preferred, or sometimes the only, possible method. 
Besides, LO are limited in the choice of sources and have fixed time constraints. 
Consequently, their appeal has partly faded, but they continue to play a significant 
role thanks to various factors. Firstly, LO are relatively simple both in the required 
instrumentation and especially in the data analysis. Secondly, they are generally 
more sensitive than, e.g. interferometry, and are suitable to study sources with arbi- 
trarily complex geometries. Last but perhaps not least, a number of improvements 
in the analysis of LO data have made the results more reliable and standardized in 
terms of errors than was the case in the 1970s and 1980s. A recent example of a large LO 
program is the routine use of “filler” observations implemented at the ESO VLT 
using the burst mode of the ISAAC infrared imager in the 2006-2012 period, leading 
to over 1000 recorded light curves⁹,¹⁰ with a large number of results published or 
still being mined in the area of small separation binaries and extincted infrared 
sources. Other fast visual and infrared instruments are now available at other sites 
and are producing LO results. 


2. Mathematical Description 


Different regimes can be distinguished, depending on the distance from the observer 
to the occulter (Moon, or other solar system body) and on the angular extent of 
the occulted target. We will discuss in detail mainly the LO case. 

We discuss here two regimes: Case A, when the occulted source has an angular 
diameter sufficiently large that diffraction can be neglected; and Case B, when 
Fresnel diffraction is applicable. In both cases, the standard description assumes 
that the lunar limb is an infinite, opaque straight edge, thereby neglecting effects 
due to the curvature of the limb and to its jagged profile (lunar mountains). The 
magnitude of these effects will be discussed later. 

Case A is relatively rare, but rather simple. Denoting wavelength with $\lambda$ and 
the distance from the observer to the Moon with $D$, if a source has an angular size 
$\phi \gg \sqrt{\lambda/D}$, then geometrical optics is sufficient to describe the resulting pattern. 
This is the previously mentioned assumption by Ref. 1, which however requires 
angular sizes larger than about 10 milliarcseconds (mas). Very few stars fall into 
this category and are subject to occultations. In this case, the brightness profile can 
be recovered by simple differentiation of the light curve. We refer to Fig. 1, where we 
have drawn the brightness distribution of an arbitrarily complex, extended source. 
The x-axis is aligned along the direction of relative motion of source and lunar 
limb. The figure is in the reference frame of the lunar limb, which is located at 
$x = 0$, while the complex source moves from left to right. The rightmost point on 
the source has position $x_2(t)$ and is the first to cross the lunar limb, at time $t_1$, so 
that $x_2(t_1) = 0$. The leftmost point of the source, at position $x_1(t)$, 
is the last to cross the lunar limb, doing so at time $t_2$, so that $x_1(t_2) = 0$. 

Fig. 1. Brightness distribution of a generic extended source subject to a lunar occultation. $x_2(t)$ 
represents the position of the leading edge of the source as it crosses the fixed lunar limb, while 
$x_1(t)$ represents its trailing edge. The arrow shows the relative motion of the source towards the 
lunar limb. 
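
The Case A criterion can be made concrete by evaluating the angular Fresnel scale $\sqrt{\lambda/D}$ at the distance of the Moon. The sketch below assumes a mean Earth-Moon distance of $3.844 \times 10^8$ m (a round value that is not taken from the text) and two illustrative wavelengths; the function name is hypothetical.

```python
import math

MOON_DISTANCE_M = 3.844e8                            # assumed mean distance
RAD_TO_MAS = math.degrees(1.0) * 3600.0 * 1000.0     # radians -> mas


def fresnel_angle_mas(wavelength_m, distance_m=MOON_DISTANCE_M):
    """Angular Fresnel scale sqrt(lambda / D) in milliarcseconds."""
    return math.sqrt(wavelength_m / distance_m) * RAD_TO_MAS


if __name__ == "__main__":
    for name, wavelength in (("visual, 0.55 um", 0.55e-6),
                             ("K band, 2.2 um", 2.2e-6)):
        print(f"{name}: {fresnel_angle_mas(wavelength):.1f} mas")
```

Sources must be considerably larger than this scale for the geometrical-optics description to hold, which is why so few stars qualify.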

The measured brightness distribution has a total intensity $I_0$ (integrated over 
the spectral response of the detection system) and a brightness profile $G(x)$, 
defined as 


$$I_0 = \int_{-\infty}^{\infty} dx \int_{-\infty}^{\infty} dy\, I(x,y); \qquad G(x) = \int_{-\infty}^{\infty} I(x,y)\,dy, \qquad (1)$$


respectively. Then according to geometrical optics the light curve $F(t)$ becomes: 


$$F(t) = \begin{cases} I_0, & t < t_1, \\ \displaystyle\int_{x_1(t)}^{0} dx \int_{-\infty}^{\infty} dy\, I(x,y), & t_1 < t < t_2, \\ 0, & t > t_2. \end{cases} \qquad (2)$$

Defining the apparent angular speed as $v_a = -dx/dt$, we can thus recover the 
brightness profile: 


$$G(x) = \frac{1}{v_a}\,\frac{dF(t)}{dt}. \qquad (3)$$
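
The recovery of the profile by differentiation can be tried on synthetic data. The sketch below builds a geometrical-optics light curve for a source drifting toward a limb at $x = 0$ and then applies the relation of Eq. (3) numerically. The source coordinate `s`, the strip profile of a uniform disk, and all numerical values are illustrative assumptions; diffraction and noise are ignored.

```python
import numpy as np


def geometric_light_curve(t, profile, width, x1_start, speed, n=4001):
    """Unocculted flux F(t) for a limb at x = 0 (geometrical optics only).

    profile(s) is the 1-D brightness profile in source coordinates
    s in [0, width]; the trailing edge is at x1(t) = x1_start + speed * t,
    and the part of the source with s < -x1(t) is still visible.
    """
    s = np.linspace(0.0, width, n)
    g = profile(s)
    cumulative = np.concatenate(
        ([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(s))))
    visible = np.clip(-(x1_start + speed * np.asarray(t, dtype=float)),
                      0.0, width)
    return np.interp(visible, s, cumulative)


def recover_profile(t, flux, speed):
    """Apply G = (1 / v_a) dF/dt with v_a = -dx/dt = -speed."""
    v_a = -speed
    return np.gradient(flux, t) / v_a


def disk_strip(s, radius=10.0):
    """Strip brightness of a uniform disk of the given radius (mas)."""
    u = (s - radius) / radius
    return np.sqrt(np.clip(1.0 - u ** 2, 0.0, None))


if __name__ == "__main__":
    speed, x1_start = 0.4, -30.0            # mas per ms, mas (hypothetical)
    t = np.linspace(0.0, 100.0, 2001)       # ms
    flux = geometric_light_curve(t, disk_strip, 20.0, x1_start, speed)
    recovered = recover_profile(t, flux, speed)
    s_at_limb = -(x1_start + speed * t)     # source coordinate at the limb
    inside = (s_at_limb > 2.0) & (s_at_limb < 18.0)
    print("max error:", np.abs(recovered[inside]
                               - disk_strip(s_at_limb[inside])).max())
```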

Case B describes the majority of LO events. It is mathematically hard to solve 
in general, but it has an analytical solution in the ideal case of the occultation by 
an infinite straight edge of a monochromatic (wavelength $\lambda$) point-source observed 
with a telescope of diameter much smaller than the diffraction fringes: 


$$F(w) = \frac{1}{2}\left\{\left[\frac{1}{2} + C(w)\right]^2 + \left[\frac{1}{2} + S(w)\right]^2\right\} \qquad (4)$$


where the terms 


$$C(w) = \int_0^w \cos\left(\frac{\pi}{2}z^2\right)dz; \qquad S(w) = \int_0^w \sin\left(\frac{\pi}{2}z^2\right)dz \qquad (5)$$


are the so-called Fresnel integrals. The dimensionless term $w$ is the so-called Fresnel 
number, 


$$w(t) = \sqrt{\frac{2}{\lambda D}}\,(x - x_0), \qquad (6)$$


measured as a function of distance from the origin $x_0$. This solution can be extended 
to the realistic case of a source having a finite brightness distribution $G(\phi)$, where 
$\phi$ is the source diameter, a lunar limb moving with linear speed $V_L$, and other 
practical observing conditions by the following equations: 


$$I(t) = \int F(w(t))\,G(\phi)\,d\phi, \qquad (7)$$


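$$I(t) = \int_{-\phi/2}^{+\phi/2} d\phi \int_{-\Delta/2}^{+\Delta/2} d\alpha \int_{-\Delta\lambda/2}^{+\Delta\lambda/2} d\lambda \int_{-\Delta t}^{0} d\tau\; F(w)\,G(\phi)\,O(\alpha)\,\Lambda(\lambda)\,T(\tau) + \beta, \qquad (8)$$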


w(t) = Vox we ViUs De, (9) 


Here $G(\phi)$ is the source brightness profile, which we assume is nonzero only over 
the range $[-\phi/2, +\phi/2]$; $O(\alpha) = \int O(\alpha, y)\,dy$ is the projection along the perpendicular 
to the lunar limb of the telescope primary mirror mask function $O(\alpha, y)$, which we 
assume has a maximum extent $\Delta$; $\Lambda(\lambda)$ is the total spectral response, including 
the source spectral energy distribution, the optics transmission, and the detector 
response; we assume that this function is nonzero only over the range $[-\Delta\lambda/2, +\Delta\lambda/2]$; 
and $T(\tau)$ is the response of the acquisition system (including detector and associated 
electronics) to an instantaneous impulse signal. We assume that at time $t$, the only 
significant contributions to the recorded signal are generated between $(t - \Delta t)$ 
and $t$. 

Typical diffraction curves generated during LO events are shown in Fig. 2, 
illustrating the changes due to angular size, to wavelength, to bandpass, and the 
case of binary stars. 
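
The straight-edge pattern of Eq. (4) and its smoothing by a finite source, in the spirit of Eq. (7), can be sketched numerically as follows. The Fresnel integrals are evaluated by simple trapezoidal quadrature so that only NumPy is needed. The sign convention (positive $x$ on the unocculted side), the uniform-disk strip profile, the shift $x \rightarrow x + D\phi$ for a source element at angle $\phi$, and the assumed Moon distance are our own illustrative choices and are not taken from Eqs. (8) and (9).

```python
import numpy as np

MOON_DISTANCE_M = 3.844e8                  # assumed mean Earth-Moon distance
MAS_TO_RAD = np.pi / (180.0 * 3600.0 * 1000.0)


def fresnel_edge_pattern(w, w_max=12.0, n=48001):
    """Eq. (4): F(w) for a straight edge; n must be odd so that z = 0 is a node."""
    z = np.linspace(-w_max, w_max, n)
    dz = z[1] - z[0]

    def from_zero(f):
        c = np.concatenate(([0.0], np.cumsum(0.5 * (f[1:] + f[:-1]) * dz)))
        return c - c[n // 2]               # integral from 0 to z

    C = from_zero(np.cos(0.5 * np.pi * z ** 2))
    S = from_zero(np.sin(0.5 * np.pi * z ** 2))
    F = 0.5 * ((0.5 + C) ** 2 + (0.5 + S) ** 2)
    w = np.asarray(w, dtype=float)
    return np.interp(w.ravel(), z, F).reshape(w.shape)


def disk_light_curve(x_m, diameter_mas, wavelength_m, n_strips=201):
    """Average F over the strips of a uniform disk; x_m is in metres."""
    x_m = np.asarray(x_m, dtype=float)
    scale = np.sqrt(2.0 / (wavelength_m * MOON_DISTANCE_M))
    radius = 0.5 * diameter_mas * MAS_TO_RAD
    if radius == 0.0:
        return fresnel_edge_pattern(scale * x_m)
    phi = np.linspace(-radius, radius, n_strips)
    weight = np.sqrt(np.clip(1.0 - (phi / radius) ** 2, 0.0, None))
    weight /= weight.sum()
    w = scale * (x_m[:, None] + MOON_DISTANCE_M * phi[None, :])
    return (fresnel_edge_pattern(w) * weight[None, :]).sum(axis=1)


if __name__ == "__main__":
    x = np.linspace(-50.0, 100.0, 1501)
    point = disk_light_curve(x, 0.0, 2.2e-6)
    disk = disk_light_curve(x, 6.0, 2.2e-6)
    print("first-fringe peak, point source:", point.max())
    print("first-fringe peak, 6 mas disk  :", disk.max())
```

The reduced fringe contrast of the resolved disk is the quantity that carries the angular-size information.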

The geometry of an LO event is schematically represented in Fig. 3, in which 
the source is occulted by the Moon along the path from the disappearance at $S_D$ 
to the reappearance at $S_R$. The lunar rate of motion (which varies continuously 
but can be assumed to be constant during one event) is $V_M$. The actual rate with 
which the lunar limb moves across the source is $V_L = V_M \cos \mathrm{CA}$, where CA is the 
contact angle. For small CA values, $V_L$ is typically about 0.8 m ms⁻¹, or 0.4″ s⁻¹; 
however as CA increases towards ±90° (so-called grazing events), the apparent 
rate of motion of the lunar limb becomes progressively slower. Another important 
quantity is the position angle PA, which is the apparent direction of motion of the 
limb over the source and hence the direction along which the brightness profile is 
scanned by the limb. Either disappearances or reappearances can be observed; for 
the latter, however, the $V_L$ values are negative. At visual and near-IR wavelengths, 
occultations can be observed on the dark limb only, and this usually means only 
one of the $S_D$ or $S_R$ points; cases in which both fall on the dark limb occur 
but are very rare. The bright limb can also be observed in case of mid-IR or radio 
wavelengths. 

Fig. 2. Simulated LO light curves for an ideal point-like source and a small telescope in a typical 
disappearance geometry. The origin of the x-axis is at the time of the geometrical occultation. 
Left panel, from top to bottom: monochromatic filter at 2.2 µm; broad K-band filter; broad 
g-band filter. Right panel, from top to bottom (all in R-band filter): uniform disk of 2 mas; 
uniform disk of 6 mas; two point-like sources with a 1:1 brightness ratio and separations of 10 and 
30 mas, respectively. 

Fig. 3. Scheme of an LO event (adapted from Ref. 11). The source moves across the Moon from 
the disappearance position, $S_D$, to the reappearance position, $S_R$. 

As a first approximation, the lunar limb can be considered indeed as a straight 
infinite edge, since local deviations due, e.g. to mountains or canyons are generally 
not sufficiently coherently extended to produce effects in the Fresnel diffraction.¹¹,¹² 
However, local slopes are possible, such as for example on the side of a local mountain. 
The angle $\psi$ in Fig. 3 represents this local limb slope, and from large volumes 
of occultation data it has been seen that slopes within ±10° are very common 
and larger values are also possible. It is possible to recover $\psi$ from the difference 
between the observed and predicted values of the limb rate. 
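
As a rough sketch of how a local slope could be inferred from the limb rate, the code below assumes that the slope simply adds to the contact angle, so that the observed rate is $V_M \cos(\mathrm{CA} + \psi)$. This relation and its sign convention are assumptions made for illustration, the function names are hypothetical, and the inversion is only unique while $0° \le \mathrm{CA} + \psi \le 180°$.

```python
import math


def limb_rate(v_moon, contact_angle_deg, slope_deg=0.0):
    """Limb rate V_M * cos(CA + psi), with psi added to CA by assumption."""
    return v_moon * math.cos(math.radians(contact_angle_deg + slope_deg))


def limb_slope_deg(v_observed, v_moon, contact_angle_deg):
    """Invert the assumed relation for psi, in degrees."""
    return math.degrees(math.acos(v_observed / v_moon)) - contact_angle_deg


if __name__ == "__main__":
    v_moon = 0.8                                     # m/ms, hypothetical
    predicted = limb_rate(v_moon, 30.0)              # no slope
    observed = limb_rate(v_moon, 30.0, slope_deg=5.0)
    print("predicted rate:", predicted, "observed rate:", observed)
    print("recovered slope (deg):", limb_slope_deg(observed, v_moon, 30.0))
```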


3. Instrumentation 


It can be appreciated from Fig. 2 that the times between the first and second fringe 
are of order 10 ms and they rapidly become shorter for the higher order fringes; it 
follows that we need instruments capable of ms time resolution. 
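
The millisecond requirement follows from the linear Fresnel scale divided by the limb speed. The sketch below evaluates $\sqrt{\lambda D/2}/V_L$, the time for the limb to sweep one unit of $w$, and scales it by the approximate separation in $w$ between the first two fringe maxima of the straight-edge pattern (about 1.13, quoted here as a standard approximate value rather than taken from the text). The Moon distance is an assumed round value.

```python
import math

MOON_DISTANCE_M = 3.844e8        # assumed mean Earth-Moon distance
FIRST_FRINGE_GAP_W = 1.13        # approximate w-separation of first two maxima


def fresnel_time_ms(wavelength_m, limb_speed_m_per_ms=0.8,
                    distance_m=MOON_DISTANCE_M):
    """Time (ms) for the limb to sweep one unit of the Fresnel number w."""
    return math.sqrt(wavelength_m * distance_m / 2.0) / limb_speed_m_per_ms


def fringe_gap_ms(wavelength_m, limb_speed_m_per_ms=0.8):
    """Approximate time between the first and second fringe maxima."""
    return FIRST_FRINGE_GAP_W * fresnel_time_ms(wavelength_m,
                                                limb_speed_m_per_ms)


if __name__ == "__main__":
    for name, wavelength in (("0.55 um", 0.55e-6), ("2.2 um", 2.2e-6)):
        print(f"{name}: {fringe_gap_ms(wavelength):.1f} ms between fringes")
```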

The first modern instruments adopted to record LO light curves were photo- 
diodes and photomultipliers in the visual, followed by InSb single diode detectors 
for the near-IR. While such devices can easily be driven to kHz speeds, they collect 
light from the full field of view. This is a disadvantage in LO work because the 
background — caused by solar light reflected on the bright side of the Moon and 
diffused in the Earth’s atmosphere — is generally quite intense. It is also strongly 
chromatic (approximately $\propto \lambda^{-4}$), and clearly dependent on the lunar phase, on 
the distance to the terminator, and on other observational factors. Based on a large 
sample of observations, Ref. 13 showed that in a 15″ field such as that typically 
used in LO work the lunar background is on average equivalent to $K \approx 5$ mag, and 
can reach up to 0 mag. In the visual, it is significantly brighter. This leads to a 
considerable, often dominant, background shot noise. 

A crucial improvement in this sense was made with the introduction of 
panoramic detectors, such as CCDs and infrared arrays.!° Later this was extended 
to other devices, in particular with the so-called burst mode developed at ESO 
for the Aladdin detector of the ISAAC instrument already mentioned. On CCDs, 
the initial approach was to use the so-called drift-scanning mode.¹⁴ More recently, 
CCD and EMCCD devices have become available in which a proper subwindow 
can be read in a fashion similar to IR arrays.¹⁵ The advantage of fast imaging over 
photometry is that it is possible to mask out the pixels where no stellar signal is 
present, and thus considerably reduce the background noise.¹⁶ 

Historically, the absolute timing of LO events has played a role in refining our 
knowledge of the Moon’s orbit and its limb topography. At present, lunar laser 
ranging and space probes have provided us with a wealth of information, largely 
replacing this aspect of LO observations. To analyze LO curves such as those shown 
in Fig. 2, relative times are sufficient. 


4. Data Analysis 


Reference 17 first introduced a least-squares method (LSM) to solve Eq. (8). In 
its basic form, five parameters are needed to fit the light curve of a single source: 
the time of occultation, the unocculted stellar intensity, the apparent rate of lunar 
motion, the background level, and the angular diameter. Convenient series expan- 
sions were available for the Fresnel integrals, and the equations could easily be 
extended to light curves with more than one star. The LSM also lent itself to 
easy computation of error estimates and covariance between the parameters. It 
could be easily coded and rapidly solved even on the relatively modest computers 
of the time and thus it became the standard in LO data analysis. A few improve- 
ments were made over the years, e.g. with provisions for a time-variable background 
and scintillation, the latter based on interpolation by Legendre polynomials.18,19
Also, extensions to sources other than uniform disks, e.g. Gaussian profiles, became 
possible. 
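As an illustration of the kind of model evaluated inside such a least-squares fit, the sketch below computes the monochromatic straight-edge diffraction curve of an unresolved star and a sum of squared residuals. It is a minimal sketch under stated assumptions, not the method of Ref. 17: the Fresnel integrals are obtained by simple midpoint quadrature rather than series expansions, the angular diameter is omitted (point source, so four parameters instead of five), and the function names, the 2.2 µm wavelength and the mean lunar distance are illustrative choices.

```python
import math

def fresnel_cs(w, steps=4000):
    """Fresnel integrals C(w), S(w) by midpoint quadrature (adequate for |w| up to ~10)."""
    dt = w / steps
    c = s = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        phase = 0.5 * math.pi * t * t
        c += math.cos(phase)
        s += math.sin(phase)
    return c * dt, s * dt

def point_source_model(t, t0, rate, i_star, background,
                       wavelength=2.2e-6, distance=3.844e8):
    """Light curve of an unresolved star disappearing behind a straight edge.

    t0: time of geometric occultation (s); rate: limb speed (m/s);
    i_star: unocculted intensity; background: constant background level.
    wavelength and distance (m) are assumed illustrative values.
    """
    x = rate * (t0 - t)                                # distance from the shadow edge, > 0 when lit
    w = x * math.sqrt(2.0 / (wavelength * distance))   # dimensionless Fresnel coordinate
    c, s = fresnel_cs(w)
    return background + 0.5 * i_star * ((0.5 + c) ** 2 + (0.5 + s) ** 2)

def chi2(params, times, counts):
    """Sum of squared residuals that a least-squares fit would minimize."""
    t0, rate, i_star, background = params
    return sum((y - point_source_model(t, t0, rate, i_star, background)) ** 2
               for t, y in zip(times, counts))
```

At the geometric edge the model returns a quarter of the unocculted intensity above the background, which is a convenient sanity check before handing `chi2` to any minimizer.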

The LSM, however, is by definition only applicable when a model for the
source brightness profile can be formulated, with the corresponding parameters and
their derivatives. One significant step forward was made with the introduction of
model-independent analysis, with algorithms based on iterative deconvolution.20,21
The result is a model-independent profile which is not unique but which satisfies the
maximum-likelihood estimator. This approach has proven very useful both as input
to the LSM, e.g. to reveal binary sources in which the diffraction patterns are so
close that the binary nature is not evident by eye (Fig. 4), and to recover brightness
profiles for complex geometries, e.g. in the case of circumstellar dust shells.

Fig. 4. Examples of one angular diameter and one binary star measured by LO. Left panel:
occultation data (dots) for 7 Leo, best fit by a model with an angular diameter of 4.35 mas (solid
line). The lower panel shows the fit residuals, enlarged by a factor of two for clarity, as a solid line.
For comparison, the residuals of the best fit for a point-like source are also shown (dashed line).
Right panel: The occultation data (dots) for 31 Ari, repeated twice and offset for clarity. The top
set shows the best fit for a point-source model, and the bottom set the best fit for a binary model
with 3.76 mas separation. The two notations mark the times of the geometrical occultation of each
component. The vertical axis is in arbitrary units. Adapted from Ref. 22.


5. Occultations by Other Solar System Bodies 


In recent years, many variations on the occultation theme involving solar system bodies
more distant than the Moon have been very successful. The aim is not to study
the occulted star, but rather to characterize the occulting body in a level of detail that
would otherwise not be possible with the angular resolution of a single telescope.

One example is occultations by asteroids. Here, the occulting body has typically 
a size of a few kilometers, so that the ground track is correspondingly very narrow 
and has a significant error margin due to the unknown size and shape of the asteroid.
Such observations generally do not involve professional observatories, given the
very small chance of a track passing over them, but rather the efforts of amateur
astronomers with small telescopes and the ability to relocate them, often in rather
remote areas. They have been traditionally coordinated by IOTA, the International 
Occultation Timing Association. 

Another example is occultations by trans-Neptunian objects (TNOs), including 
those of Centaurs. These events, in addition to the limitations of asteroidal occulta- 
tions, are further complicated by the tiny angular sizes of TNOs and by the uncer- 
tainties in many of their orbits. As an example, Chariklo, the largest known Centaur 
body, subtends only 25 mas. Significant improvements are happening rapidly, how- 
ever, thanks to Gaia coordinates and to an expanding network of both movable and 
professional fixed telescopes capable of recording TNO occultations. Thanks to these 
efforts we are quickly gaining knowledge of many TNO physical characteristics, such 
as shape, density, albedo, limits on possible atmospheres, and even the presence of 
rings??? (see Fig. 5). 

As a last example of occultations by distant solar system bodies, we mention the 
observations of stars being occulted by Saturn’s rings and observed from the Cassini 
space probe.”® In this case, the occulter is not just one edge as in lunar and asteroidal 
occultations, but rather a whole system of edges due to the gaps between the rings. 
In turn, the edges are curved and are themselves composed of rocky small bodies. 
This requires ad hoc methods of data analysis and of tomographic-like imaging.?° 

The instrumentation required to study occultations by outer solar system bodies 
is very similar to that already described for LO, since we are dealing with high- 
temporal resolution photometric light curves. In fact, more distant bodies will lead 
to shadow patterns which move more slowly on the Earth’s surface than for LO, 
therefore the data rates can be relaxed by 1-2 orders of magnitude. 
Fig. 5. A few of the occultation events observed for the Centaur Chariklo, showing the geometry
of the central body and of the rings.?® 27 Plots courtesy of Dr. D. Bérard. 


6. The Future 


The above-mentioned techniques are likely to progress and possible technological 
improvements are already appearing, e.g. the use of spectroscopically dispersed 
light curves or of pupil-plane imaging. One completely new approach, however, 
could be represented by the use of artificial orbiting screens. If this became reality, 
the advantages would be enormous: essentially no background emission, very well- 
known geometry and optical properties of the occulting screen, and the possibility 
to maneuver the screen and therefore to choose sources and repeat the occultations 
at will. If slow-moving orbits can be achieved, the integration times could be propor- 
tionally increased, making it possible to measure not just stars but also extragalactic 
sources. One candidate could be the so-called solar sails being developed for space 
propulsion. 

Unfortunately, quite apart from the cost of deploying and controlling such
screens, the main problem is the solid angle and therefore the
probability of occulting distant background targets. In order to be in a slow-orbit
regime, the screen would need to be at distances similar to the Moon and the 
Earth—Moon Ly, Lagrangian points. But already the Moon, with a diameter of 
~3500 km, subtends only 0.5° at that distance. A manmade structure would be
considerably smaller, making the solid angle proportionally minuscule.
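The scale of the problem can be made concrete with a few lines of arithmetic. The sketch below reproduces the 0.5° figure for the Moon and compares it with a hypothetical 1 km screen at the same distance; the 384,400 km mean distance and the 1 km screen size are illustrative assumptions, not values from the text.

```python
import math

def angular_diameter_deg(size_km, distance_km):
    """Angular diameter in degrees of an object of given size at a given distance."""
    return math.degrees(2.0 * math.atan(0.5 * size_km / distance_km))

LUNAR_DISTANCE_KM = 384_400.0      # assumed mean Earth-Moon distance

moon_deg = angular_diameter_deg(3500.0, LUNAR_DISTANCE_KM)
screen_arcsec = angular_diameter_deg(1.0, LUNAR_DISTANCE_KM) * 3600.0

# The solid angle scales as the square of the angular size.
solid_angle_ratio = (screen_arcsec / 3600.0 / moon_deg) ** 2

print(f"Moon: {moon_deg:.2f} deg, 1 km screen: {screen_arcsec:.2f} arcsec")
print(f"screen/Moon solid angle ratio: {solid_angle_ratio:.1e}")
```

Under these assumptions the screen covers less than one ten-millionth of the solid angle of the Moon, which is why the chance of it passing in front of a given background target is so small.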
Astrometry is a powerful technique in astrophysics to measure three-dimensional
positions of stars and other astrophysical objects, including exoplanets and the 
gravitational influence they have on each other. Interferometric astrometry is 
presented here as just one in a suite of powerful astrometric techniques, which 
include space-based, seeing-limited and wide-angle adaptive optics techniques. 
Fundamental limits are discussed, demonstrating that even ground-based tech- 
niques have the capability for astrometry at the single micro-arcsecond level, 
should sufficiently sophisticated instrumentation be constructed for both the cur- 
rent generation of single telescopes and long-baseline optical interferometers. 


1. The Promise of Astrometry 


Accurate measurements of the time-dependent angular positions of stars and other 
astrophysical objects are among the most fundamental measurements in astro- 
physics. Both wide-angle and narrow-angle astrometric measurements have a long 
history of advancing a broad range of astrophysical topics.1 Tycho Brahe’s sex- 
tant produced the first accurate measurements of the planets, enabling Kepler’s 
laws and leading to Newtonian mechanics and gravitation. These measurements 
were wide-angle measurements, where an instrument has to slew to different stars, 
making measurements at independent times. The famous measurement of the
gravitational deflection of light2 was an example of a narrow-angle measurement, where
a target and many reference stars are observed simultaneously, canceling out some 
instrumental effects. More recently, seeing-limited narrow-angle astrometric
measurements have focused on parallaxes of nearby stars and detections of exoplanets
through their gravitational influence on their host stars. The semi-amplitude of the
astrometric signal of an exoplanet is given by

$$\alpha = \left(\frac{M_p}{M_s}\right)\left(\frac{a_p}{1\,\mathrm{AU}}\right)\varpi, \tag{1}$$

where $M_p$ is the mass of the planet, $M_s$ is the mass of the star, $a_p$ is the
semi-major axis of the planet and $\varpi$ is the star’s parallax. This means that measuring
the astrometric signal from an exoplanet with 1 AU separation is at least $M_s/M_p$
(about $10^3$ for $M_p$ = 1 Jupiter mass) times more difficult than measuring stellar
parallaxes (Fig. 1).
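A short numerical sketch of this relation follows, taking the semi-amplitude as (planet mass / star mass) × (semi-major axis in AU) × parallax. The function name, the Sun-to-Jupiter mass ratio of about 1047 and the example system at 10 pc are illustrative assumptions.

```python
SUN_TO_JUPITER_MASS_RATIO = 1047.35   # assumed value of M_sun / M_Jupiter

def astrometric_semi_amplitude_mas(m_planet_mjup, m_star_msun, a_planet_au, distance_pc):
    """Semi-amplitude of the stellar reflex motion in milli-arcseconds."""
    parallax_mas = 1000.0 / distance_pc            # parallax of a star at distance_pc
    mass_ratio = m_planet_mjup / (m_star_msun * SUN_TO_JUPITER_MASS_RATIO)
    return mass_ratio * a_planet_au * parallax_mas

# A Jupiter analog (5.2 AU) and a Jupiter-mass planet at 1 AU, both around a Sun-like star at 10 pc
jupiter_analog = astrometric_semi_amplitude_mas(1.0, 1.0, 5.2, 10.0)
one_au_planet = astrometric_semi_amplitude_mas(1.0, 1.0, 1.0, 10.0)
print(f"{jupiter_analog:.3f} mas, {one_au_planet:.4f} mas")
```

For the 1 AU case the signal is about a thousandth of the 100 mas parallax, which is the factor of roughly $10^3$ quoted above.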


Many recent ground-based astrometric measurements have been controversial,
including the famous example of the debunked planet around Barnard’s star,3 and
more recently the debunked planet around VB 10.4,5 Other measurements are less
controversial, for example infrared parallaxes of field brown dwarfs,6 which provide a
determination of the luminosity and fundamental parameters of the coolest objects
in the galaxy, not possible by other means.

Space-based wide-angle astrometric measurements are a cornerstone of modern
astrophysics. The HIPPARCOS spacecraft solidified distance scales within the
Galaxy, and, via standard candles, the Universe. The Gaia spacecraft is in the
process of very significantly extending this work.7 Depending on the mission length of
Gaia and the fraction of a detected exoplanet orbit regarded as a secure detection,
Gaia could discover up to 70,000 planets around other stars via astrometry.8
Fig. 1. Motion of the Sun’s barycenter due to perturbations of solar-system planets. The Sun’s
disk is the orange circle for reference. Scaling between these linear units and angular units for a
solar system analog can easily be done, with the solar disk diameter being equivalent to ~1/107th
of the stellar parallax. See electronic edition for a color version of this figure.

The highest precision astrometry, however, requires large apertures or long
baselines, and is arguably not best achieved from space. The Gravity instrument
at the Very Large Telescope Interferometer (VLTI)9 achieves very-narrow-angle
astrometric precisions comparable to Gaia, but on much fainter stars. In order to
understand the potential role of astrometric interferometry in the broad context of
astrometry, we must first understand the fundamental limitations of the technique.
With this motivation, this chapter begins by describing fundamental astrophysical
limits to astrometric measurements for given aperture areas and interferometric
baselines, and moves on to limitations provided by the Earth’s atmosphere in
increasing level of detail. Seeing-limited, adaptive-optics assisted and long-baseline
interferometric techniques are discussed together, with a telescope diameter $D$ or
an interferometric baseline $B$ used almost interchangeably.

It is assumed that the reader of this chapter has a basic understanding of
atmospheric turbulence theory (i.e. Fried’s coherence length and coherence time),ᵃ
narrow-field optical interferometry and adaptive optics. Electric fields are
approximated in this chapter as scalar wavefronts, and we consider vectors and angles
in a plane perpendicular to the vector between a telescope and the center of the
observed field. Finally, the practical difficulties in building real astrometric
instruments are briefly discussed in principle, leading to a discussion of past, present and
possible future astrometric interferometers.


2. Astrometric Precision without the Earth’s Atmosphere 


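For a star of AB magnitude $m_{AB}$ observed with a fractional bandwidth $\Delta\lambda/\lambda$, a
total aperture area $A$, integration time $\Delta T$ and overall instrument throughput $\eta$,
the number of photons detected is given by

$$N_p = 5.48 \times 10^{10}\,\frac{\Delta\lambda}{\lambda}\,\eta\,A\,\Delta T\,10^{-0.4 m_{AB}}, \tag{2}$$

with $A$ in m$^2$ and $\Delta T$ in seconds.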
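The numerical constant in this photon-count relation can be checked from the definition of the AB system. The sketch below assumes the standard AB zero point of 3631 Jy and Planck's constant, and takes the aperture area in m² and the integration time in seconds; the function name and the example values are illustrative.

```python
PLANCK_H = 6.62607015e-34          # J s
AB_ZERO_POINT = 3631e-26           # W m^-2 Hz^-1 (3631 Jy), assumed AB zero point

# Photons per second per m^2 per unit fractional bandwidth for m_AB = 0
PHOTON_RATE_CONSTANT = AB_ZERO_POINT / PLANCK_H

def detected_photons(m_ab, frac_bandwidth, area_m2, time_s, throughput):
    """Number of detected photons for an AB magnitude m_ab."""
    return (PHOTON_RATE_CONSTANT * frac_bandwidth * throughput
            * area_m2 * time_s * 10.0 ** (-0.4 * m_ab))

# Example: m_AB = 15 star, 20% bandwidth, 50 m^2 aperture, 600 s, 50% throughput
n_example = detected_photons(15.0, 0.2, 50.0, 600.0, 0.5)
print(f"constant = {PHOTON_RATE_CONSTANT:.3e}, photons = {n_example:.2e}")
```

Because the rate depends only on the fractional bandwidth, the same constant applies at any wavelength when AB magnitudes are used.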


An AB magnitude is roughly the same as a Vega magnitude at visible wavelengths,
and reaches $m_{AB} - m_{\rm Vega} = 1$ at ~1.28 µm.10 Armed with the knowledge of how
many photons are expected from astrophysical sources, we can use maximum likelihood
estimation under the assumption of a perfectly sampled point-spread function
to determine a target shot-noise limited astrometric error. For image-plane detection
with a point-spread function $f(\alpha, \beta)$ for angular coordinates $\alpha$ and $\beta$, the
photon-limited centroid error from maximum likelihood estimation can be easily shown
to be:

$$\sigma_\alpha = \frac{\alpha_{\rm psf}}{\sqrt{N_p}}, \tag{3}$$


ᵃ See Chapter 13 of Volume 2 for a brief discussion of atmospheric turbulence theory.

where

$$\alpha_{\rm psf}^2 = \frac{1}{\iint \left(\partial f/\partial\alpha\right)^2(\alpha, \beta)\,/\,f(\alpha, \beta)\;d\alpha\,d\beta}. \tag{4}$$


Here the point-spread function $f$ is assumed to be normalized (i.e. an integral of
unity), and $N_p$ is the total number of photons recorded in the image. This type of
relationship can be derived directly from quantum mechanics,11 or from differentiating
the logarithm of the maximum likelihood for a Taylor-expanded point-spread
function in the limit of small values of $\alpha$ and small pixels. In practice, uncertainties
will be close to these limits as long as the point-spread function is critically sampled
(more than two pixels per FWHM), the core of the point-spread function is
significantly brighter than any background, and detector readout noise is negligible
compared to any shot noise.
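The effective point-spread function size can be evaluated numerically for any profile. The sketch below assumes a separable point-spread function, so that the integral over the second coordinate factors out to unity and only a one-dimensional integral of $(df/d\alpha)^2/f$ remains; for a Gaussian of standard deviation $s$ this should return $s$. The function names and the midpoint integration are illustrative choices.

```python
import math

def alpha_psf_1d(pdf, dpdf, lo, hi, n=20000):
    """Effective PSF size 1/sqrt(integral of (f')^2 / f) for a normalized 1D profile."""
    h = (hi - lo) / n
    fisher = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        f = pdf(x)
        if f > 0.0:
            fisher += dpdf(x) ** 2 / f * h
    return 1.0 / math.sqrt(fisher)

def gaussian(x, s=1.0):
    return math.exp(-0.5 * (x / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def gaussian_derivative(x, s=1.0):
    return -x / s ** 2 * gaussian(x, s)

def centroid_error(alpha_psf, n_photons):
    """Photon-limited centroid uncertainty, in the same units as alpha_psf."""
    return alpha_psf / math.sqrt(n_photons)

size = alpha_psf_1d(gaussian, gaussian_derivative, -10.0, 10.0)
print(size, centroid_error(size, 1e6))
```

For a Gaussian the result corresponds to about 0.425 of the FWHM; real seeing profiles have broader wings, which is consistent with a somewhat larger coefficient for seeing-limited images.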

The effective point-spread function size $\alpha_{\rm psf}$ is ~$0.33\lambda/D$ for a fully-filled
diffraction-limited aperture of diameter $D$, ~$0.16\lambda/B = \lambda/2\pi B$ for a 2-aperture
image-plane interferometer of baseline $B$, or ~$0.53\theta_s$ for a seeing-limited image
with seeing full-width-half-maximum $\theta_s$. Note that interferometers do typically have
lower throughput than adaptive optics systems, which means that for the same number
of photons, an interferometer of baseline $B$ is nearly equivalent to a telescope of
diameter $D$. Equation (3) then means that we can approximately consider an
interferometer to be superior to a telescope from a single-star photon-limited perspective
whenever $D_{\rm int} B_{\rm int} \gtrsim D_{\rm tel}^2$. This means that the VLTI with the Auxiliary Telescopes
should have better photon-limited astrometric precision than an individual VLT
Unit Telescope, but the E-ELT should have better photon-limited precision than
even the VLTI with the Unit Telescopes.
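The comparison in this paragraph can be checked with a one-line criterion, reading the condition as "interferometer aperture diameter times baseline exceeds the square of the telescope diameter". The aperture sizes and the 130 m baseline below are illustrative assumptions (1.8 m Auxiliary Telescopes, 8.2 m Unit Telescopes, a 39 m E-ELT), not values taken from the text.

```python
def interferometer_is_superior(d_int_m, b_int_m, d_tel_m):
    """Photon-limited comparison: True if D_int * B_int exceeds D_tel squared."""
    return d_int_m * b_int_m > d_tel_m ** 2

AT_DIAMETER = 1.8       # m, assumed Auxiliary Telescope diameter
UT_DIAMETER = 8.2       # m, assumed Unit Telescope diameter
ELT_DIAMETER = 39.0     # m, assumed E-ELT diameter
VLTI_BASELINE = 130.0   # m, assumed representative baseline

vlti_at_vs_ut = interferometer_is_superior(AT_DIAMETER, VLTI_BASELINE, UT_DIAMETER)
vlti_ut_vs_elt = interferometer_is_superior(UT_DIAMETER, VLTI_BASELINE, ELT_DIAMETER)
print(vlti_at_vs_ut, vlti_ut_vs_elt)
```

With these numbers the first comparison favors the interferometer and the second favors the single large telescope, in line with the two statements above.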

These fundamental limits correspond to very small angles for typical astrophysical
targets, as shown in Fig. 2. Although incomplete (e.g. not taking into
account background noise), this demonstrates that less fundamental effects will
limit astrometric precision, such as instrumental imperfections (e.g. imaging system
distortions) and high-altitude atmospheric turbulence.

Fig. 2. Astrometric uncertainties limited by photon shot noise as a function of AB magnitude,
assuming an integration time-throughput product of 10 minutes. For large adaptive optics equipped
telescopes or long-baseline interferometers, shot noise in centroid measurement is not a dominant
error term. See electronic edition for a color version of this figure.


3. The Effect of the Earth’s Atmosphere on Astrometry 


In the literature, there are numerous discussions of atmospheric turbulence profiles
and appropriate multi-layer atmosphere models.ᵇ In order to simplify analysis of
a complex atmosphere with many layers, we will consider atmospheric turbulence
from a single layer at an effective height $\bar{h}$, moving as a frozen flow with an
effective velocity $\bar{v}$. Both of these quantities come from turbulence-weighted averages
throughout the atmosphere of each quantity to the 5/3 power:

$$\bar{h} = \left(\frac{\int C_n^2(h)\,h^{5/3}\,dh}{\int C_n^2(h)\,dh}\right)^{3/5}, \tag{5}$$

$$\bar{v} = \left(\frac{\int C_n^2(h)\,v(h)^{5/3}\,dh}{\int C_n^2(h)\,dh}\right)^{3/5}. \tag{6}$$


Turbulence is stronger towards the lower atmospheric layers, so a typical effective
turbulence height $\bar{h}$ at a mountaintop site might be 2.5 km, with an effective
wind velocity $\bar{v}$ of 15-20 m s$^{-1}$.12 By simplifying the atmospheric profile to a single
layer, we can also consider our single layer going at an average speed $\bar{v}$ to have a
velocity vector $\bar{\mathbf{v}}$. This assumption of course changes some details of this chapter,
and any results that require a lucky turbulent layer velocity direction should not be
viewed as realistic.
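The turbulence-weighted averages can be evaluated directly from a sampled $C_n^2$ profile. The sketch below assumes equally spaced layers (so the layer thickness cancels in the ratio) and uses a made-up two-layer profile purely for illustration; the function name and all profile values are hypothetical.

```python
def effective_value(cn2, quantity):
    """Turbulence-weighted 5/3-power mean of a quantity over equally spaced layers."""
    numerator = sum(c * q ** (5.0 / 3.0) for c, q in zip(cn2, quantity))
    denominator = sum(cn2)
    return (numerator / denominator) ** (3.0 / 5.0)

# Hypothetical profile: 70% of the turbulence at 500 m, 30% at 10 km
cn2_weights = [0.7, 0.3]
heights_m = [500.0, 10000.0]
winds_ms = [8.0, 30.0]

h_eff = effective_value(cn2_weights, heights_m)
v_eff = effective_value(cn2_weights, winds_ms)
linear_mean_height = sum(c * h for c, h in zip(cn2_weights, heights_m)) / sum(cn2_weights)
print(f"effective height {h_eff:.0f} m, effective wind {v_eff:.1f} m/s")
```

Because of the 5/3 power, the weak high-altitude layer pulls the effective height well above the simple weighted mean, which is why high-altitude turbulence matters so much for astrometry.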


3.1. Single-Star Astrometry 


In the case of single-star (sometimes called wide-angle) astrometry from the ground, 
the Earth’s atmosphere has two key effects — it increases the photon-limited uncer- 
tainty due to a larger point-spread function (Sec. 2), and it shifts the star image 
to and fro via the tip/tilt mode of a single aperture, or the piston mode of an 
interferometer. We can approximate the power spectral density of a tip/tilt mode 


of a single telescope at low frequencies as, e.g.1 1° 
$$P(f) = 0.096\,(r_0/\bar{v})^{1/3}(\lambda/r_0)^2 f^{-2/3}\left(1 - e^{-(f L_0/\bar{v})^2}\right)\ [\mathrm{rad}^2/\mathrm{Hz}], \tag{7}$$

or

$$P(f) = 0.096\,(r_0/\bar{v})^{1/3}(\lambda/r_0)^2\left(f^2 + (\bar{v}/L_0)^2\right)^{-1/3}\ [\mathrm{rad}^2/\mathrm{Hz}], \tag{8}$$

up to a cutoff frequency $f_c = 0.24\,\bar{v}/D$. Here $r_0$ is the Fried coherence length,ᶜ $\lambda$ is
the observing wavelength, $\bar{v}$ is the effective wind velocity and $L_0$ is the turbulence
outer scale in an exponential model (Eq. (7)) or the often used von Kármán model
(Eq. (8)).

ᵇ See Chapter 13 of Volume 2 for a brief discussion.

The effect of a large aperture is to change the cutoff frequency, reducing the
instantaneous rms tilt, but it does not affect the long-integration time astrometric
uncertainty. We evaluate this uncertainty by integrating this power spectral density
multiplied by a sinc$^2$ function, i.e.

$$\sigma^2 = 2\int_0^{f_c} P(f)\,\mathrm{sinc}^2(\pi f \Delta T)\,df, \tag{9}$$

$$\sigma \approx 0.55\,(r_0/\bar{v})^{1/6}(\lambda/r_0)\,\Delta T^{-1/6} \quad \text{for } L_0 \gg \bar{v}\Delta T \gg D. \tag{10}$$


Inserting typical values of $L_0 = 100$ m, $\bar{v} = 5$ m s$^{-1}$, and $r_0 = 0.1$ m at $\lambda = 0.5$ µm,
this gives a 0.18″ uncertainty after a 20 s integration. For longer integration times,
uncertainties decline rapidly, but are very dependent on the details of the turbulence 
outer scale. It is quite clear that star motions from bulk atmospheric flows do not 
cancel out when one star is observed at a time. This is one of the reasons that ground- 
based single-star astrometry has struggled to overcome atmospheric limitations and 


produce uncertainties below 0.02” .1° 
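The quoted 0.18″ figure can be reproduced numerically. The sketch below assumes the scaling $\sigma \approx 0.55\,(r_0/\bar{v})^{1/6}(\lambda/r_0)\,\Delta T^{-1/6}$; the exponents of 1/6 are the values that recover the quoted number from the stated inputs and should be treated as an assumption about the printed relation. The function name is illustrative.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def single_star_uncertainty_arcsec(r0_m, wind_ms, wavelength_m, integration_s):
    """Atmospheric single-star astrometric uncertainty (assumed 1/6-power scaling)."""
    sigma_rad = (0.55 * (r0_m / wind_ms) ** (1.0 / 6.0)
                 * (wavelength_m / r0_m) * integration_s ** (-1.0 / 6.0))
    return sigma_rad * ARCSEC_PER_RAD

sigma_20s = single_star_uncertainty_arcsec(0.1, 5.0, 0.5e-6, 20.0)
sigma_1h = single_star_uncertainty_arcsec(0.1, 5.0, 0.5e-6, 3600.0)
print(f"{sigma_20s:.3f} arcsec after 20 s, {sigma_1h:.3f} arcsec after 1 h")
```

Under this scaling alone the improvement with time is very slow (a factor of about 2.4 from 20 s to one hour); the faster decline at long integration times mentioned above only sets in once the outer scale becomes relevant, which this sketch does not model.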


3.2. Dual-Star Astrometry 


When at least one additional star can be observed simultaneously with a target, 
relative astrometric uncertainties are significantly reduced. This is the case for 
seeing-limited imaging, adaptive optics assisted observations and long-baseline inter- 
ferometry. As shown in Fig. 3, in dual-star astrometry (often called narrow-angle 
astrometry), pupil-plane aberrations in telescopes, including piston phase offsets in 
a long-baseline interferometer, cancel out for both stars observed. In the special
case of very-narrow-angle astrometry,17,18 aberrations from the upper atmosphere
also partially cancel out.

[Figure 3: beams from star 1 and star 2 crossing a turbulent layer at atmospheric height h, above a telescope of diameter D (left) and an interferometer of baseline B (right).]

Fig. 3. Dual-star astrometry in the narrow-angle (left, $\theta\bar{h} \gg D$) and very-narrow-angle (right,
$\theta\bar{h} \ll B$) regimes. Adapted from Ref. 17 (with permission).

ᶜ See Chapter 13 of Volume 2 of this Handbook.

A key atmospheric parameter in determining the uncertainties in dual-star 
astrometry is the isoplanatic angle, which is the angle at which the RMS phase 
difference between two thin beams traveling through the atmosphere is 1 radian. 


This is given by

$$\theta_0 = 0.31\,\frac{r_0}{\bar{h}}. \tag{11}$$


Ground layer turbulence changes $r_0$ and $\bar{h}$ equally, so does not affect $\theta_0$ and can be
ignored in astrometry (assuming that AO systems and fringe trackers still function).
For typical 1″ seeing at visible wavelengths, with turbulence spread throughout the
atmosphere at an effective altitude of 2.5 km, we obtain $\theta_0 = 2.6$″ in the visible,
and 16″ in the K-filter (2.2 µm). Isoplanatic angles are often used as a proxy for
how far off-axis guide stars can be to act as a coherent-phase reference, for phase 
referenced astrometry or off-axis adaptive optics. 
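The two isoplanatic angles quoted above can be recovered with a short calculation. The sketch assumes the standard relations $r_0 \approx 0.98\lambda/\theta_{\rm seeing}$, $r_0 \propto \lambda^{6/5}$ and $\theta_0 \approx 0.314\,r_0/\bar{h}$; these coefficients and the function names are assumptions of this example rather than values stated in the text.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def fried_from_seeing(seeing_arcsec, wavelength_m):
    """Fried coherence length from the seeing FWHM at the same wavelength."""
    return 0.98 * wavelength_m / (seeing_arcsec / ARCSEC_PER_RAD)

def scale_r0(r0_ref_m, wavelength_ref_m, wavelength_m):
    """Scale r0 to another wavelength using the Kolmogorov 6/5 power law."""
    return r0_ref_m * (wavelength_m / wavelength_ref_m) ** 1.2

def isoplanatic_angle_arcsec(r0_m, h_eff_m):
    """Isoplanatic angle for a single effective turbulent layer."""
    return 0.314 * r0_m / h_eff_m * ARCSEC_PER_RAD

r0_visible = fried_from_seeing(1.0, 0.5e-6)
r0_k_band = scale_r0(r0_visible, 0.5e-6, 2.2e-6)
theta0_visible = isoplanatic_angle_arcsec(r0_visible, 2500.0)
theta0_k_band = isoplanatic_angle_arcsec(r0_k_band, 2500.0)
print(f"{theta0_visible:.1f} arcsec (visible), {theta0_k_band:.1f} arcsec (K)")
```

The strong wavelength dependence is the practical reason why phase-referenced work is usually carried out in the near-infrared.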

As apertures in both single-telescope astrometry and interferometric astrometry
are typically larger than the Fried coherence length $r_0$, observations can average
over larger patches of atmosphere, filtering out high spatial frequency components 
of the atmosphere and obtaining larger coherent fields of view. This results in more 
advanced terminology, such as the isokinetic angle in laser-guide star adaptive 
optics, and the isopistonic angle in long-baseline interferometry. We will ignore 
these effects here, and simply assume that the tilt or piston is measured for two 
stars separated by some angle. This is effectively an assumption that either the 
interferometric target flux is bright enough for simultaneous fringe tracking on two 
objects, or the aperture size is large enough so that off-axis phase referencing is 
viable. In addition, it is an assumption that an adaptive optics system can provide 
sufficiently high off-axis Strehl ratio in order to meet either of these conditions. 
These assumptions also limit the validity of our discussion to the long exposure 
time regime; fluctuations on short exposures will depend on aperture size, fringe 
tracking and adaptive optics details, but will average out. Finally, we will also use 
terminology of baseline rather than telescope diameter in discussing the effects of 
phase fluctuations on astrometric uncertainties, but note that telescope diameter 
can be roughly substituted for baseline in the case of single aperture measurements. 

With these assumptions, we can reduce the question of astrometric uncertainties
to spatio-temporal correlations on a wavefront at height $\bar{h}$. The measurement of
differential piston between two stars is given by

$$\frac{2\pi\,\Delta\theta}{\lambda} = |\mathbf{B}|^{-1}\Big([\varphi(\mathbf{B}) - \varphi(0)] - [\varphi(\mathbf{B} + \boldsymbol{\theta}\bar{h}) - \varphi(\boldsymbol{\theta}\bar{h})]\Big). \tag{12}$$

Here $\Delta\theta$ is the instantaneous astrometric error corresponding to a phase uncertainty
on baseline $\mathbf{B}$. We consider fluctuations in $\Delta\theta$ as the vectors $\mathbf{B}$ and $\boldsymbol{\theta}$ move around
the wavefront by the spatial coordinate $\bar{\mathbf{v}}\Delta T$, or equivalently its reciprocal space
vector $\boldsymbol{\kappa}$.

It should be instantly clear that this uncertainty is zero for the lowest-order
wavefront spatial frequencies, which correspond to a tilt on the wavefront $\varphi$. High
spatial frequencies, with $|\boldsymbol{\kappa}| \gg 1/(|\boldsymbol{\theta}|\bar{h})$ and $|\boldsymbol{\kappa}| \gg 1/|\mathbf{B}|$, will have twice the
variance of a fringe on a single baseline, and will average out relatively quickly, while
intermediate spatial frequencies are partially canceled out in the dual-star mode.

We can consider the power spectrum of A@ by recognizing that the sums and 
differences of wavefront positions expressed in Eq. (12) are equivalent to convolu- 
tion in the Fourier domain. We then arrive at the two-dimensional spatial power 


spectrum of differential astrometry fluctuations:!> 1” 


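$$Q(\boldsymbol{\kappa}) = 0.00928\,|\mathbf{B}|^{-2}\lambda^2 r_0^{-5/3}|\boldsymbol{\kappa}|^{-11/3}\sin^2(\pi\mathbf{B}\cdot\boldsymbol{\kappa})\,\sin^2(\pi\bar{h}\,\boldsymbol{\theta}\cdot\boldsymbol{\kappa}). \tag{13}$$

Note that the term $\lambda^2 r_0^{-5/3}$ can also be written as $0.0083\,\theta_{\rm seeing,0.5\,\mu m}^{5/3}$ (in units of m$^{1/3}$,
with the seeing in radians), where $\theta_{\rm seeing,0.5\,\mu m}$ is the visible seeing disk
full-width-half-maximum, and so is independent of wavelength.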
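A minimal numerical sketch of this spatial power spectrum follows, reading it as $0.00928\,|\mathbf{B}|^{-2}\lambda^2 r_0^{-5/3}|\boldsymbol{\kappa}|^{-11/3}\sin^2(\pi\mathbf{B}\cdot\boldsymbol{\kappa})\sin^2(\pi\bar{h}\,\boldsymbol{\theta}\cdot\boldsymbol{\kappa})$ with $\boldsymbol{\kappa}$ in cycles per metre. It also checks the 0.0083 coefficient of the seeing form, assuming $r_0 = 0.98\lambda/\theta_{\rm seeing}$. Function names and example values are illustrative.

```python
import math

def differential_spectrum(kappa, baseline, theta, h_eff, wavelength, r0):
    """Spatial power spectrum of differential astrometry for 2D vectors (tuples).

    kappa in cycles/m, baseline in m, theta in rad, h_eff in m.
    """
    k = math.hypot(kappa[0], kappa[1])
    b_squared = baseline[0] ** 2 + baseline[1] ** 2
    b_dot_k = baseline[0] * kappa[0] + baseline[1] * kappa[1]
    theta_dot_k = theta[0] * kappa[0] + theta[1] * kappa[1]
    return (0.00928 / b_squared * wavelength ** 2 * r0 ** (-5.0 / 3.0)
            * k ** (-11.0 / 3.0)
            * math.sin(math.pi * b_dot_k) ** 2
            * math.sin(math.pi * h_eff * theta_dot_k) ** 2)

def seeing_term(seeing_rad):
    """lambda^2 * r0^(-5/3) written in terms of the 0.5 um seeing (m^(1/3))."""
    return 0.0083 * seeing_rad ** (5.0 / 3.0)

# Consistency check of the 0.0083 coefficient at 0.5 um and 1 arcsec seeing
lam = 0.5e-6
seeing = math.radians(1.0 / 3600.0)
r0 = 0.98 * lam / seeing
coefficient_ratio = lam ** 2 * r0 ** (-5.0 / 3.0) / seeing_term(seeing)

# Example: 30 m baseline along x, 1 arcmin separation along x, 2.2 um, r0 = 0.6 m
one_arcmin = math.radians(1.0 / 60.0)
q_along = differential_spectrum((0.01, 0.0), (30.0, 0.0), (one_arcmin, 0.0), 2500.0, 2.2e-6, 0.6)
q_across = differential_spectrum((0.0, 0.01), (30.0, 0.0), (one_arcmin, 0.0), 2500.0, 2.2e-6, 0.6)
```

Spatial frequencies perpendicular to both the baseline and the star separation contribute nothing, which is the origin of the strong dependence on geometry discussed next.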

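Converting this spatial spectrum to a temporal spectrum involves considering
the evolution of this differential astrometric signal as the wavefront moves past
along the vector $\bar{\mathbf{v}}\Delta T$ after a time $\Delta T$ (see Fig. 4). This is in turn an integral in
$\boldsymbol{\kappa}$-space,18 along the dimensionless unit-vector direction $\hat{e}_\perp$, which is perpendicular
to $\bar{\mathbf{v}}$, enabled by the decomposition $\boldsymbol{\kappa} = (f/\bar{v})\hat{e}_\parallel + \kappa_\perp\hat{e}_\perp$:

$$\Phi(f) = \frac{1}{\bar{v}}\int_{-\infty}^{\infty} Q\!\left(\frac{f}{\bar{v}}\hat{e}_\parallel + \kappa_\perp\hat{e}_\perp\right)d\kappa_\perp. \tag{14}$$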
Fig. 4. Illustration of vectors $\boldsymbol{\theta}\bar{h}$, $\mathbf{B}$ and $\bar{\mathbf{v}}\Delta T$ on the wavefront $\varphi(\mathbf{x})$ at an effective turbulence
height $\bar{h}$. The Taylor hypothesis enables the spatio-temporal correlation to be approximated as a
spatial correlation between the wavefront at eight vector positions. Unit vectors parallel ($\hat{e}_\parallel$) and
perpendicular ($\hat{e}_\perp$) to the wind direction are also shown.

Fig. 5. The temporal power spectrum of differential astrometry, for point-like apertures and the
four limiting geometrical cases, with the angle between the two stars $\boldsymbol{\theta}$ being either parallel or
perpendicular to the baseline and wind direction. Parameters for the calculation are: 30 m baseline
$B$, 1′ star separation, 10 m s$^{-1}$ wind speed, 2.5 km effective turbulence height, 2.2 µm wavelength
and 0.6 m Fried coherence length $r_0$ at this wavelength. Dashed lines show the same calculation
with a von Kármán outer scale length of $L_0 = 60$ m. Key knee/null frequencies are shown, and in
all cases the total RMS differential angular error is 6-7 milli-arcseconds (2.7 rad of fringe phase).
See electronic edition for a color version of this figure.

This integral can be computed numerically, and has been calculated for the
four limiting geometries covering the relative directions of $\mathbf{B}$, $\boldsymbol{\theta}$ and $\bar{\mathbf{v}}$, as shown in
Fig. 5. In all cases, the RMS astrometric uncertainty is ~6 milli-arcseconds, although
the long exposure uncertainties clearly differ between the different cases. The very
different low-frequency behavior of the four cases mostly goes away once one averages
over all wind directions, as seen in Fig. 6. The flat power spectrum at low
frequencies means that astrometric uncertainty goes as the conventional $\Delta T^{-1/2}$,
so long as the integration time $\Delta T$ is significantly greater than the wind crossing
time of the baseline. Note also that higher wind velocities decrease the long-exposure
uncertainties (i.e. the low frequency power), because a larger number of independent
wavefront samples are included in the average. The long exposure uncertainty in
dual-star astrometry comes from integrating the product of this frequency spectrum, $\Phi(f)$,
with a sinc$^2$ function, which is approximately the same as scaling the zero frequency
intercept by $\Delta T^{-1/2}$, i.e.:

$$\sigma_\theta(\Delta T) = \left[\int_0^{\infty}\Phi(f)\,\mathrm{sinc}^2(\pi f\Delta T)\,df\right]^{1/2} \tag{15}$$

$$\approx \sqrt{\frac{\Phi(0)}{2}}\;\Delta T^{-1/2}. \tag{16}$$


This result is shown in Fig. 7 for the same atmospheric conditions as Fig. 5, for
binary separation parallel to the baseline, and for the same baselines as discussed
in Shao and Colavita (1992).17 Different assumptions on turbulence profiles slightly
change the numerical values in this relationship, but the key asymptotic relations
remain the same. These are as follows:

$$\sigma_\theta \propto |\mathbf{B}|^{-2/3}|\boldsymbol{\theta}| \quad \text{for } |\boldsymbol{\theta}| \ll |\mathbf{B}|/\bar{h},\ \Delta T \gg |\mathbf{B}|/|\bar{\mathbf{v}}|, \tag{17}$$

$$\sigma_\theta \propto |\boldsymbol{\theta}|^{1/3} \quad \text{for } |\boldsymbol{\theta}| \gg |\mathbf{B}|/\bar{h},\ \Delta T \gg \bar{h}|\boldsymbol{\theta}|/|\bar{\mathbf{v}}|. \tag{18}$$

Fig. 6. The same as Fig. 5, except averaged over all wind directions, for 12 m and 120 m baseline
lengths and including a higher wind velocity as well as the default of 10 m s$^{-1}$. See electronic
edition for a color version of this figure.

Fig. 7. Astrometric uncertainty as a function of separation in the case of dual-star astrometry,
with the same parameters as Fig. 5. See electronic edition for a color version of this figure.


3.3. Multi-Star Astrometry


In practice, the great advantage of past astrometric imaging instruments over 
long-baseline interferometry has been the ability for multiple reference stars to be 
observed. By tying together many astrometric fields with stars in common, multi- 
star astrometry in principle means that an astrometric grid over the whole sky can 
be constructed. In practice, this is best done from space, with HIPPARCOS and 
Gaia providing the reference star grids for the future. Ground-based and future 
space-based narrow-angle astrometry will be tied to these grids and have relative 
precision within a field limited by Gaia (uncertainties as small as one part in $10^{10}$),
but in principle can obtain much better precisions than Gaia for individual stars. 

Multi-reference star astrometry has never been attempted in long-baseline inter- 
ferometry (dual-star astrometry is difficult enough!) but has been used in precision 
astrometric experiments on single telescopes using adaptive optics.20,21 Although
calibration has been difficult, this use of multiple reference stars has enabled astrom- 
etry at the 100 micro-arcsecond level, even on the relatively short baselines afforded 
by single telescopes. 

We will consider idealized narrow-angle astrometry for two classes of multi-reference
star astrometry here: two reference stars on either side of a target, and
three reference stars around a target with arbitrary vector separations. The angular
separation vectors of these stars are $\boldsymbol{\theta}_1$, $\boldsymbol{\theta}_2$ and $\boldsymbol{\theta}_3$.

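In the linear (two-star) case, where the reference stars are on either side of the
target, the differential piston astrometric signal is formed from the fringe phases as

$$\frac{2\pi\,\Delta\theta}{\lambda} = |\mathbf{B}|^{-1}\Big([\varphi(\mathbf{B}) - \varphi(0)] - \frac{|\boldsymbol{\theta}_2|}{|\boldsymbol{\theta}_1| + |\boldsymbol{\theta}_2|}[\varphi(\mathbf{B} + \boldsymbol{\theta}_1\bar{h}) - \varphi(\boldsymbol{\theta}_1\bar{h})] - \frac{|\boldsymbol{\theta}_1|}{|\boldsymbol{\theta}_1| + |\boldsymbol{\theta}_2|}[\varphi(\mathbf{B} + \boldsymbol{\theta}_2\bar{h}) - \varphi(\boldsymbol{\theta}_2\bar{h})]\Big). \tag{19}$$

The spatial power spectrum for the symmetric two-reference star case is given by

$$Q(\boldsymbol{\kappa}) = 0.00928\,|\mathbf{B}|^{-2}\lambda^2 r_0^{-5/3}|\boldsymbol{\kappa}|^{-11/3}\sin^2(\pi\mathbf{B}\cdot\boldsymbol{\kappa})\,\sin^4(\pi\bar{h}\,\boldsymbol{\theta}\cdot\boldsymbol{\kappa}). \tag{20}$$

The only difference to the case with one reference star is that the second sine
function is raised to the fourth power.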
For the three-star case, the differential piston astrometric signal is formed from:

$$\frac{2\pi\,\Delta\theta}{\lambda} = |\mathbf{B}|^{-1}\Big([\varphi(\mathbf{B}) - \varphi(0)] - \sum_{i=1}^{3} w_i[\varphi(\mathbf{B} + \boldsymbol{\theta}_i\bar{h}) - \varphi(\boldsymbol{\theta}_i\bar{h})]\Big). \tag{21}$$

The weights, $\mathbf{w} = \{w_1, w_2, w_3\}$, come from solving the 3 × 3 linear system,

$$[\boldsymbol{\theta}_1\ \boldsymbol{\theta}_2\ \boldsymbol{\theta}_3]\cdot\mathbf{w} = 0, \tag{22}$$

$$\sum_{i=1}^{3} w_i = 1, \tag{23}$$

and ensure that the lowest spatial frequency aberrations in the upper atmosphere
turbulence cancel out, with the higher spatial frequency terms averaging out
relatively quickly. The power spectrum then becomes

$$Q(\boldsymbol{\kappa}) = 0.00232\,|\mathbf{B}|^{-2}\lambda^2 r_0^{-5/3}|\boldsymbol{\kappa}|^{-11/3}\sin^2(\pi\mathbf{B}\cdot\boldsymbol{\kappa})\left|1 - \sum_{i=1}^{3} w_i\exp(i2\pi\bar{h}\,\boldsymbol{\theta}_i\cdot\boldsymbol{\kappa})\right|^2. \tag{24}$$

Fig. 8. Temporal power spectra in the case of one reference star (solid), two symmetrically placed
reference stars (dashed) and three symmetrically placed reference stars (dotted). Low-frequency
power is reduced by a factor of ~100 for the 120 m baseline case by the additional reference star(s),
decreasing required integration times by the same factor. See electronic edition for a color version
of this figure.
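The reference-star weights and the resulting cancellation term can be sketched as follows. The code solves the 3 × 3 system (weighted sum of the separation vectors equal to zero, weights summing to one) by Cramer's rule and evaluates the factor $|1 - \sum_i w_i\exp(i2\pi\bar{h}\,\boldsymbol{\theta}_i\cdot\boldsymbol{\kappa})|^2$; the function names and the example geometry are illustrative assumptions.

```python
import cmath
import math

def reference_weights(thetas):
    """Weights w_i with sum(w_i * theta_i) = 0 and sum(w_i) = 1 for three 2D vectors."""
    (x1, y1), (x2, y2), (x3, y3) = thetas
    det = x1 * (y2 - y3) - x2 * (y1 - y3) + x3 * (y1 - y2)
    if abs(det) < 1e-300:
        raise ValueError("reference stars are collinear with each other; no unique weights")
    return ((x2 * y3 - x3 * y2) / det,
            (x3 * y1 - x1 * y3) / det,
            (x1 * y2 - x2 * y1) / det)

def cancellation_factor(weights, thetas, h_eff, kappa):
    """|1 - sum_i w_i exp(i 2 pi h theta_i . kappa)|^2 for spatial frequency kappa (cycles/m)."""
    total = 1.0 + 0.0j
    for w, (tx, ty) in zip(weights, thetas):
        total -= w * cmath.exp(2j * math.pi * h_eff * (tx * kappa[0] + ty * kappa[1]))
    return abs(total) ** 2

# Three reference stars 1e-4 rad (about 20 arcsec) from the target, 120 degrees apart
radius = 1.0e-4
stars = [(radius * math.cos(math.radians(a)), radius * math.sin(math.radians(a)))
         for a in (90.0, 210.0, 330.0)]
weights = reference_weights(stars)

low_kappa = (1.0e-3, 0.0)
three_star = cancellation_factor(weights, stars, 2500.0, low_kappa)
single_star = 4.0 * math.sin(math.pi * 2500.0 * radius * low_kappa[0]) ** 2
```

For the symmetric geometry the weights are all 1/3, and at low spatial frequencies the three-star factor is many orders of magnitude below the single-reference value, which is the cancellation of the lowest spatial frequency aberrations described above.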


The power spectra for the cases of two and three reference stars are shown in 
Fig. 8, and the angular and baseline dependence of the uncertainty of the three-star 
case are shown in Fig. 9. The two-reference star uncertainty is nearly indistinguish- 
able from the case of three reference stars where two are separated by 160 degrees, 
so is not plotted. With these additional reference stars, the astrometric uncertainty 
is significantly smaller at small angular separations, with an asymptotic power law 


with uncertainty proportional to |B| and |6|*/°. 


4. Limitations of Real Instruments 


4.1. A Narrow Angle Interferometer Archetype 


For sky coverage reasons, astrometric observations are carried out between targets
with separations larger than the diffraction limit of a single telescope. Since
the field of view of interferometric instruments is traditionally the diffraction limit
of their telescopes, an astrometric observation requires independent interferometric

Fig. 9. Astrometric uncertainty as a function of separation in the case of astrometry with three
reference stars. The solid line is for equal spacing (at 120° angles), the dashed line is for spacings
of 0.5, 1 and 1.5 times the mean separation, and the dotted line is for equal separations, but 100°,
100° and 160° angular spacing of reference stars. See electronic edition for a color version of this
figure.

Fig. 10. A narrow-angle interferometer archetype. Two independent interferometric detectors
$\mathrm{Det}_P$ and $\mathrm{Det}_S$ observe two targets P and S with two telescopes $\mathrm{Tel}_1$ and $\mathrm{Tel}_2$. A metrology system
(in red) measures the differential internal optical path $\Delta L_{\rm int} = (L_{2,P} - L_{1,P}) - (L_{2,S} - L_{1,S})$.


detectors, one per observed target. As for single telescopes where the astrometric 
measurement usually amounts to counting the pixel distance between objects in the 
focal plane, the distance between interferometric detectors needs to be measured. 
This is taken care of by a metrology system, as conceptually illustrated in Fig. 10 
for an observation of two targets. This metrology measures the internal optical path
double difference $\Delta L_{\rm int}$ between the two targets (P, S) and the two telescopes (1, 2):


$$\Delta L_{\rm int} = (L_{2,P} - L_{1,P}) - (L_{2,S} - L_{1,S}). \tag{25}$$


When the delay lines equalize the optical path through the telescopes for each target
such that $\Delta L_{\rm int} + \Delta L_{\rm ext} = 0$, the differential external delay $\Delta L_{\rm ext} = -\mathbf{B}\cdot\Delta\mathbf{s}$ is
directly given by the internal metrology measurement:


$$\Delta L_{\rm int} = \mathbf{B}\cdot\Delta\mathbf{s}. \tag{26}$$


Knowing the baseline $\mathbf{B}$, and measuring the differential internal optical path $\Delta L_{\rm int}$
with the metrology, the separation $\Delta\mathbf{s}$ projected on the baseline direction is determined.
For a complete knowledge of the separation vector $\Delta\mathbf{s}$, multiple baseline
orientations, super-synthesis, or a combination of both are needed.

In practice, the metrology system measures the differential internal optical path
$\Delta L_{\rm int}$ to within a zero-point $\Delta L_{\rm int,0}$, determined at the same time as the separation
vector $\Delta\mathbf{s}$:


$$\Delta L_{\rm int} - \Delta L_{\rm int,0} = \mathbf{B}\cdot\Delta\mathbf{s}. \tag{27}$$


One of the following two methods is used to independently identify the zero-point
and the separation. Either the astrometric observation is performed over a range of
hour angles large enough for the sine-like $\mathbf{B}\cdot\Delta\mathbf{s}$ to separate from the constant $\Delta L_{\rm int,0}$
zero-point, or the separation vector $\Delta\mathbf{s}$ is inverted or zeroed to independently
determine the metrology zero-point. An inversion of the separation is achieved by
swapping the two observed targets between the two detectors, whereas a zeroing is
obtained with one single target observed simultaneously by the two detectors. Both
operations require special capabilities at the level of the dual-star system, but for simplicity
and efficiency reasons, the swap approach has always been favored.
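The swap approach can be illustrated with a small sketch based on Eq. (27). It assumes an idealized, noise-free pair of metrology readings taken close enough in time that the projected baseline and the zero-point are unchanged, so that the swap only flips the sign of the separation term. The function name `separate_zero_point` and the numerical values are hypothetical.

```python
def separate_zero_point(dl_normal, dl_swapped):
    """Split a normal/swapped pair of metrology readings into zero-point and signal.

    Normal:  dl_normal  = zero_point + B.ds
    Swapped: dl_swapped = zero_point - B.ds
    Returns (zero_point, B.ds), both in the units of the inputs.
    """
    zero_point = 0.5 * (dl_normal + dl_swapped)
    projected_delay = 0.5 * (dl_normal - dl_swapped)
    return zero_point, projected_delay

# Hypothetical values, in metres.
true_zero_point = 1.2e-4
true_projection = 4.85e-3  # roughly a 10 arcsec pair on a 100 m baseline
dl_normal = true_zero_point + true_projection
dl_swapped = true_zero_point - true_projection
zero_point, projection = separate_zero_point(dl_normal, dl_swapped)
```

The half-sum isolates the metrology zero-point and the half-difference isolates the astrometric signal, which is why regular swaps can absorb slow drifts into the zero-point.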


4.2. Back-of-the-Envelope Error Budget 


In order to develop a realistic error budget for a narrow-angle astrometric observa- 
tion, Eq. (26) needs to be supplemented by the effects of atmospheric turbulence 
and detector photon noise: 


$$\Delta L_{\rm int} + \Delta L_{\rm PN} + \Delta L_{\rm atm} = \mathbf{B}\cdot\Delta\mathbf{s}. \tag{28}$$


Since the contributions of the photon noise and the atmosphere, both with zero 
mean, have been extensively covered in Sec. 2 and Sec. 3, respectively, they will not 
be considered any further in this section. 

Neglecting the geometry of the observation, the error on the separation $\delta\Delta s$
relates to the error on the baseline knowledge $\delta B$ and on the internal optical path
difference $\delta\Delta L_{\rm int}$ as follows:


$$\left(\frac{\delta\Delta s}{\Delta s}\right)^{2} = \left(\frac{\delta B}{B}\right)^{2} + \left(\frac{\delta\Delta L_{\rm int}}{\Delta L_{\rm int}}\right)^{2}. \tag{29}$$

The fractional error on the separation, $\delta\Delta s/\Delta s$, is a combination of the fractional
error on the baseline $\delta B/B$ and the fractional error on the differential internal
optical path difference $\delta\Delta L_{\rm int}/\Delta L_{\rm int}$. To illustrate, a precision/accuracy of 10 micro-arcseconds
for a 10″ separation requires better than a $10^{-6}$ fractional error on both
the baseline and the differential internal optical path difference. For a baseline of
order 100 m, this corresponds to better than ~100 µm on the baseline, and better than ~5 nm
on the differential internal optical path difference. Since the differential internal optical
path difference is deduced from the differential phase $\Delta\phi_{\rm met}$ of the metrology laser, as


$$\Delta L_{\rm int} = \frac{\lambda_{\rm met}}{2\pi}\,\Delta\phi_{\rm met}, \tag{30}$$


the knowledge of the laser wavelength $\lambda_{\rm met}$ is critical. To obtain a fractional error
of $10^{-6}$ on the separation, an equivalent fractional precision/accuracy is required
on the metrology wavelength.
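The numbers quoted in this back-of-the-envelope budget can be reproduced with a few lines of arithmetic. The sketch below assumes a 100 m baseline (inferred from the ~100 µm figure, not stated explicitly) and a separation aligned with the baseline, and it allots the full fractional budget to each term as the text does. The function name `error_budget` is hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # radians per arcsecond

def error_budget(precision_arcsec, separation_arcsec, baseline_m):
    """Return the fractional error and the matching baseline and OPD tolerances."""
    fractional = precision_arcsec / separation_arcsec
    # Differential internal optical path |B . ds| for a separation along the baseline.
    delta_l_int = baseline_m * separation_arcsec * ARCSEC
    return {
        "fractional": fractional,
        "baseline_error_m": fractional * baseline_m,
        "opd_error_m": fractional * delta_l_int,
    }

# 10 micro-arcseconds on a 10 arcsec pair, with an assumed 100 m baseline.
budget = error_budget(10e-6, 10.0, 100.0)
```

With these inputs the fractional error is $10^{-6}$, the baseline tolerance is about 100 µm, and the tolerance on the differential internal optical path is just under 5 nm.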

Depending on the scientific objectives of the astrometric observations, an abso- 
lute measurement of the separation might not be necessary, i.e. it might be sufficient 
to know the baseline length and the metrology wavelength to a scaling factor. In 
this case, the measured separation will be affected by that same scaling factor. 
However, since astrometry is a technique to observe the universe in motion, any
such scaling factor must at least remain constant, at the levels described above,
over the multi-year duration of the measurement.


4.3. Noncommon Path 


The real challenge of interferometric astrometry is making sure that the metrology 
measures exactly the optical path experienced by the astronomical light. Any mis- 
match between the two qualifies as a noncommon path error. Detailed analysis for 
some of these noncommon path errors can be found in the literature;?? we discuss 
some of them below. 


Path coverage: Ideally, the astrometric metrology should measure the internal
differential optical path difference from the beam combination point to the primary
space where the baseline is defined. In practice, depending on the metrology implementation,
this may not be completely achievable. A telescope-differential metrology
(measuring $L_{2,P} - L_{1,P}$ and $L_{2,S} - L_{1,S}$) injected on the back side of the beam combiner
might not be able to go all the way up to the telescope, being blocked by the
phase errors induced by the deformable mirror of an adaptive optics system (e.g.
ASTRA on Keck Interferometer). On the other hand, a target-differential system
(measuring $L_{1,P} - L_{1,S}$ and $L_{2,P} - L_{2,S}$) would be able to go through the deformable
mirror allowing the internal differential optical path difference to be measured in


118 M. J. Ireland & J. Woillez 


primary space (e.g. Gravity on the VLTI), but at the expense of a noncommon path 
between the injection on the back side of the two beam combiners. 


Beam walk: When the footprints of the stellar and metrology lights are not the
same, either due to a smaller metrology beam propagating in the central obscuration
of the telescope (e.g. PRIMA on the VLTI), or to a smaller metrology sensor (e.g.
Gravity on the VLTI), and when this footprint moves on imperfect optical elements,
the stellar and metrology lights do not experience the same optical path. This effect
can either appear as an additional noise in the astrometric measurement when,
e.g. internal turbulence induces the beam motion, or as a bias when the motion is
correlated with the astrometric observation sequence.


Chromatic effects: All interferometers to date have operated the metrology laser at a
wavelength outside the stellar bandpass. The refractive index of the material used in
transmission has an impact on the measurement of the internal optical path. When
the differential optical delay is not implemented in vacuum (e.g. in fluoride glass
fibers for Gravity on the VLTI), Eq. (30) needs to be modified to include the refractive
index $n$ of the material at the science and metrology wavelengths:


$$\Delta L_{\rm int} = \frac{n_{\rm sci}}{n_{\rm met}}\,\frac{\lambda_{\rm met}}{2\pi}\,\Delta\phi_{\rm met}. \tag{31}$$

The same level of requirement on the metrology wavelength applies to the refractive
index.
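As a rough illustration of Eqs. (30) and (31), the sketch below converts a metrology phase into an internal optical path, with and without the refractive-index ratio. The 1908 nm wavelength is that of the Gravity metrology quoted later in this chapter; the index values and the function name `internal_opd` are hypothetical placeholders, not measured properties of any fiber.

```python
import math

def internal_opd(delta_phi_met, lambda_met, n_sci=1.0, n_met=1.0):
    """Differential internal optical path from the metrology phase, Eq. (31).

    With n_sci = n_met = 1 (delay implemented in vacuum) this reduces to Eq. (30).
    """
    return (n_sci / n_met) * lambda_met / (2.0 * math.pi) * delta_phi_met

lambda_met = 1908e-9                 # metres
delta_phi = 2.0 * math.pi * 1000.0   # one thousand metrology fringes

vacuum = internal_opd(delta_phi, lambda_met)
# Hypothetical indices at the science and metrology wavelengths.
in_glass = internal_opd(delta_phi, lambda_met, n_sci=1.4500, n_met=1.4490)
relative_change = in_glass / vacuum - 1.0
```

Because the index ratio multiplies the result directly, any fractional error on $n_{\rm sci}/n_{\rm met}$ maps one-to-one onto the inferred optical path, which is the sense in which the wavelength requirement carries over to the refractive index.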


Polarization: Having all the arms of an interferometer with identical polariza- 
tion properties is sufficient only to observe unpolarized sources.?? The astromet- 
ric metrology being built around highly polarized lasers, these systems are bound 
to have issues that can impact astrometric measurements. As an illustration, the 
vibration metrology of the Keck Interferometer needed its polarization state to be 
adjusted with the attitude of the telescopes to operate reliably,?4 and the field 
derotator of the VLTI auxiliary telescopes had to be polarization-compensated to 
extend the astrometric metrology of PRIMA to the secondary mirror of the tele- 
scopes.?° Due to the polarization properties of SgrA*, the impact of polarization 
on astrometry has been studied for the Gravity instrument of VLTI.7° However, 
polarization has not been identified (yet) as the limiting factor of the astrometric 
performance of the instrument. 

The noncommon path terms above are the main reason for choosing the sep- 
aration swap over the hour-angle coverage method of determining the metrology 
zero-point (see Sec. 4.1). Regular swaps on short timescales help reject tempo- 
ral variations of noncommon path effects into the metrology zero-point, reducing 
the impact on the separation measurement. This approach works as long as the 
swap operation itself does not introduce an astrometric bias, by, e.g. impacting the 
primary-space conjugation of the narrow-angle baseline,?” as described in the next 
section.


4.4. Astrometric Baselines


In practice, one of the major limitations in astrometric interferometry is the definition
of a narrow-angle astrometric baseline.?® Where three or more reference stars
are used, as is the case in single-telescope astrometry, variations in the astrometric 
baseline in principle cancel out by construction (Eq. (21)) and it simply becomes a 
scaling factor for astrometric shifts. However, for only a single reference star (i.e. all 
existing or planned long-baseline instruments), the astrometric baseline uncertainty 
becomes a relative astrometric uncertainty, as shown in Eq. (29). 

The definition of the narrow-angle astrometric baseline comes from the requirement
of no optical path gap between the coverage of the internal metrology measurement,
$\Delta L_{\rm int}$, and the external delay, $\mathbf{B}\cdot\Delta\mathbf{s}$. Said differently, the internal metrology
has to reach, at each telescope, the points that define the narrow-angle baseline $\mathbf{B}$.
Tying the baseline vector $\mathbf{B}$ to the Earth reference frame (International Terrestrial
Reference System, ITRS) and transforming the baseline to the star reference
frame (International Celestial Reference System, ICRS) is sufficient to determine
the $\mathbf{B}\cdot\Delta\mathbf{s}$ scalar product. The baseline vector must therefore be expressed in primary
space, i.e. before any conjugation by the primary mirrors of the telescopes. If
the metrology endpoint does not reach primary space, its conjugation to primary
space needs to be monitored against a primary space reference. Most of the larger
telescopes employed so far for astrometric interferometry have a pupil with a central
obscuration caused by the secondary mirror. As such, the telescope pivot point,
the only Earth-fixed point of an ideal telescope that the pointing axis intersects,
cannot be observed from inside the instrument and therefore cannot be reached by
astrometric metrology. To overcome this issue, different methods have been considered.
For the ASTRA project on the Keck Interferometer (see Sec. 5.3), the metrology 
was terminated inside the adaptive optics system and monitored with respect to a 
primary space reference inserted between the segments of the primary mirror. For 
the Gravity instrument on the VLTI (see Sec. 5.6), the metrology extended directly 
into primary space, up to metrology receivers located on the spiders supporting the 
secondary mirror of the telescopes. 

An absolute knowledge of the baseline is necessary for an absolute measurement
of the separation. This raises the issue of the absolute calibration of the narrow-angle
baseline. Interferometers have so far lacked readily available
binary targets with separations known at the level of a few micro-arcsecondsᵃ and
bright enough to be observable. This has been circumvented by transferring the
wide-angle baseline, measured with a set of single stars with known positions and
large separations, to the narrow-angle baseline.??:?” °°: 31 This transfer adds a second
layer of complexity to the baseline issue, as the adjustment of a wide-angle baseline
model alone is not sufficient for the transfer from wide-angle to narrow-angle to be
under control. The wide-angle pivot point and the absolute internal optical path
difference, $L_{\rm int}$, need to be controlled as well for the single-target measurements.?°


ᵃThis might change soon with the upcoming data release of the Gaia mission.?°


5. Past, Current and Planned Instruments 


5.1. Mark III


On Mount Wilson (US), the Mark III®? was the first interferometer to carry out a 
dual-star observation,®* following the seminal paper of Shao and Colavita (1992).!” 
The focal plane of the interferometer was modified to simultaneously observe the 
two components of the double star α Gem, and study the atmospheric turbulence in
the narrow-angle regime. No disagreement was found between this first observation 
and the predictions of a Kolmogorov turbulence model. 


5.2. Palomar Testbed Interferometer 


A more systematic exploration of the parameter space required a dedicated dual-star 
interferometer. The successor of the Mark III, the Palomar Testbed Interferometer 
(PTI)** on Mount Palomar (US), was constructed to verify the atmospheric limits 
and demonstrate the technologies needed for narrow-angle astrometry. PTI had 
three 40cm siderostats equipped with a dual-star separator generating a beam for 
each of the two observed targets, dual delay lines and differential delay lines to 
equalize the optical path for both targets, two interferometric detectors, one of 
them acting as a fringe tracker to compensate the atmospheric perturbations, and 
an astrometric metrology system covering the full optical path from the interfer- 
ometric detectors to a retro-reflector at each siderostat in the path common to 
the two targets. The first narrow-angle astrometric measurements were obtained
on the ~31.5″ separation binary 61 Cyg,°°:°° and reached a stability in the range
100–170 micro-arcseconds, which corresponds to a fractional error of ~$5 \times 10^{-6}$. As
a demonstrator with small apertures, PTI had limited sensitivity, preventing further
astrometric investigations.

PTI also carried out very-narrow-angle astrometric observations, under a 
project named PHASES,?” where the target pairs were fully contained inside the 
~1” diffraction limit of its 40cm siderostats, and therefore did not require a dual- 
star module. A beamsplitter separated the light between a fringe tracker compen- 
sating the atmospheric piston and a science camera scanning through the two fringe 
packets generated by the pair. The astrometric separation was deduced from the 
separation within a scan between the two fringe packets. This type of astrometric
observation, in the double packet regime,°* is fundamentally different in concept
and implementation from the narrow-angle astrometry presented in this review,
being more closely related to imaging astrometry.?°


5.3. Keck Interferometer


The successor of PTI, the Keck Interferometer project?* on Mauna Kea (US), started
receiving funding in 1998.°° It was planned as an extension of the Keck Observatory
twin 10 m telescopes, with four additional 1.8 m telescopes.*° The 1.8 m telescopes
had been planned from the beginning for narrow-angle astrometry, including a dual-star
capability and some equipment to monitor the telescopes’ pivot points and the
transfer of the narrow-angle baseline onto the wide-angle baseline.®° The delay lines,
similar to the ones at PTI, also had a dual-star capability. Although the 1.8 m telescopes
had been built, the astrometric project that relied on them was canceled in 2006.

In 2006, the ASTRA project*! on the Keck Interferometer started implementing 
a phase referencing and narrow-angle astrometry capability, but this time on the 
main 10m Keck telescopes, with the observation of the galactic center as a main 
objective. ASTRA developed?! a dual-star capability at the focus of the adaptive 
optics systems equipping the two 10 m telescopes, a 1319 nm double-pass astrometric 
metrology covering the optical path from the two existing fringe trackers to common 
retro-reflectors located in front of the deformable mirrors of each adaptive optics 
system, and a camera monitoring the conjugation of the astrometric baseline to primary
space. Unfortunately, after delivering an off-axis fringe tracking capability,4?
ASTRA did not have time to demonstrate any astrometric performance before the 
Keck Interferometer ceased operation.** 


5.4. Sydney University Stellar Interferometer — MUSCA/PAVO 


The Sydney University Stellar Interferometer (SUSI) attempted very-narrow-angle
dual-star metrology over the years 2011–2013,44 in order to explore limits for
astrometrically detecting planets in binary star systems. The dual-star system
MUSCA (Micro-arcsecond University of Sydney Companion Astrometry) measured
the fringes on a uniaxial beam combiner over the wavelength range 0.77–0.9 µm
on either the primary or the secondary star, while the companion beam combiner
PAVO tracked and recorded telescope-differential optical path differences on the
primary star over the wavelength range 0.54–0.76 µm. Operating only over a field of
view of up to ~3 arcseconds and with a magnification of only a factor of 3 through
the delay lines, optical paths were assumed equal for both observed stars up
to the MUSCA/PAVO dichroic split, with a target-differential metrology system
between the beam combiners. The wide-angle baseline and imaging baselines were
kept in sync at the cm level (with upgrades identified for mm-level accuracy) by a
pupil viewing and alignment system.

This system successfully demonstrated phase-referenced interferometry at the
level of 100 µas in the most favorable conditions. Potential paths to lower astrometric
uncertainty were identified, including a more extensive control of noncommon
path aberrations and removing spurious reflections of metrology signals. However,
the sensitivity of PAVO as a fringe tracker was ultimately inadequate for the exoplanet
science case (approximately V ~ 5 in good conditions).


5.5. Very Large Telescope Interferometer — PRIMA 


In Europe and Chile, the astrometry and phase-referenced imaging project
PRIMA* of the Very Large Telescope Interferometer started being developed
in 2000. It provided a dual-star capability*® for the 1.8 m Auxiliary Telescopes
(AT) and 8 m Unit Telescopes (UT), vacuum differential delay lines,*” a pair
of two-telescope fringe trackers,4* and an astrometric metrology system.*® The
installation started in 2008 and the first astrometric observations were carried out
in 2011. By mid-2012, it became apparent that the astrometric performance of
3 milli-arcseconds for a 10″ separation, which corresponded to a fractional error
of ~$3 \times 10^{-4}$, was not meeting the $10^{-5}$–$10^{-6}$ expectation.?” Major shortcomings
were identified in the implementation of the fringe sensors, the star separators,
and the astrometric metrology defining the baseline. Polarization effects introduced
by the field derotators of the star separators, combined with a polarization-based
fringe sensor design, impacted fringe tracking performance and sensitivity, while the
termination of the astrometric metrology inside the star separators induced large
primary-space-conjugated astrometric baseline motions. These issues were partly
addressed by compensating the polarization properties of the field derotators and
extending the metrology to the secondary mirror of the telescopes. In its improved
configuration, PRIMA demonstrated?° a performance of 800 micro-arcseconds for
the same 10″ separation, but with a very unfavorable projected baseline of 30 m.
Extrapolated to 150 m, this amounted to 160 micro-arcseconds, or a fractional error
of $1.6 \times 10^{-5}$, still short of the requirements of an exoplanet detection and characterization
campaign.°° Unfortunately, in direct competition with Gaia,?? the space
astrometry mission of ESA, the PRIMA project was put on hold in 2014 and canceled
the following year.
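The extrapolation quoted for PRIMA can be checked with simple arithmetic. The sketch below assumes that, for a fixed optical path error, the astrometric uncertainty scales inversely with the projected baseline, as Eq. (26) suggests; the function names are hypothetical.

```python
def extrapolate_precision(precision_uas, baseline_m, new_baseline_m):
    """Scale an astrometric precision (micro-arcseconds) to another projected baseline.

    Assumes a fixed optical path error, so the precision goes as 1 / baseline.
    """
    return precision_uas * baseline_m / new_baseline_m

def fractional_error(precision_uas, separation_arcsec):
    """Fractional error for a precision in micro-arcseconds and a separation in arcseconds."""
    return precision_uas * 1e-6 / separation_arcsec

at_150m = extrapolate_precision(800.0, 30.0, 150.0)
fraction = fractional_error(at_150m, 10.0)
```

This reproduces the 160 micro-arcseconds and the $1.6 \times 10^{-5}$ fractional error given in the text for a 10″ separation.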


5.6. Very Large Telescope Interferometer — Gravity 


Currently, the only narrow-angle astrometry instrument in operation on any inter- 
ferometer is Gravity®! on the VLTI. Started in 2008, Gravity is a K-band fully 
cryogenic two-object beam combiner installed at the VLTI focus. Its main objec- 
tive is the study of the supermassive black hole at the center of our galaxy. As 
a second generation VLTI instrument, Gravity benefited from the experience with 
the previous generation of interferometric instruments, of which the evolution of 
its astrometric metrology design®”’°? is a good illustration. It includes two fiber-fed 
integrated optics 4-telescope beam combiners,** a low-noise infrared detector for its 
fringe tracker, active control loops for the stabilization of the fields and pupils,®° 
and a novel 1908nm astrometric metrology system.°° The dual-star capability is 
contained inside the instrument itself and therefore has to work within the field of 
view delivered by the VLTI, ~2″ for the UTs and ~4″ for the ATs, which limits
the sky coverage compared to past instruments but also reduces noncommon path
effects between the two observed objects. The astrometric metrology design chosen 
for Gravity is unique. The metrology laser light is split into two differentially phase- 
shifted beams, back-injected into the integrated optics combiners, and propagated 
toward the telescopes. A local metrology pickup measures the optical path difference 
between the two objects at each telescope input of the instrument. The signal level 
there is high enough to record the variations of the optical path differences, without 
wrapping errors, continuously throughout an astrometric observation. Because the 
metrology measurement is target-differential, rather than telescope-differential, the 
Gravity metrology can propagate past the deformable mirrors of the adaptive optics 
systems. Four additional metrology receivers are installed on the spiders of each 
telescope; they directly materialize the astrometric baseline in primary space. The 
pupil control loops measure the relative positions between the metrology receivers 
inside the instrument and the ones on the telescopes, and help unwrap the telescope 
metrology signals between subsequent measurements of an astrometric observation 
sequence. Because the differential delay capability is implemented in optical fibers, 
the control of dispersion is critical to the performance of the instrument, as presented 
in its error budget.°’ The first observations with Gravity, including its astrometric 
mode, were carried out in 2015–2016.° As of late 2017, astrometric observations are
carried out with a combination of the optical path measurements by the internal
metrology pickup and the corrections from the pupil control loop. The fractional
performance achieved is on the order of $5 \times 10^{-5}$. The extension of the metrology
to the telescope receivers, which is still under commissioning, should improve this
fractional performance.


6. Conclusions and Forward Look 


This chapter has focused on the motivation, principles and practice of ground- 
based astrometric interferometry, demonstrating that ~1 micro-arcsecond precision 
is possible, but the technique still has many challenges, as the many attempts have 
shown. Recent experience has demonstrated that for sufficiently large telescopes, 
astrometric interferometry can observe astrophysically interesting and faint targets. 
The Gravity instrument for the VLTI has demonstrated routine astrometric precisions
at the level of ~50 micro-arcseconds,°® and phase-referenced imaging down
to a magnitude $m_K$ fainter than 17.°9 The astrometric precision of Gravity for the
primary black-hole science case⁶⁰ is enabled by the ability to track on fainter stars
(e.g. Sgr A* itself), in turn enabled by off-axis fringe tracking and phase referencing,
which is in turn enabled by the large apertures of the VLTI’s Unit Telescopes.
Indeed, the fundamental uncertainties described in Sec. 3 are as applicable to
phase-referenced imaging as they are to dual-star or multi-star astrometry.

For ground-based instrumentation with large apertures, one of the key limiting 
effects, not described here, is the isoplanatic angle for natural guide star adaptive 
optics. Achieving sufficient sky coverage for a significantly expanded impact with a
ground-based system then requires laser guide stars, and possibly a multi-conjugate
system at each telescope to increase the field of view in 8 m class telescopes to >30”. 
However, this potentially adds systematic uncertainties,°! and therefore almost cer- 
tainly requires three reference stars. Such a system would be a major undertaking, 
but as shown in Secs. 2 and 3, it could achieve a precision level of less than ~1 micro- 
arcsecond and be limited primarily by the Earth’s atmosphere. 

There are no major space-based interferometer studies at the present time, 
with the Space Interferometer Mission (SIM)° having been canceled, and the cost 
of a competitive mission based on those previous studies being in the >1B$ range. 
Nonetheless, in the context of Gaia’s anticipated exoplanet results, it may be worthwhile
reconsidering a range of space interferometer concepts in the near future.


Building much larger but “dilute” telescopes, for improved angular resolution,
has been the goal of stellar interferometry for a century. Novel extended forms 
of interferometers, called hypertelescopes, are now being developed to provide 
direct images at high resolution by using many small mirror elements. The meta- 
aperture size may likely exceed a kilometer on suitable terrestrial sites, but much 
larger sizes, perhaps beyond 100,000 km, are considered feasible for space ver- 
sions in the form of a flotilla of small mirrors. The high limiting magnitude also 
attainable with hypertelescopes, similar to that of an Extremely Large Telescope 
at equal collecting area, and the much higher resolution provided by their larger 
dilute meta-aperture, are expected to produce new science inroads on various 
types of compact sources. If and when hypertelescopes are built in space, with 
mega- and perhaps gigametric meta-aperture diameter, exoplanet images may 
show enough resolved detail to search for bio-signatures such as patterns of pho- 
tosynthetic activity, through its seasonal spectroscopic modulation. Neutron stars 
and pulsars may become resolvable, as well as the morphology of active galactic 
nuclei and the black holes which they may contain. 


1. Introduction 


Following Galileo’s celestial observations with his small telescope, increasingly larger 
versions were built during the following centuries, and improved the discovery poten- 
tial. However, when their aperture size approached that of the turbulent atmospheric 
cells, typically 10-20cm, the angular resolution stopped improving, although the 
image luminosity continued to improve. Only in recent decades did the correction
of atmospheric turbulence become possible with adaptive optics, raising hopes of a
breakthrough in this respect. 


2. Evolution of Stellar Interferometers 


The history of stellar interferometry dates from 1868 when H. Fizeau proposed 
to mask the entrance of a telescope with a pair of smaller apertures and observe 
the Young’s interference thus produced in the focal image. This embryonic form of 
dilute optics made spectacular progress when the spectroscopic binary star Capella 
became resolved by J. A. Anderson in 1920, at about 50 milliarcseconds spacing, 
using a Fizeau mask on the 2.5-m telescope of Mount Wilson, then the world’s 
largest. 

This initial Capella result had encouraged A. A. Michelson to increase the 
interferometric baseline, as previously suggested by Fizeau, by adding, on top of 
the same telescope, a 20-foot (6m) steel beam carrying four mirrors, in “periscopic” 
fashion. He succeeded in resolving and measuring the angular diameters of several 
supergiant stars, down to about 20 milliarcseconds. This stimulated the construction 
of the larger and self-standing 50-foot interferometer at Mt Wilson, which however 
became abandoned after the deaths of Michelson and F. G. Pease. 

Only in the 1960s were new interferometer projects initiated utilizing similar 
concepts. The use of separate small telescopes was demonstrated in 1974, with a 
pair of 25-cm telescopes, spaced 12 m apart and feeding a common coudé focus.! 
Together with its larger successor using a pair of 1.52-m telescopes mobile on a 67-m 
railway track, it paved the way toward the current systems using up to six telescopes
at the CHARA Array, with baselines now reaching 330 m, providing 0.2 milliarcsecond
resolution in visible light.

As clusters of large telescopes were planned, initially for spectroscopic observ- 
ing with a large collecting area, their interferometric coupling was proposed and 
eventually achieved in the form of the Keck interferometer, with its pair of 10-m 
mosaic telescopes, and soon thereafter the Very Large Telescope Interferometer, 
employing four 8-m telescopes on a Chilean summit. With its 100-m baselines, it 
greatly extended the resolution and luminosity performance of interferometry, even 
achieving the first detection of light from an exoplanet, in spite of the difficult 
contrast condition near its bright parent star.? 

Coarse images of a few resolved stars could be reconstructed through repeated 
observations with different baseline orientations, using an incoherent variant of the 
aperture synthesis method of image reconstruction previously developed in radio 
astronomy.* In the coming decades, direct images of even complicated sources 
should become obtainable with the numerous sub-apertures being considered for 
“Hypertelescopes.” These are an extended form of stellar interferometer, using many
small apertures and expected to provide direct snapshot images at very high resolution,
on even very faint sources.

In the 1960s, the indirect interferometric method demonstrated by Ref. 3, which
they called Intensity Interferometry, had already improved the resolution with its
baselines much longer than Michelson’s 50-foot beam, reaching several hundred
meters. But the method proved limited to a few bright sources, despite the 6 m
light collectors used.

Other approaches known as Heterodyne Interferometry* and Quantum Stellar 
Interferometry’ also have been considered. They do not require a direct combina- 
tion of light received from the source, through separate apertures, onto a single 
detector. On Earth, this can favor longer baselines than those of existing direct 
interferometers, perhaps exceeding 10 km. But these may also become much larger 
when operating in space, as proposed for hypertelescopes such as a 150km Exo 
Earth Imager® and a 100,000-km Neutron Star Imager.’ 

As discussed for Intensity Interferometry by Ref. 8, the theoretical comparison 
with direct interferometry indicates that the latter retains a sensitivity advantage, 
even if the former is simultaneously exploiting thousands or millions of narrow 
spectral channels. However, the method is more tolerant of mirror bumpiness in 
the collecting telescopes, so that large collectors can be used. Recent testing? with 
improved fast detectors has encouraged development of a large instrument using 
existing Cerenkov detector mirrors. 


3. From Interferometers to Hypertelescopes 


An obvious extension (not described until the 1970s) of Fizeau’s historic two-aperture
interferometer scheme consists of using three or more apertures. If suitably
co-phased, these provide images containing more information on the source, with the
information content increasing as the number of apertures is incremented. This is 
easily verified by observing with the naked eye through multiple pinholes, and can 
be confirmed by computer simulations. 

An Optical Very Large Array (OVLA) combining 27 mobile 1.5-m telescopes
was thus proposed by Ref. 10 as an extension of the two-telescope interferometers 
built at Calern Observatory. Like Michelson’s interferometric beam, these included, 
some distance upstream from the focal camera, a pair of small oblique mirrors 
which densified the pair of sub-pupils, making them appear nearly adjacent to the 
camera. Fewer interference fringes were then contained in the image’s diffractive 
envelope, and each thus received more photons. The resulting increase in observing 
sensitivity was welcome, but it was not realized then that such “pupil densification”
could be generalized for a many-aperture Fizeau interferometer, nor that it
could provide direct high-resolution images which are intensified, thus improving
the limiting magnitude.

’See also Chapter 3 of this Volume.


3.1. Principle of Hypertelescopic Imaging 


It took nearly another decade before the theory of such “hypertelescopic imaging”
was explored.¹¹ Numerical simulations, laboratory experiments, sky observations
with miniature versions of hypertelescopes¹²,¹³ and additional theoretical analysis¹⁴⁻¹⁷
confirmed the theory and the observational potential of large hypertelescopes,
particularly in space. These studies indicated that many small apertures, rather than
fewer ones of larger size, would improve the imaging performance, at given total
collecting area and meta-aperture diameter. But the high cost of the optical delay
lines suggested an architecture concept inspired by the Arecibo radio telescope.
It has fixed mirror elements arrayed across a natural crater-like depression. Oper- 
ating like a giant dilute mirror, they co-focus light from the observed source onto a 
suspended camera which moves to track its focal image, as shown in Fig. 11 for a 
proposed space version employing a paraboloidal flotilla of mirrors. No delay lines 
are needed: just adding a small pupil densifier element to the Fizeau-type beam 
combiner (Figs. 1-3) provides direct images, concentrating in a narrow interference 
peak most of the light captured from each point source present within a “Direct 
Imaging Field.” 


[Fig. 1 graphic. Labels: dilute optics (periodic example); Fizeau focus; beam expanders
densifying the pupil; camera; intensified image from an on-axis point source.]


Fig. 1. Basic scheme of a hypertelescope. For clarity, the large dilute mirror is here replaced 
by a periodic segmented lens (left). The combined image at the Fizeau focus is relayed toward 
the camera (right) through a small “pupil densifier” attachment containing an array of miniature 
Galilean telescopes. These are inverted so as to act as beam expanders, thus magnifying each
sub-pupil. The combined image which they relay onto the camera is intensified since the magnified 
sub-pupils diffract smaller lobes.

[Fig. 2 graphic. Labels: densifier; stepped wave; camera. Annotations: the image is shifted
more than the envelope, and eventually moves out of it, which limits the “direct imaging field.”]


Fig. 2. Detailed view of the pupil densifier attachment, showing rays from an off-axis point source
and the wavefront segments (thick red lines). At the densifier’s entrance, the segmented wavefront
is flat but globally tilted. At the exit of the Galilean beam expanders it becomes stepped, with its
average slope unaffected, because their angular demagnification does not affect the optical
path lengths along their axes. This causes a differential offset in the camera image, with a stronger
displacement of the interference peak relative to the diffractive envelope. The sky scales for the
diffraction and interference functions recorded by the camera are different.


In space, similar designs appear feasible with a flotilla of many small mirrors, 
and their meta-aperture diameter may conceivably reach many thousands of kilometers
for a very high resolution.¹¹ But small sub-apertures tend to diffract most
transmitted light in a broad diffraction halo, much wider than the peaked inter- 
ference pattern, which is the high-resolution co-phased image of a point source. 
Outside of the interference peak, the contributing vibrations received from all sub- 
apertures are not co-phased (unless the array pattern is a periodic grid such as 
shown in Figs. 1 and 2), and the energy appearing in the peripheral speckles is lost 
from the peak. 

The loss is avoided in the hypertelescope scheme by densifying the optical aper- 
ture. It can be done, as shown in Figs. 1-3, by relaying the focal image, i.e. the 
“multi-aperture Fizeau image”, through a “pupil densifier” array of small “beam 
expanders”, such as inverted miniature Galilean telescopes, installed near the cam- 
era or eyepiece. This can strongly intensify the peak, as much as a billion times with 
the highly dilute hypertelescopes proposed for space. Most light captured from a 
star by the mirrors can indeed be concentrated in its high-resolution image. 

The cost to be paid for the intensity gain is a size reduction for the direct-imaging
field of view (Fig. 3). Large complex sources, such as galaxies, can no longer
be globally imaged at full resolution. But angularly small sources such as resolvable
individual stars, possibly having rings or a planetary system, remain efficiently
imaged if their apparent size is unresolved by the smallest baselines contained in
the meta-aperture. The condition defines the “Direct Imaging Field” (DIF) angle¹
as $\mathrm{DIF} = \lambda/s = \lambda N^{1/2}/D = \lambda (N/A)^{1/2}$, where $\lambda$ is the
wavelength, $s$ is the minimal spacing of the sub-apertures, $N$ is the number of
sub-apertures and $A$ the total collecting area. As an example, the DIF can contain the
yellow image of a rather large nearby star, having a 20 milliarcsecond angular size, if the
sub-apertures are spaced less than 5 m apart. For clustered sources, the limitation can
be somewhat avoided by arranging multiple imaging channels, each covering a DIF
patch (Fig. 3), separated by at least $\lambda/d$, where $d$ is the sub-aperture diameter.

Fig. 3. Simulated direct images of a point source with a hypertelescope, illustrating the
field-dependence of the spread function and the field-of-view limitation. The aperture is a ring of 20
small apertures (top left), partially densified as shown below (bottom left). On top are images
of a source located on-axis (top left), and then moved off-axis at the edge of the Direct Imaging
Field (DIF) spanning $\lambda/s$ (top right). The broad diffractive envelope of the on-axis image is faintly
seen to extend beyond the DIF’s edge. It is dominated by the source’s intense interference peak,
surrounded by its fainter rings and side-peaks. As the source moves off-axis (right), the interference
peak also moves and becomes attenuated. The diffractive envelope also moves, but much less, owing
to the angular demagnification of the Galilean beam expanders. The energy missing in the peak
reappears in the central part of the envelope, in the form of a speckle pattern. If the source is
moved further off-axis, at a celestial position outside the DIF, but still within the sub-aperture’s
wider celestial diffraction lobe $\lambda/d$, the peak becomes vanishingly faint.

Upstream from the pupil densifier, the primary focal image, or “Fizeau image”
of a remote point source, here assumed to be located on-axis, is a coherent super- 
position of broad Airy diffraction patterns contributed by each sub-aperture. If 
co-phased, their interference generates a narrow central peak, also resembling an 
Airy pattern with surrounding fainter rings, however becoming broken outward into 
speckles if there are many randomly located sub-apertures (Fig. 3). In Figs. 1 and 2, 
these speckles appear as a grid of periodic side peaks, owing to the periodicity of the 
aperture utilized. For a nonperiodic aperture such as shown in Fig. 3, a nonperiodic 
speckle pattern surrounds the central interference peak and the darker zone of its 
first few rings. The image transformation achieved by the pupil densifier shrinks 
the diffractive envelope, owing to the sub-pupil’s magnification (Fig. 2). It attenu- 
ates the speckles or peaks located beyond its central lobe, while intensifying those 
within it, for a conserved total energy. Most intensified is the central interference 
peak, which can concentrate most captured light if the exit pupil is fully densified. 

Such interferometers, called “hypertelescopes”, have a vast potential for extreme
imaging performance, with resolution much beyond that of the current “Extremely
Large Telescope” projects if meta-apertures spanning thousands of kilometers can
be built in space. The theoretical resolving power is indeed proportional to the
meta-aperture diameter, which may reach several kilometers on Earth, and perhaps many
thousands of kilometers in space if materialized as a controlled flotilla of mirrors.
For a lower cost, the mirror spacing s can be made very large, without affecting 
the resolution, if the source is angularly very small, a frequent situation for some 
of the most intriguing celestial objects, such as neutron stars, pulsars, black holes 
and active galactic nuclei. 

In a Fizeau interferometer, the co-phased focal image of a point source, i.e. its
point spread function, has a broad envelope, diffracted by the small sub-apertures.
It contains a finer interference peak surrounded by a few Airy rings, themselves
surrounded by a more intense speckle pattern. The presence of this speckle pattern
degrades the peak’s energy concentration, thus affecting the limiting magnitude. If the
source is extended, the overlapping speckles degrade the contrast of the peaks unless the
source is a cluster of points, spaced by more than the diffraction lobe’s angular diameter
$\lambda/d$, where $\lambda$ is the wavelength and $d$ the sub-aperture diameter. Following Ref. 14,
the lobe’s celestial patch is often called the “Coupled Field” (CF). With its pupil
densification, hypertelescopic imaging solves both problems for sources smaller than the
DIF, which is itself smaller than the CF.
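
The two angular scales compared here, the Direct Imaging Field $\lambda/s$ and the Coupled Field
$\lambda/d$, are easy to evaluate numerically. The following is a minimal sketch, assuming a
wavelength of 550 nm for “yellow” light and a 0.25-m sub-aperture diameter, neither of which is
specified in the text; the function names are illustrative only.

```python
import math

# Conversion factor from radians to milliarcseconds.
MAS_PER_RAD = math.degrees(1.0) * 3600.0 * 1000.0


def direct_imaging_field_mas(wavelength_m: float, spacing_m: float) -> float:
    """DIF = lambda / s, returned in milliarcseconds."""
    return wavelength_m / spacing_m * MAS_PER_RAD


def coupled_field_mas(wavelength_m: float, diameter_m: float) -> float:
    """CF = lambda / d (sub-aperture diffraction lobe), in milliarcseconds."""
    return wavelength_m / diameter_m * MAS_PER_RAD


def spacing_from_count(meta_diameter_m: float, n_apertures: int) -> float:
    """Typical spacing s = D / sqrt(N) for N sub-apertures filling a 2-D array."""
    return meta_diameter_m / math.sqrt(n_apertures)


if __name__ == "__main__":
    wavelength = 550e-9   # assumed "yellow" wavelength, in metres
    spacing = 5.0         # sub-aperture spacing quoted in the text, in metres
    diameter = 0.25       # assumed sub-aperture diameter, in metres

    dif = direct_imaging_field_mas(wavelength, spacing)
    cf = coupled_field_mas(wavelength, diameter)
    print(f"DIF = {dif:.1f} mas, CF = {cf:.1f} mas")
    # Same DIF obtained from lambda * sqrt(N) / D with D = 100 m and N = 400.
    print(direct_imaging_field_mas(wavelength, spacing_from_count(100.0, 400)))
```

With these assumed values the DIF comes out near 23 milliarcseconds, consistent with the
20-milliarcsecond star quoted earlier, and the Coupled Field is wider by the ratio $s/d = 20$.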

The basic optical scheme of a hypertelescope is sketched in Figs. 1-3. It can 
be architectured in the form of a multi-aperture Fizeau interferometer, having a 
dilutely segmented mirror, and also equipped with a small “pupil densifier” module, 
inserted between the Fizeau focal plane and the science camera. The pupil densifier 
is typically an array of miniature Galilean beam-expanders. Michelson’s periscopic 
arrangement of four mirrors on his 20-foot interferometer was also an embryonic 
form of pupil densifier, which had successfully enhanced the fringe’s luminosity.

The theory of hypertelescopic imaging¹¹,¹⁴,¹⁶,¹⁸⁻²⁰ indicates that the pupil densifier
intensifies the image as $\gamma_d^2$, where $\gamma_d$ is the magnification of the sub-pupils
relative to their spacing, produced by each beam-expander element (Fig. 3). At full
densification, providing adjacent exit sub-pupils, with $s = d$, most light collected from a
point source becomes concentrated into the interference peak, thus improving the overall light
efficiency.
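
As a quick numerical check of this scaling, the sketch below evaluates the $\gamma_d^2$
intensification. It assumes, as an interpretation of “adjacent exit sub-pupils”, that full
densification corresponds to a densification factor equal to the entrance spacing-to-diameter
ratio; the function names and the example array are hypothetical.

```python
def intensification(gamma_d: float) -> float:
    """Image intensification relative to a Fizeau exposure: gamma_d squared."""
    return gamma_d ** 2


def full_densification_factor(spacing_m: float, diameter_m: float) -> float:
    """Assumed densification factor making the exit sub-pupils adjacent: s / d."""
    return spacing_m / diameter_m


if __name__ == "__main__":
    # Densification factor of 40 gives a 1600-fold intensification.
    print(intensification(40.0))
    # Hypothetical highly dilute array: 0.1-m mirrors spaced 3 km apart.
    gamma_full = full_densification_factor(3000.0, 0.1)
    print(gamma_full, intensification(gamma_full))
```

Under that assumption, a spacing-to-diameter ratio of about 30,000 is what it would take to
approach the billion-fold intensification mentioned for highly dilute space arrays.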


The DIF extent is infinite in Fizeau interferometers. But a peculiar property
of hypertelescopes is that the DIF is in fact multiple: separated imaging channels
can be arranged for simultaneously observing multiple sources within, for example, a
globular cluster or a remote galaxy. These channels must be spaced at least $\lambda/d$ apart on
the sky, thus more than the Coupled Field size, for exploiting nonoverlapping diffraction
lobes, each containing a DIF patch which is much smaller. Once the effects of atmospheric
turbulence are removed, for example by using adaptive optics to co-phase the wavefront
elements (pending space-based versions), such systems can provide direct high-resolution
images of compact sources. The array can be expanded for improved resolution
by spreading apart the mirror elements, and for luminosity either by enlarging
them or increasing their number. The second method is preferable, since it also
improves the dynamic range of the images due to the better sampling of the optical
wavefront.

The array pattern for the meta-aperture can be random or anything else, 
whether redundant or not, but some patterns are preferable for imaging certain 
types of sources. Increased sub-aperture spacings toward the edge have an apodiz- 
ing effect which attenuates the sidelobes in the image, and therefore improves its 
contrast for resolved sources. Periodic grids improve the visibility of weak exoplanets 
near their bright parent star. 

Figure 2 illustrates the stepped distortion of the densified wavefront occurring
when a source moves off-axis. The step’s amplitude grows in proportion to the field
angle and begins degrading the direct image when it exceeds Rayleigh’s classical
quarter-wave tolerance. This causes the field-of-view limitation for hypertelescopes.

In terms of diffraction theory, the stepped wavefront may indeed be considered
as a convolution of a tilted disc-shaped wavefront segment with an array of sub-pupil
centers.¹¹ The focal pattern, which is the Fourier transform of this stepped distribution
of complex amplitudes, is then the product of the two corresponding focal patterns, i.e.
the slightly decentered diffractive envelope, and the interference pattern, with its
narrow central peak, which is more decentered owing to the angular demagnification
of each pupil densifier element. Both patterns, also called the diffraction and the
interference functions, are field-invariant, but their product is not since they
move at different rates in response to the source’s motion: owing to the angular
demagnification of the pupil densifier elements, the peak moves faster than the
envelope when a star crosses the field.
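
This product structure can be illustrated with a one-dimensional toy model. The sketch below
is not the simulation code behind Fig. 3: it assumes a periodic 1-D array of identical
sub-apertures, keeps the sub-pupil centers fixed while widening each sub-pupil by a factor
`gamma`, and places the diffraction envelope at `theta / gamma` and the interference function
at `theta`. The exact ratio of the two displacements depends on the convention adopted for the
densification factor, so only the qualitative behavior should be read from it. All names and
parameter values are illustrative assumptions.

```python
import numpy as np


def densified_psf(alpha, theta, centers, d, gamma, wavelength):
    """Intensity at image angles `alpha` (rad) for a point source at `theta` (rad).

    The result is the product of a diffraction function (envelope of one
    densified sub-pupil of width gamma * d) and an interference function
    (array factor of the sub-pupil centers).
    """
    alpha = np.atleast_1d(np.asarray(alpha, dtype=float))
    width = gamma * d
    # np.sinc(x) is sin(pi x) / (pi x); the envelope is centered at theta / gamma.
    envelope = (width * np.sinc(width * (alpha - theta / gamma) / wavelength)) ** 2
    # The array factor is centered at theta, whatever the pattern of centers.
    phases = 2.0 * np.pi * np.outer(theta - alpha, centers) / wavelength
    interference = np.abs(np.exp(1j * phases).sum(axis=1)) ** 2
    return envelope * interference


def peak_attenuation(theta, centers, d, gamma, wavelength):
    """Peak intensity for a source at `theta`, relative to the on-axis peak."""
    off_axis = densified_psf(theta, theta, centers, d, gamma, wavelength)[0]
    on_axis = densified_psf(0.0, 0.0, centers, d, gamma, wavelength)[0]
    return off_axis / on_axis


if __name__ == "__main__":
    wavelength = 550e-9                 # metres (assumed)
    d, gamma = 0.1, 50.0                # sub-aperture size (m) and densification
    spacing = gamma * d                 # full densification: exit width equals s
    centers = spacing * (np.arange(20) - 9.5)   # periodic array of 20 elements
    dif = wavelength / spacing          # Direct Imaging Field, in radians
    for fraction in (0.0, 0.25, 0.5, 1.0):
        ratio = peak_attenuation(fraction * dif, centers, d, gamma, wavelength)
        print(f"source at {fraction:.2f} DIF: relative peak = {ratio:.3f}")
```

In this model the relative peak intensity falls from 1 on-axis to nearly zero when the source
is displaced by about $\lambda/s$, which is the field-of-view limitation described above.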

As simulated in Fig. 3, the Airy-like interference peak is surrounded by rings,
becoming broken into speckles at increasing axial distances. The ring zone, also
called “dark zone”¹⁴ since its average intensity is darker than the speckle zone, has
a transition with it occurring at celestial angular distance $\lambda/s$ from the peak, if $s$ is
the sub-aperture spacing in the meta-aperture. This corresponds to the size of the
DIF.

When observing a cluster of point sources, which are typically incoherent, the
corresponding image patterns are additive in intensity, as in ordinary telescopes.
But, unlike them, the spread function is here field-dependent, given the relative shift
of its interference peak. The mathematical description of the resulting image is no
longer the classical convolution of the source’s intensity distribution with a fixed
spread function, but a pseudo-convolution of it with the field-dependent pseudo-spread
function.¹⁴


3.2. Imaging Properties of Hypertelescopes 


Numerical simulations of direct snapshot imaging with a many-aperture phased hypertelescope,
verified with laboratory models and miniature hypertelescopes on the sky,
illustrate the information gain with respect to existing interferometers
(Fig. 4). The limiting number of resolved elements or “resels” is greatly increased,
as are the dynamic range and the limiting magnitude, enhanced by the intensification
gain resulting from pupil densification. There is also a large gain with
respect to the incoherent form of optical aperture synthesis achieved in recent
decades with existing interferometers, which may be called “incoherent aperture
synthesis” since intensity data from separate snapshot exposures are utilized by the
algorithm.!®

Fig. 4. Numerical simulation of direct hypertelescopic imaging with a 448-segment aperture,
patterned as an apodizing spiral and co-phased (top left). A random cluster of 448 blue, green or red
point sources (top middle) is imaged (top right). The width of the densified sub-pupil’s diffractive
envelope (half profile at lower right) limits the size of the Direct Imaging Field (DIF), here slightly
smaller than the source cluster. It attenuates the image’s periphery, as seen from the missing stars
in its corners. The $\gamma_d = 40$ pupil densification factor provides a $\gamma_d^2 = 1600$
intensification relative to a Fizeau exposure.

136 A. Labeyrie


3.2.1. Field and Crowding Limitations 


As seen in Fig. 3, the hypertelescope’s direct co-phased image of a point source 
located on-axis has a dominant central peak, the interference peak, surrounded 
by fainter Airy-like rings and, further outward, a somewhat more intense speckled 
halo, replaced by a periodic array of peaks if the aperture pattern is itself periodic. 
While the central interference peak contains additive amplitude contributions from 
all sub-apertures that are co-phased, these are instead unequally phased in the 
diffractive halo, where they produce speckles if the aperture pattern is nonperiodic. 
If $N$ sub-apertures contribute to the combined image, the resulting distribution of
complex amplitude modulus averages $N^{1/2}$ and its intensity $N$. In the interference
peak instead, the contributions are co-phased, giving $N$ for the sum amplitude, and
$N^2$ for the intensity, hence a value of $N$ for the peak/halo intensity ratio.
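
The $N$ versus $N^2$ scaling can be checked with a small Monte Carlo experiment. The sketch
below assumes unit-amplitude contributions with independent, uniformly distributed random
phases in the halo, which is an idealization of a nonperiodic aperture; it estimates the mean
halo intensity (equivalently, an rms amplitude of $N^{1/2}$). The function names are illustrative.

```python
import numpy as np


def mean_halo_intensity(n_apertures, n_trials=20000, seed=0):
    """Average intensity of a sum of unit phasors with random phases."""
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_trials, n_apertures))
    amplitudes = np.exp(1j * phases).sum(axis=1)
    return float(np.mean(np.abs(amplitudes) ** 2))


def peak_intensity(n_apertures):
    """Intensity of the co-phased sum of unit phasors: N squared."""
    return float(n_apertures) ** 2


if __name__ == "__main__":
    n = 100
    halo = mean_halo_intensity(n)
    peak = peak_intensity(n)
    print(f"halo ~ {halo:.1f}, peak = {peak:.0f}, peak/halo ~ {peak / halo:.1f}")
```

For 100 sub-apertures the estimated halo level stays close to 100 while the co-phased peak
reaches 10,000, so the peak/halo ratio is close to $N$.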

If the source moves across the DIF, the peak also moves, together with its 
surrounding dark zone, but the halo moves $\gamma_d + 1$ times slower (Fig. 3) owing to
the angular demagnification of each pupil densifier element. The halo’s attenuating 
effect, as a multiplicative envelope for the interference function, eventually erases 
the peak when it reaches its edge, thus corresponding to the DIF’s edge, in which 
case the speckled halo becomes somewhat intensified since the total captured energy 
is conserved. 

If S point sources, assumed incoherent as usual for thermal sources, are in the 
sky at random positions within a “Coupled Field” patch (typically much wider 
than the DIF), the image intensity distribution I(x, y) is a sum of the contributed 
intensity patterns from all incoherent point sources. Only the sources located within 
the DIF have a dominant interference peak in their contributed pattern, as seen in 
Figs. 3 and 4. Their halos and those of the external sources overlap, and the resulting 
intensity in the DIF, locally averaged, increases as S. The individual peaks remain 
detectable amidst the summed halo if their intensity exceeds this added halo level, 
carrying some residual speckle modulation, and also photon noise, which is assumed 
negligible in the “bright” case considered here. 

The crowding limitation, expressed as the maximal number $S_{\max}$ of point
sources allowed within a Coupled Field to avoid crowding a DIF inside it, may thus
be expressed as $S_{\max} < N$, and is the same as for an extended Fizeau interferometer
having the same aperture pattern. In either case, those sources located outside the
Coupled Field do not contribute to the image nor to its degradation.
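
The crowding condition follows from comparing the co-phased peak with the summed halos. As a
sketch in the units of the preceding paragraphs, where each sub-aperture is assumed to
contribute a unit amplitude and photon noise is neglected:

```latex
\begin{aligned}
I_{\mathrm{peak}} &\propto N^{2}
  && \text{(co-phased sum of $N$ amplitudes)} \\
\langle I_{\mathrm{halo}} \rangle &\propto S\,N
  && \text{(incoherent sum of $S$ halos, each of mean intensity $N$)} \\
I_{\mathrm{peak}} > \langle I_{\mathrm{halo}} \rangle
  &\;\Longrightarrow\; S < N
\end{aligned}
```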

If $S$ sources meeting the crowding condition are concentrated within the DIF,
with none in the peripheral part of the diffractive lobe, their average spacing is
$\lambda/D$, implying that their interference peaks can be adjacent in the image on average.
An extended source filling the DIF, such as the resolved apparent disk of a star,
with possibly some transiting exoplanets across it, therefore has a contrasted image 
resembling that provided by a conventional co-phased telescope with aperture diam- 
eter matching the hypertelescope’s meta-aperture diameter D. If the hypertelescope 
aperture is apodized with increased sub-aperture spacings toward the edges, such 
as the spiral pattern of Fig. 4, the dynamic range can somewhat exceed that of the 
conventional diffraction-limited telescope. 

Any additional point sources located within the Coupled Field, but not within 
the DIF, contaminate the image with a peak-less speckled halo such as seen in Fig. 3. 
High-resolution images of them can however be reconstructed with the deconvolu- 
tion algorithms of Refs. 17 and 21 and perhaps with modified versions of the CLEAN 
deconvolution algorithm exploiting the a priori knowledge of the field-dependent 
speckle pattern when piston errors are themselves measured or corrected. 


3.2.2. The Multiple Field-of-View Case 


As mentioned above, a cost to be paid for the image intensification provided by pupil
densification is a reduction of the DIF field-of-view, usable for direct imaging. Its
celestial diameter $\lambda/s$ is adjustable, as needed for matching the size of the observed
source, if the sub-aperture spacing $s$ is itself adjustable. This may be feasible in space
for a flotilla of mirrors, particularly if laser-trapped (Sec. 6.2).

Unlike conventional optics, the densified dilute aperture makes direct imaging 
possible in a dilute field. A peculiar property of hypertelescopic imaging, indeed, is 
that the Direct Imaging Field is in fact multiple, with a dilute multi-patch array of 
celestial DIFs. Its celestial pitch is larger than the $\lambda/d$ Coupled Field, and is thus
much wider than the diameter $\lambda/s$ of each DIF patch (Fig. 5).

On clustered sources such as a globular cluster or a galaxy, a multi-patch DIF 
can be exploited with focal optics such as sketched in Fig. 5. Many stars within a 
cluster can indeed be each centered on one of the tiny field lenses within a microlens 
array. Tiltable glass plates, each attached to one of the microlenses, can provide the 
fine centering of the star’s Fizeau image needed within the closest DIF patch, much 
smaller than the microlens size. And each must feed a separate pupil densifier, 
thus producing an intensified direct image of the patch’s even smaller DIF. But the 
source itself has to be clustered, in order to avoid crowding: it can be a star cluster, 
a galaxy, etc., if its members’ size/spacing ratio $d_s/s_s$ is smaller than the dilution
ratio $d/s$ of the hypertelescope aperture.

The multi-field formatting scheme sketched in Figs. 5 and 6 allows, within each 
field channel, a separate correction of field aberrations such as coma and astigmatism 
originating from the primary array. For small field angles, this is achievable in 
various ways by aspherizing the lenses L2, or lenses L5 (not shown) arrayed before 
the camera to magnify each of the sub-field images.

Fig. 5. Example of multi-field focal optics integrated in the pupil densifier. In the Fizeau focal
plane F1 at left, a microlens array L1, with pitch larger than the subaperture’s diffraction lobe, 
separates independent field channels. The beams from each source, here a green and a red star, cross 
a different field microlens of the L1 array, and then also different corresponding microlenses L2i in 
each of the L2i diverging arrays conjugated with the primary M1i sub-apertures. The L2i arrays 
are at the entrance of each Galilean beam expander element L2–L3 (inset at bottom right), all of
which are attached to a dome-shaped structure. A combined image of each sub-field is co-focused 
by lens L4, on the science camera. Most of the camera pixels being located in the gaps between 
DIFs, they are unexploited unless a matching lens array magnifies the images while reducing the 
gaps. Alternately, a grism in the pupil plane can serve for recording spectra of each resel contained 
in the Directly Imaged Fields. These spectra are of interest for science, and also for calculating 
the co-phasing errors.?? 


4. Science with Hypertelescopes on Earth and in Space 


4.1. Science with Terrestrial Hypertelescopes 


Pending space versions, Earth-based hypertelescopes are expected to remain limited
in their meta-aperture diameter by the availability of suitable concave sites and by
the presence of atmospheric turbulence. A 10-km hypertelescope reaches in principle
10 micro-arcsecond resolution in visible light. Pending adaptive co-phasing, it
can initially provide reconstructed images by “speckle interferometry,”?? like conventional
large telescopes.¹ Once equipped with adaptive optics, it should provide
direct images. But the limiting magnitude is constrained by the availability of bright
guide stars within the atmospheric “isoplanatic patch.” It can probably be extended
with a Laser Guide Star system, modified for hypertelescopic use.²⁴
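
The quoted resolution follows from the diffraction limit $\lambda/D$ of the meta-aperture. A
minimal sketch, assuming a visible wavelength of 500 nm (the text does not specify one) and an
illustrative function name:

```python
import math

# Conversion factor from radians to micro-arcseconds.
UAS_PER_RAD = math.degrees(1.0) * 3600.0 * 1.0e6


def resolution_uas(wavelength_m: float, meta_aperture_m: float) -> float:
    """Diffraction-limited resolution lambda / D, in micro-arcseconds."""
    return wavelength_m / meta_aperture_m * UAS_PER_RAD


if __name__ == "__main__":
    # 10-km meta-aperture observed at the assumed 500-nm wavelength.
    print(f"{resolution_uas(500e-9, 10.0e3):.1f} micro-arcseconds")
```

At this assumed wavelength the result is close to 10 micro-arcseconds, in line with the figure
given for a 10-km instrument.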

The high limiting magnitude expected with adaptive optics and a laser guide
star should also give access to many extragalactic sources, including globular clusters
and faint cosmological galaxies.

Fig. 6. Multi-field hypertelescopic imaging of a green star and a red binary star in separate
Coupled Fields. At right, the DIF images are magnified, and the wide sky gaps between them are 
cropped for pixel economy on the camera. 


4.2. Science with Space-Based Hypertelescopes 


Space obviously provides ample room for a large flotilla of small mirrors, which
can be driven by electric propulsion, small solar sails, or laser-trapping techniques.
Meta-apertures deployed to diameters as large as many thousands of
kilometers are likely to greatly improve the angular resolution, the
limiting magnitude, and the coronagraphic performance, favored by the absence of
a turbulent atmosphere. Variants of the coronagraphic optics used on monolithic 
telescopes can indeed be adapted to hypertelescopes for difficult observations such 
as the multi-pixel imaging of exoplanets.?° 


4.2.1. Exoplanet Multi-Pixel Imaging and the Search for Bio- or Techno-Signatures


At 1 pc distance, the angular spacing of a star and an Earth-like planet orbiting
it reaches 1 arcsecond. Both sources can thus be located in adjacent Coupled
Fields if the sub-apertures are larger than about 12 cm. Then, adjacent channels of
the multi-field imaging system (Figs. 5 and 6) can simultaneously provide images
of both objects. But this requires, for a usable image of the planet, a highly efficient
coronagraphic attenuation of the contaminating starlight leaked toward it.

Fig. 7. Numerical simulation of direct imaging with a 150-km hypertelescope in space, showing an
exo-Earth at 3 parsecs. Its contrast is enhanced by subtracting a uniform level, but the parent star’s
contamination is ignored in this simulation, and will in practice require efficient coronagraphy.
Seasonal color changes may be searched for in real images as possible evidence of photosynthetic
life. This, together with the seasonal planktonic blooms in oceans, can be a robust and sensitive
bio-signature.
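
The 12-cm figure can be recovered from the requirement that the Coupled Field $\lambda/d$ be no
wider than the star–planet separation. The sketch below assumes a wavelength of 580 nm and a
planet orbiting at 1 au; both values, like the function names, are illustrative assumptions.

```python
import math

# Conversion factor from radians to arcseconds.
ARCSEC_PER_RAD = math.degrees(1.0) * 3600.0


def separation_arcsec(orbit_au: float, distance_pc: float) -> float:
    """Angular star-planet separation; 1 au seen from 1 pc subtends 1 arcsec."""
    return orbit_au / distance_pc


def min_subaperture_diameter_m(wavelength_m: float, sep_arcsec: float) -> float:
    """Smallest d for which the Coupled Field lambda / d is below the separation."""
    return wavelength_m / (sep_arcsec / ARCSEC_PER_RAD)


if __name__ == "__main__":
    wavelength = 580e-9  # assumed wavelength, in metres
    for distance_pc in (1.0, 3.0):
        sep = separation_arcsec(1.0, distance_pc)
        d_min = min_subaperture_diameter_m(wavelength, sep)
        print(f"{distance_pc:.0f} pc: separation {sep:.2f} arcsec, "
              f"d > {100.0 * d_min:.0f} cm")
```

The required sub-aperture diameter grows in proportion to the distance, so a system at 3 pc
would need sub-apertures about three times larger than the 1-pc case.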

The numerically simulated image shown in Fig. 7 does not include the star’s
contamination, but shows that sufficient resolved detail is obtained for searching for
biosignatures, such as seasonal variations of the colored morphology expected in
the presence of photosynthetic life.?° If the planet transits across the star’s disk,
then only its rim appears against the disk, possibly also with a crescent of starlight
refracted through the planet’s high atmosphere. Briefly appearing at the time of
immersion and emergence, it can provide some spectroscopic information on its
chemistry.

Such instruments are also of obvious interest for searching for technosignatures.?°
Both related observational targets for hypertelescopic imaging may be justified 
for expanding current exobiology observing programs at radio and optical wave- 
lengths?° if the induced cost is moderate and if they are encouraged or considered 
acceptable by the established cultural communities worldwide. 

Also, the fact that powerful hypertelescopes may become available for more
sensitive and efficient exolife searching projects suggests that similar instruments
may already exist in the hands (or tentacles?) of putative aliens. If based on laser-trapped
mirrors, they may be detectable from their laser light leakage. But this is
uncertain since such leakage can likely be avoided if the mirrors have retro-reflective
properties for the laser beams, as previously mentioned and demonstrated by the
laser recycling achieved inside gravitational wave detectors such as LIGO, VIRGO
and their proposed LISA space version.

Novel Concepts, from Interferometers to Hypertelescopes 141


4.2.1.1. Neutron stars 


The Crab Pulsar, believed to be a rotating neutron star about 20 km in size, would
require a 100,000-km meta-aperture to resolve its morphology.


4.2.1.2. Gravitational wave events 


The merger of two neutron stars, believed to have generated the gravitational wave 
event GW170817, detected by LIGO and VIRGO on August 17th 2017, apparently 
created a star-like source at the event’s position, with color varying from blue to 
red in a few days. If a hypertelescope had been available, it would of course have 
been of interest to produce high-resolution images. 


4.2.1.3. Detail of star morphologies within globular clusters 


At the periphery of our galaxy, globular clusters contain stars that are also resolvable 
by a hypertelescope. With the multi-field focal optics (Sec. 3.2.2), thousands of stars 
are potentially imageable simultaneously in a cluster. 


4.2.1.4. Cosmological sources 


Many of the remote galaxies listed in the Hubble Deep Field and Ultra Deep 
Field catalogs are imageable with much higher resolution by space-based hyper- 
telescopes. 


5. Architectural Concepts for Earth-Based Hypertelescopes 


The Earth’s rotation, its gravity, topography and restless turbulent atmosphere, 
are less than ideal for operating hypertelescopes, and they likely limit their meta- 
aperture diameter to a few kilometers. But terrestrial precursors can bring valuable 
operating experience toward designing space versions, and they can also produce 
useful science beyond the capabilities of ELTs and first-generation interferometers 
such as the CHARA and the VLTI. 


5.1. The Ubaye Hypertelescope Prototype 


The “Ubaye Hypertelescope” concept has been developed in prototype form and tested
since 2012 in a high valley of the Ubaye range in the southern Alps.!® The valley’s
curvature is suitable for nesting a mirror array with meta-aperture spanning about 
200m, but having a somewhat larger physical span if extended North and South 
for broader sky coverage. 

For such a size, and potentially larger ones reaching several kilometers at other
terrestrial sites, a hypertelescope’s meta-mirror cannot be steerable, but is instead
preferably anchored to stable bedrock at a concave site, according to the philosophy
of the Arecibo and FAST radio telescopes. The focal detector then has to be movable 
for tracking the drifting primary image, a requirement which may vanish for future 
hypertelescopes in space, since the micro-gravity conditions can allow the global 
pointing of a giant dilute structure, such as a laser-trapped flotilla of small mirrors 
moved by radiation pressure from laser beams (Sec. 6). 

On Earth, its diurnal rotation creates additional constraints: 


(a) Pupil drift needs to be accommodated, in which case it becomes in principle 
beneficial for the imaging performance by creating an aperture supersynthesis 
effect that improves the image contrast if long exposures are used. 

(b) If the segmented meta-mirror is paraboloidal, the steering of its axis, with 
nearly fixed mirror segments, requires micrometric actuators to continuously 
adjust their tip, tilt and piston. 


5.1.1. Optical Design 


With the fixed spherical geometry of the giant meta-mirror, or the nearly fixed 
geometry of the active paraboloidal version, no optical delay lines are needed for 
correcting the large and variable optical path differences caused by Earth rota- 
tion. This greatly simplifies the opto-mechanical architecture with respect to the 
large stellar interferometers of the previous generation, such as the VLTI and the 
CHARA. The number of sub-apertures for these instruments had been limited by 
the high cost of the delay lines and optical train, requiring for example, for the 
VLTI, 21 mirror reflections from star to detector. 


5.1.1.1. From a spherical to an active paraboloidal primary array 


The Arecibo radiotelescope has a fixed spherical 305-m mirror nested in a
natural sink-hole in the Puerto-Rican hills. It focuses the source’s wavefront onto 
a detector suspended above, and movable along, the focal sphere, together with 
a large aspheric secondary mirror correcting the spherical aberration. The initial 
gondola of the Ubaye Hypertelescope, built for a spherical meta-aperture, before 
its active parabolization was tested, was similarly equipped with a Mertz-type focal 
corrector. 

The FAST radiotelescope, recently built in China, is similar, although larger 
with its 500-m mirror, but has an active paraboloidal primary mirror, with no 
need for an aberration corrector when observing on-axis sources. It is not globally 
steered; only the fourth-order term of its profile needs to be varied in order to steer 
the paraboloid’s revolution axis about its curvature center C1, and this is a rather
small adjustment of the segments. Both approaches are of interest for tolerating the 
Earth’s rotation without also globally steering the large meta-mirror. 

For optical versions, the much higher areal cost of the more accurate mirror 
segments needed favors a dilute array of many small such segments, feeding a focal 
beam combiner and pupil densifier. If the segments are small enough, their shape
can remain invariant while they are slightly re-adjusted in tip, tilt and piston by a
triplet of micrometric actuators, coordinated for rotating the paraboloid’s revolution
axis about the curvature center C1. The active meta-paraboloid, with M1 mirror
elements thus sufficiently small to be controlled only in terms of their piston and
tip-tilt, requires no global steering and is therefore nearly fixed.

In gravity-free space, the global steering of a hypertelescope flotilla may be 
easier than on Earth, and this can favor a paraboloidal meta-mirror. 


5.1.1.2. Pupil drift and its accommodation 


A complication arising with Arecibo-like fixed meta-mirrors on Earth, feeding a
moving focal gondola, is the relative drift of the relayed aperture, i.e. the
meta-pupil image relayed at the entrance of the dome-shaped pupil-densifier element
(Fig. 5). It can be accommodated if this dome is shaped like a Gregorian telescope’s
secondary mirror M_G1, concave and confocal in F1 with the primary meta-mirror
M1. Then, a homothetic image of M1 can be relayed onto M_G1 by the field lens L
inserted at F1, where the center of the homothetic projection is also located. As a
star’s image focused by M1 is moved on its focal surface by Earth’s rotation, along
a great circle, it can be tracked by M_G1 if it similarly follows the same circle in
a curvilinear translation, thus keeping a constant attitude. Then the pupil image
remains fixed on M_G1, owing to the homothetic projection.

M_G1 is not in fact utilized as a mirror, but is modified for use as the pupil
densifier by drilling a small hole (Fig. 5) at the exact position of each sub-pupil,
for inserting the corresponding pupil densifier element. Their axes intersect at the
curvature center C_G1 of M_G1. Accommodating the pupil drift is then a matter of
accurately driving the densifier’s curvilinear translation.

Downstream from the pupil densifier, forming a combined image of the source 
and tracking it on the science camera also requires a lens group attached to it and 
which rotates about Cj , the curvature center of Mj, in order to remain aligned 
with it and the sky position of the source observed. This group contains the field 
lens Ly, a larger beam-combining lens Ly, optional additional components such 
as a magnifier lens, a dispersive grism, a microlens array, and the science camera 
(Fig. 5). 

The required motion of this group, relative to the densifier, is an equatorial
rotation about C_G1. It can be generated by a small motorized equatorial fork-type
drive, polar aligned with its diurnal axis parallel to Earth’s polar axis (Fig. 8). Its 
base follows the curvilinear motion, and carries the densifier rigidly attached to it, 
and thus keeping a constant attitude. Its fork carries the optical group and camera. 
Small commercial equatorial mounts, such as the Celestron NexStar SE4 or the 
Meade ETX 90, have angular encoding and a computer drive that can provide the 
required tracking of the celestial source, with the possible assistance of a second 
auto-guider camera for high accuracy. 
Fig. 8. Miniature equatorial mount within the focal gondola for tracking the star and the drifting
pupil. The pupil densifier dome is kept at a fixed tri-axial attitude, using three actuators (or at 
least one if a counterweight passively maintains its verticality) for pupil drift accommodation, and 
is rigidly attached to the mount’s baseplate. The remaining optical train is oriented by the mount 
for tracking the celestial direction of the observed source. The mount’s elements are individually 
balanced, accurately, for minimal motor assistance. 


5.2. Alignment Sensors and Techniques 


State-of-the-art instruments of topographic surveying, such as lidar scanners and 
automated theodolite/telemeters, can likely be adapted to the co-parabolization 
adjustments of hypertelescope mirrors, ideally requiring sub-micron accuracy. 

The simpler instruments used, as described below, for the coarse tip-tilt and
piston adjustments of the mirrors are also adequate for the initial positioning and
occasional verifications, particularly following seismic events. The fine adjustments
use the same micrometric actuators under computer control, with error signals
sensed by the science camera.


5.2.1. Tip-Tilt Adjustment of Primary Mirror Segments


The coarse tip-tilt adjustment of a primary mirror segment is achievable by
installing at the edge of it a long-focus camera incorporating a semi-reflective
corner-cube mirror (Fig. 9) to verify the symmetry of the star and focal gondola’s
directions relative to the segment’s face. Initially, a first mirror serving for
reference can be adjusted with respect to the focal gondola, itself pre-positioned at
a reference position, for example that of the observed star’s image at transit time.
The reference mirror’s tip-tilt actuators are quickly adjusted for ensuring the star’s
focusing at the gondola’s entrance. The gondola’s star-tracking motion is then
initiated, and its accuracy periodically verified in the alignment camera while all
other mirrors are also being aligned. This is achievable sequentially if another
alignment camera can be moved adjacent to each of them, or in parallel if each has
its own assigned camera. Depending on the stability of the bedrock and mirror’s
supporting tripods, the coarse adjustment thus obtained can be expected to remain
valid for days or weeks.

Fig. 9. Tip-tilt adjustment of a primary mirror element M1 relative to the focal beam-combining
optics, using a small alignment telescope adjacent to the mirror, with a hollow corner-cube
retroreflector, one face of which is a beam splitter. It tracks directly the observed star, and indirectly
through the corner cube and the mirror an LED (green) attached to the focal gondola. The spacing
of both focused spots seen in the eyepiece or camera indicates the tip-tilt error of the mirror.

Fine tip-tilt adjustments are achievable with Shack-Hartmann techniques,
using a star acquisition camera located in the focal optics in addition to the science
camera.


5.2.2. Coarse and Fine Piston Sensing 


The coarse piston sensing is achievable at ground level using a theodolite with laser
telemeter, located on a rigid tripod amidst those carrying the mirror segments, for
scanning all mirrors and providing x, y, z positions of their centers (Fig. 10).

For the fine piston corrections, the error signal can be provided by the science
camera itself if its display of the star’s combined image is formatted in a
multi-spectral mode, for example with spectra of each high-resolution speckle. The piston
errors can indeed be extracted by calculating the three-dimensional Fourier
transform of its (x, y, 1/λ) pattern.??

Fig. 10. Sextant-like alignment telescope used for coarse measurements of piston errors among
the primary mirror segments. In daytime, a fixed target overhead, such as the suspended focal
gondola, is seen by the 50 mm aiming telescope through the semi-transparent mirror, at the same
time as a marker on the M1 mirror element being measured. Both images are made to coincide
accurately by tilting one of the sextant mirrors. Its angle reading, ideally with arcsecond accuracy,
gives the mirror’s relative height, once its distance has also been measured with a laser telemeter
(not shown). At night, the star to be observed is also usable as the target, using accurate timing.
The tripod legs are thermally insulated for tolerable errors caused by thermal expansion. The
submillimeter accuracy reachable for the piston map, across tens of meters, is comparable to that
of commercial surveying instruments.

An alternate method, applicable if M,’s curvature center C is accessible, for 
example with a stabilized drone, involves techniques such as Hartmann sensing for 
tip-tilt, and the analysis of white-light Young’s fringes for piston sensing. 
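
The submillimeter figure quoted for the coarse piston map can be checked with a first-order error estimate. The following Python sketch is only an illustration under stated assumptions: the height is taken as h = D sin(e) for a measured distance D and elevation angle e, only the angular reading error is propagated (the telemeter's distance error is ignored), and the function name is hypothetical.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond expressed in radians


def height_error(distance_m, angle_error_arcsec, elevation_deg=0.0):
    """First-order height error for h = D * sin(e): dh = D * cos(e) * de."""
    elevation = math.radians(elevation_deg)
    return distance_m * math.cos(elevation) * angle_error_arcsec * ARCSEC


# Height error for a 1-arcsecond reading error at several distances (assumed values).
for distance in (10.0, 30.0, 100.0):
    error_mm = 1e3 * height_error(distance, 1.0)
    print(f"distance {distance:6.1f} m -> height error {error_mm:.3f} mm")
```

With these assumptions, an arcsecond reading error at 30 m corresponds to about 0.15 mm in height, consistent with a submillimeter piston map across tens of meters.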


5.3. Testing Initiated with the Ubaye Hypertelescope Prototype 


The system under test is described in more detail by Ref. 16. A major result is the 
millimetric accuracy demonstrated for driving the focal gondola while tracking the 
image of the observed star. 


5.3.1. Driving the Focal Gondola, Suspended from a Crossing Cable 


The focal gondola is suspended 101 m above the mirror segments, from a pair of
pulleys freely rolling on the carrier cable. This cable, made of aramid fiber and 6 mm
in diameter, crosses the Moutière valley, which is oriented East-West, in the
approximate North-South direction. Its length is 800 m between the attachment points,
one of which has a pulley so that the cable’s height and tension are adjustable by an
electric winch at an easily accessible location. Also, a counterweight is installed near
the winch for maintaining a constant cable tension, about 60 kg, when observing.

For tracking the star’s image, the gondola is driven by a triplet of 2-mm high- 
modulus oblique cables pulled by small computerized winches respectively located 
about 200 m South, North-East and North-West. The carrier cable pendulates from
West to East to accommodate the motion of the gondola, which also rolls along
the cable. The resulting tracking accuracy is about 1 mm, meeting the
specifications.

A second triplet of oblique cables, similarly driven by winches adjacent to the
former ones, defines the gondola’s attitude, which is critical for centering each
sub-pupil in the corresponding pupil-densifier element.

Also controlled, for a fixed attitude, is the base of the small equatorial mount 
inside the gondola (Fig. 8), as well as its hour angle and declination. The gondola 
therefore has 6 + 3 + 2 = 11 mechanical degrees of freedom which must be controlled
for its correct optical alignment relative to the primary meta-mirror, assumed fixed 
and perfectly co-spherized or co-parabolized. These are: 


(A) A curvilinear translation of the field aperture for tracking the star’s image along
the focal path: the entrance field aperture of the focal optics must track the
focal image F of the observed source along its curved trajectory, a small circle
of the focal sphere, which is concentric with the primary M1 meta-sphere and
has half its size. As demonstrated with the Ubaye Hypertelescope prototype,
this can be done by driving the suspended focal gondola with three oblique
cables, controlled by computerized winches at their lower end. Two can suffice
if the gondola’s rolling suspension from the pendulating cable which crosses
the valley at 101 m height is motorized. A field acquisition camera serves as the
autoguiding sensor to verify the presence and focus of the star’s image, as well
as its centering.

(B) Stabilized tri-axial orientation of the densifier dome, about its curvature center, 
together with the attached base of the small equatorial mount. Their attitude 
must be kept fixed with respect to the ground. If the carrier cable is replaced 
by a drone, its attitude control system can serve the same purpose if accurate 
enough, or a fine adjustment stage such as a hexapod can be added between 
the drone and the equatorial mount. 

(C) Bi-axial equatorial orientation of the main optical bench. The equatorial mount 
is adjusted in terms of its declination setting, kept fixed while observing a given 
source, and its variable hour angle. An autoguider camera, looking upwards, 
can sense the alignment of the optical bench toward the observed source. But 
the science camera can also provide related information on the pupil centering. 
Indeed, LED beacons attached to one or several M1 mirrors are imaged at the
entrances of the corresponding pupil densifier elements on the dome, and the 
multiple field on the science camera is uniformly illuminated with LED light if 
the pupil is correctly aligned. Alignment errors generate shadows, which can be 
analyzed in terms of error signals. Indeed, if an LED-illuminated pupil densifier 
element is transversally offset by one period of the entrance lens array, then 
one sub-field image is obscured at the edge. Interpolating and maximizing the 
photometric camera signal then centers the pupil. A pupil camera can be added 
for more direct feedback, with a few LED-illuminated fibers at the edge of pupil 
densifier elements to help centering the sub-pupils. 
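
The focal trajectory of item (A) can be sketched numerically. The Python fragment below is a minimal illustration, not the control software of the prototype. It assumes a paraxial focus lying on the half-radius focal sphere, opposite to the star as seen from the curvature center; a meta-sphere radius R = 202 m, inferred here from the 101 m gondola height; and a site latitude of 44.4°. Both numerical values and all function names are assumptions made for the example.

```python
import numpy as np

OMEGA = 7.2921e-5  # Earth's sidereal rotation rate, rad/s


def star_unit_vector(hour_angle, dec, lat):
    """Unit vector toward the star in local (east, north, up) axes; angles in radians."""
    east = -np.cos(dec) * np.sin(hour_angle)
    north = np.cos(lat) * np.sin(dec) - np.sin(lat) * np.cos(dec) * np.cos(hour_angle)
    up = np.sin(lat) * np.sin(dec) + np.cos(lat) * np.cos(dec) * np.cos(hour_angle)
    return np.array([east, north, up])


def focus_position(hour_angle, dec, lat, radius):
    """Paraxial focus relative to the curvature center, on the sphere of radius R/2."""
    return -0.5 * radius * star_unit_vector(hour_angle, dec, lat)


def drift_speed(dec, radius):
    """Linear speed of the star image along its small circle of the focal sphere."""
    return 0.5 * radius * OMEGA * np.cos(dec)


RADIUS = 202.0              # assumed meta-sphere radius, m
LATITUDE = np.radians(44.4)  # assumed site latitude

# Gondola target positions over +/- 30 minutes of hour angle for a star at dec = +20 deg.
for minutes in (-30, 0, 30):
    h = OMEGA * 60.0 * minutes
    x, y, z = focus_position(h, np.radians(20.0), LATITUDE, RADIUS)
    print(f"t = {minutes:+4d} min: east {x:8.3f} m, north {y:8.3f} m, up {z:8.3f} m")

print(f"drift speed at dec = 0: {1e3 * drift_speed(0.0, RADIUS):.2f} mm/s")
```

Under these assumptions the image of an equatorial source drifts at roughly 7 mm/s, so that a 1 mm tracking tolerance corresponds to about 0.14 s of uncorrected drift.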


Conceivably, eight oblique wires tensioned by computerized winches and the 
near vertical lift force applied by the suspension cable can control these degrees 
of freedom. Eight computerized winches at the ground ends are then needed. But 
some of the winches and motors can also be installed on the gondola, if appropriately 
powered and computerized. This can simplify the control system and provide a faster 
response. And it will obviously be needed if and when cable-free gondolas become 
operated with drones. 


5.4. Adaptive Co-Phasing on Earth and in Space 
5.4.1. Wave Sensing for Adaptive Co-Phasing 


Whether on Earth or in space, where the wave disturbances are much slower in the
absence of a turbulent atmosphere, wave sensing is conveniently achievable by analyzing
in real time the live image captured by the science camera, if suitably dispersed in a
spectro-imaging format. Such spectro-imaging, which is highly valuable for the science
produced, can indeed also feed tri-dimensional Fourier transform calculations
of the I(x, y, 1/λ) exposures, from which the wave errors can be mapped for the
adaptive co-phasing.??°27
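
The principle can be illustrated in a reduced form. The sketch below is not the algorithm of the cited references: it keeps only the wavenumber axis of the I(x, y, 1/λ) cube, at one image position, and treats a single pair of sub-apertures with an optical path difference (OPD). The fringe signal is then periodic in the wavenumber 1/λ, and a Fourier transform along that axis peaks at the OPD. The band (500 to 1000 nm), the sampling and the function name are assumptions chosen for the example.

```python
import numpy as np


def piston_from_dispersed_fringes(intensity, d_sigma):
    """Estimate |OPD| from a fringe signal sampled uniformly in wavenumber 1/lambda."""
    signal = intensity - intensity.mean()       # remove the unmodulated term
    spectrum = np.abs(np.fft.rfft(signal))      # modulus of the transform along 1/lambda
    opd_axis = np.fft.rfftfreq(intensity.size, d=d_sigma)  # conjugate variable: OPD
    return opd_axis[np.argmax(spectrum)]


n = 256
d_sigma = 1.0 / 256                      # wavenumber step, in 1/micron
sigma = 1.0 + d_sigma * np.arange(n)     # 1.0 to about 2.0 per micron (1000 nm to ~500 nm)
true_opd = 20.0                          # simulated piston error, in microns

# Two-beam fringe signal at a fixed image position: 1 + cos(2 pi * OPD / lambda).
intensity = 1.0 + np.cos(2.0 * np.pi * sigma * true_opd)

estimate = piston_from_dispersed_fringes(intensity, d_sigma)
print(f"simulated OPD {true_opd:.1f} um, recovered {estimate:.1f} um")
```

The transform of a real signal gives only the magnitude of the OPD, with a resolution set by the inverse of the wavenumber span (here 1 µm). The sign and the sub-wavelength refinement require the phase information carried by the full three-dimensional transform.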
6. Hypertelescopes in Space


Above the Earth’s turbulent atmosphere, light from distant astronomical sources is 
little affected by the weak residual inhomogeneities of the interstellar propagation 
medium, including solar protons, dust and gravitational effects. The high imaging 
performance of hypertelescopes, particularly in terms of angular resolution, but 
also for accessing ultra-violet and infra-red wavelengths, can therefore be preserved, 
especially if adaptive co-phasing techniques are used to correct slowly-varying wave- 
front errors, possibly resulting from local micro-gravity fluctuations or gravitational 
lensing effects. 


Early optical concepts for interferometers in space?® involved a flotilla of mirrors,
each driven by small thrusters, whether chemical or ionic, or small solar sails.
When the theory of hypertelescopic imaging indicated that smaller mirrors, at constant
total collecting area and meta-aperture size, provide better imaging performance,
flotillas of numerous small mirrors were considered. The earlier concept of
laser-trapped mirrors29,30 appeared adaptable for herding such small mirrors with
sub-wavelength accuracy, according to the principle of “laser tweezers.”


6.1. Moon-Based Options 


The hypertelescope concept tested at Ubaye appears suitable for replication
in a lunar crater. Since the Moon’s equator is nearly parallel to the ecliptic, a
hypertelescope located on the equator has its zenith near the ecliptic, and naturally 
scans it during a lunar period of 27 days. Similar scanning is possible along celestial 
small circles, parallel to the ecliptic, by offsetting the observing direction. This is 
possible by changing the lunar declination of the focal gondola, or the lunar latitude 
of the site. For broad sky coverage, a crater site at mid-latitude, with a concave 
topography allowing extensions of the meta-mirror toward the North and South, is 
of interest. 

The vast Aitken basin, near the South pole, appears to contain numerous possi- 
ble sites with deep craters spanning tens to hundreds of kilometers, and high points 
where cables can be attached for suspending one or more cameras. 

The proposed Lunar Space Elevator, with its long cable connecting the Moon 
to its L1 Lagrange point toward the Earth, may provide useful possibilities for
suspending the camera. 


6.2. Space-Based Options with a Flotilla of Mirrors: Proposed 
Schemes Including the “Laser Trapped Hypertelescope Flotilla” 


Since proposed,”° the concept of building a giant mirror by trapping a thin pellicle in 
standing waves of laser light has been further calculated and evaluated two decades 
later, including by a NASA/NIAC study.°° Following the emergence of the hyperte- 
lescope principle! and its proposed implementation in space with a controlled fleet 
of mirrors, it took another decade to realize that it could be compatible with the 


Fig. 11. Scheme of a “Laser Trapped Hypertelescope Flotilla”. Thousands of small mirrors like 
that shown below are trapped (curved string of white dots) by radiation pressure within standing 
waves of interference, a stratified light structure produced by pairs of coherent laser beams (red) 
received on each face. A beam-splitter (thin white dotted line at center) splits the laser beam, and 
a microlens array near it fans the pairs of beams. The trapped mirrors are semi-transparent at the 
laser wavelength, which is repeatedly shifted from red to blue in order to push the mirror toward a
central interference fringe. Its global paraboloidal figure accurately shapes the meta-mirror. Light
from the observed source (yellow) is co-focused by the mirrors toward the science camera (top
right). A possible site in space is the L2 Lagrange point of Sun-Earth (inset), partially shaded
from the Sun. 


laser-trapping scheme, and that this would avoid the problematic presence of stress 
in large membranes. Thus emerged the concept of a “Laser-Trapped Hypertelescope 
Flotilla”! (Fig. 11). 

Producing the paraboloidal locus where the central “white” standing wave 
traps the mirrors is in principle achievable through the interference of two counter- 
propagating wavefronts, respectively spherical, diverging from the parabola’s focus, 
and flat. For easier fabrication, both can be somewhat defocused by the small diverg- 
ing mirrors seen in Fig. 12; the curvature of the diverging wave can be slightly 
increased to make room for the science camera in the focal zone; and the flat wave 
can preferably become slightly diverging, so that no giant collimator is needed to 
produce it. These adjustments can be balanced to preserve the paraboloidal standing 
wave. 
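
The geometry behind this statement can be checked in a few lines. The Python sketch below is an idealized scalar illustration with assumed values (focal length f = 1000 m, aperture radius 500 m) and a hypothetical function name. For an outgoing spherical wave centered on the focus and a plane wave travelling along +z, the fringe surfaces are those where the path difference r - z is constant, and the surface r - z = f is the paraboloid z = ρ²/(4f).

```python
import numpy as np


def path_difference(rho, z, focal_length):
    """r - z, with r the distance from the focus located at (0, 0, focal_length)."""
    r = np.hypot(rho, z - focal_length)
    return r - z


f = 1000.0                           # assumed focal length, m
rho = np.linspace(0.0, 500.0, 11)    # radial positions across the meta-mirror, m
z_paraboloid = rho**2 / (4.0 * f)    # paraboloid with its vertex at the origin

delta = path_difference(rho, z_paraboloid, f)
print("max deviation of r - z from f:", np.max(np.abs(delta - f)), "m")

# For comparison, a sphere with the same vertex curvature (radius 2f) is not a fringe surface.
z_sphere = 2.0 * f - np.sqrt((2.0 * f) ** 2 - rho**2)
delta_sphere = path_difference(rho, z_sphere, f)
print("spread of r - z on the sphere:", delta_sphere.max() - delta_sphere.min(), "m")
```

The path difference is constant on the paraboloid to within rounding errors, whereas it varies across a sphere of the same vertex curvature. This is why the standing-wave locus can serve to shape the meta-mirror.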
Fig. 12. Transverse and angular trapping of a small mirror in standing waves from a pair of
counter-propagating laser beams (red). The thin mirror, reflective at the observed wavelengths of
the source (yellow), instead behaves as a 50/50 beam splitter at the offset laser wavelength. In
addition to the axial trapping, the transverse trapping is much less demanding in accuracy, and
is achieved by the light deviation in the mirror’s peripheral circular grating becoming illuminated
by the decentered laser beams if their diameter matches the central mirror. The angular trapping
for stable tip-tilt is also achieved by the partial laser reflection on the grating’s oblique facets.


A microlens array is also needed near the beam splitter to concentrate the laser 
light into fanned narrow beams matching the membrane mirrors’ diameter as they
reach them. For additional saving of laser power, and reducing light pollution, the 
trapped mirrors can likely be made retro-reflective for the laser light, in order to 
recycle it. 


6.3. Hierarchical Beam Combining for Smaller Mirrors in 
Extremely Large Hypertelescopes 


How large can a hypertelescope fleet or flotilla be made in space? For a
laser-trapped flotilla of many mirrors, the cost may be dominated by that of the laser,
the primary mirrors and the collecting mirrors in the focal optics. The diameter d1
of the many primary mirror segments M1 determines the diameter of the
field-collecting mirror at the entrance F1 of the focal combiner. Indeed, for
efficient light collection, this field mirror has to match the size of the superposed
Airy disks, i.e. the co-focused diffraction lobes which it collects from each
sub-aperture M1. And it must be larger, but possibly in mosaic form, if a
multi-field arrangement is used.

If the cost of mirror elements, assumed to dominate the system’s cost if it has
many of them, varies as $d^k$, where the value of $k$ typically approaches 3, then the
total cost factor for $N$ $M_1$ mirrors and the field collecting mirror, of diameter
$A = \lambda f_1/d$, is

$$C = N d^k + A^k = N d^k + (\lambda f_1)^k d^{-k}.$$

Its derivative with respect to $d$ is

$$\frac{dC}{dd} = N k\, d^{k-1} - k\, (\lambda f_1)^k d^{-(k+1)},$$

which cancels if $N = (\lambda f_1)^k d^{-2k}$.

It thus defines an optimal size $d_{opt}$ for the primary mirror segments,
$d_{opt} = \{(\lambda f_1)^k / N\}^{1/(2k)}$, or equivalently
$d_{opt} = \{N/(\lambda f_1)^k\}^{-1/(2k)}$. Typical values ensuring
a minimal mirror cost factor C for a large hypertelescope flotilla with f1 = 10^6
km, for a comparable flotilla diameter D1, containing N = 10^6 mirrors, at
λ = 500 nm, are d1 = 0.45 m, A = 1.12 km. If the focal length is instead reduced
to f1 = D1 = 10^4 km, other things being equal, then the cost factor is C = 22,361
for an optimal d = 0.22 m, providing a total M1 area 1 km² and a focal collector
diameter 22 m.
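
The optimum can be evaluated numerically. The Python sketch below is an illustration only: it implements the cost factor C = N d^k + (λ f1)^k d^(-k) and its analytic minimum for the shorter-focus case, with λ = 500 nm, f1 = 10^4 km and k = 3 taken from the text. The value N = 10^6 is an assumption made here because it reproduces the quoted cost factor; the function names are hypothetical.

```python
def cost_factor(d, wavelength, focal_length, n_mirrors, k=3.0):
    """C = N d^k + A^k, with the field collector diameter A = lambda * f1 / d."""
    collector = wavelength * focal_length / d
    return n_mirrors * d**k + collector**k


def optimal_segment(wavelength, focal_length, n_mirrors, k=3.0):
    """Segment diameter cancelling dC/dd, with the matching collector size and cost."""
    lam_f = wavelength * focal_length
    d_opt = (lam_f**k / n_mirrors) ** (1.0 / (2.0 * k))
    collector = lam_f / d_opt
    return d_opt, collector, cost_factor(d_opt, wavelength, focal_length, n_mirrors, k)


# Assumed inputs: 500 nm, f1 = 1e4 km = 1e7 m, N = 1e6 mirrors, k = 3.
d_opt, collector, cost = optimal_segment(500e-9, 1.0e7, 1.0e6)
print(f"optimal segment diameter: {d_opt:.3f} m")
print(f"field collector diameter: {collector:.2f} m")
print(f"cost factor C:            {cost:.0f}")
```

At the optimum the two terms of C are equal, so that C = 2 A^k. With the assumed inputs the sketch gives a segment diameter near 0.22 m, a collector near 22 m and a cost factor near 22,361, in line with the values quoted above.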

According to hypertelescope theory, for a given collecting area N d² and resolution
λ f1/D1, smaller mirror elements provide higher imaging performance, and the
cost can be lower according to the previous calculation. It can be further reduced by 
adopting a hierarchical arrangement with several layers of beam combining mirrors, 
as shown in Fig. 13. 


6.3.1. Fiber or Periscope-Coupled Hypertelescopes 


Fiber-coupled hypertelescopes have been proposed for ground and space versions, 
and their operation could be verified in the laboratory.!° But fibers in space raise 
questions regarding their handling, their vulnerability to micrometeorite impacts, 
and the emission of contaminating Cerenkov light within them under cosmic-ray 
bombardment. An alternate option for replacing them is the classical periscopic 
optics used in submarines, but long periscope tubes are unlikely to be deployed in 
space. 

Several hypertelescopes can be coupled in parallel for increased performance, and
their beam-combining optics can be cascaded hierarchically as shown in Fig. 13 to
avoid using large combining mirrors. The hierarchical arrangement thus obtainable
can be of interest for very large hypertelescopes, such as the proposed “Neutron
Star Image” spanning 100,000 km. Indeed, the primary mirrors in the M1 flotilla
meta-aperture would need to be as large as 10 or 20 m for directly co-focusing
sub-aperture diffractive lobes of similar size onto the beam-combining focal spaceship,
located at least 100,000 km away for a workable focal ratio. Instead, much smaller
M1 mirrors, with more of them for a comparable collecting area, can focus a smaller
lobe toward a more proximate spaceship if it can be relayed to a hierarchy of similar
concentrating combiners.

Fig. 13. Hierarchical beam-combining scheme for using smaller mirrors in a large space
hypertelescope flotilla. The primary M1 array (left) can be separated in sub-array groups, each of which
co-focuses the source’s light (red) to the entrance of a separate focal concentrator (detail in inset).
A second layer of concentrators is shown at right, and more can be added to further reduce the
mirror sizes needed, while increasing their number for a constant collecting area. The concentrator
shown in detail at lower right is equipped for multi-field imaging. Independent field channels, here
shown with a red and a green star, are separated by a microlens array in the Fizeau focal plane.
An array of Galilean beam expanders achieves the pupil densification.

As shown in Fig. 13, the primary paraboloidal, or spherical, flotilla of mirrors 
can indeed be segmented in groups co-focusing the light collected from a common 
source to secondary separate beam combiner spaceships. According to a hierarchical 
arrangement, these can in turn feed their captured light to separate beam combiners, 
each equipped with a pupil densifier. This can provide a wide meta-pupil for a 
reduced diffractive divergence of the combined beam, and hence a narrow combined 
Airy spot toward another layer of beam combiners. 

The Crab Pulsar, believed to be a rotating neutron star, about 20km 
in size, would require a 100,000-km meta-aperture for angularly resolving its 
morphology. 
6.3.2. Size Limits for Large Hypertelescopes in Space


Is this within the foreseeable limits of hypertelescope technology? If the giant
segmented telescope has an f/1 global focal ratio, then the diameters needed are
d1 = 8.5 m for the M1 primary mirror segments, and also for the M2 field mirror
collecting the co-focused diffraction lobes at the F1 primary focus. Such large mirrors
minimize the diffractive losses by matching M2’s size d2 with that of the diffraction
lobes co-focused on it. The condition λ f1/d1 = d2, where λ is the wavelength, indeed
implies minimal sizing if d1 = d2 = (λ f1)^(1/2). Their high cost and mass are, however,
objectionable. The hierarchical scheme favors a large number of small primary mirrors
while keeping the downstream optics within acceptable size limits. Much smaller
mirrors (but more of them if the total collecting area is preserved) can replace the
large ones if beam-compression options are adopted:


A. A hierarchical beam combiner can be arranged as shown in Fig. 13. The primary 
flotilla of mirrors is divided in groups, each of which co-focuses its beams toward 
a separate concentrator spaceship, containing a collecting mirror and a pupil 
densifier. Additional layers of similar concentrators further co-focus the groups 
of beams from the previous layer, until all become focused on the science camera 
inside a final spaceship. 

B. Alternately, fiber-optical connections can in principle be established between 
concave mirrors and the focal optics.°? But this raises unusual challenges, for 
fibers as long as 100,000 km in space, one of which is the intra-fiber emission of 
contaminating Cerenkov light caused by the cosmic-ray bombardment. It can 
conceivably be attenuated if each fiber is replaced by a periscopic tube, with 
lenses periodically arranged inside it, but the prospect of such long tubes is also 
challenging. 


Option A appears preferable. 

A fixed “bubble” flotilla, with many mirrors on a spherical locus, can provide 
sequential coverage of the full sky if it contains focal sensors that can move along 
its half-size focal sphere. Active parabolization may also be considered. 


7. Conclusions and Future Work 


Following the encouraging observational results of existing stellar interferometers, 
with the modest number of sub-apertures which they use, the feasibility of major 
improvements to their imaging performance and resolution, through the hypertele- 
scopic approach, is becoming verified. With their many sub-apertures and efficient 
light concentration in direct images at high resolution, hypertelescopes can achieve 
much progress toward a better understanding of various celestial objects. Life may 
become detectable on exo-planets from seasonal variations of their colored surface 
patterns. The physics of neutron stars, pulsars, gravitational wave emissions
and black holes may be clarified, and the faint cosmological galaxies at the limit of
the known universe may be seen in better detail.

With their flexible modular structure and possible progressive growth, hyper- 
telescopes may also prove more economical than conventional large telescopes of 
similar capability. 

The development of the Ubaye Hypertelescope prototype, initiated in 2011 in 
a high valley of the southern Alps, has demonstrated that the needed millimetric 
accuracy can be attained for positioning the focal gondola and driving it to track 
the star’s image. What is needed now is the selection of a suitable terrestrial site for 
a science version, among several considered in Chile and in the Himalaya, to which 
part of the Ubaye system can be moved. Also needed is testing at a Lunar site, 
if manned or robotized activity is initiated at locations such as its Aitken basin. 
And, following the preliminary testing achieved in space by the pair of PRISMA 
spaceships (https://prisma-ffiord.cnes.fr/en/PRISMA/index.htm), more testing of
optical flotilla concepts is needed. 
Key technologies to realize visible imagers are reviewed, including wide-field optics
and CCD detectors. Current mosaic CCD imagers are summarized, and alignment 
techniques and dewar designs are discussed. 


1. Introduction 


Imaging at visible wavelengths has the longest history in astronomy,
and it has provided fundamental data sets, including catalogs of stars
and galaxies, for the past several centuries. Visible imaging still plays a key role
in frontier research, such as observational cosmology to probe the nature of dark
energy. This is because wide-field cameras on large telescopes are now possible
owing to advances in opto-mechanical and semiconductor technologies. Using
these cameras, various imaging surveys are being carried out or planned to explore
the universe in unprecedented width and depth. In this chapter, we describe how
a visible imaging system is organized and present examples of such imagers.


2. Wide-Field Optics 
2.1. Schmidt Telescope 


One of the best-known wide-field imaging facilities is the “Schmidt telescope”.
The Palomar Sky Survey (POSS) was carried out using the 1.3 m Schmidt telescope
at Mount Palomar in California and the UK Schmidt telescope in Australia. The
photographic plates and their digitized images became indispensable resources for many
kinds of astronomical research.

Fine image quality over a very wide field (~6°) is realized by combining a spherical
primary mirror with a spherical, convex focal surface at prime focus, where both
spheres share a common center of curvature (Fig. 1). An aperture stop is placed at
the center of the spheres. In this configuration, off-axis light sees the two spheres
in the same manner as on-axis light, which eliminates the off-axis aberrations (coma
and astigmatism) completely. The remaining aberrations are spherical aberration and
field curvature. The former is corrected by a transmissive fourth-order aspheric
plate located at the center of curvature. The field curvature is usually tolerated
because it is easy to warp the photographic plate, but a single-lens field
flattener may be introduced just in front of the focal plane if necessary. The size
of the aspheric plate limits the aperture of the telescope (D < 1 m).

Fig. 1. Schmidt telescope design. The mirror and focal surface are both spherical, and a “Schmidt
plate” is required at the entrance aperture to eliminate spherical aberration.


2.2. Prime Focus of Ritchey-Chrétien Telescope


Large telescopes of ~4 m aperture usually use a Ritchey-Chrétien (R-C)
design, in which the combination of a hyperbolic primary and a (convex) hyperbolic
secondary eliminates spherical aberration and third-order coma. The prime focus
has the shortest focal length and thus could provide the widest field of view (FOV).
It suffers, however, from all kinds of aberrations, even at the optical center. Wynne
worked out a three-lens corrector for the R-C hyperboloid mirror.¹ This can be
regarded as a modified version of the two-lens corrector for parabolic primaries
designed by Ross.² Figure 2 shows the optical layout of Wynne’s triplet designed
for the Kitt Peak National Observatory 150-inch telescope. All surfaces are spherical
and only UBK7 is used as the glass material. By correcting spherical aberration,
coma, astigmatism and field distortion, the image spread, s, is smaller than 0.5″
within a 30 arcminute diameter FOV, with s < 1.0″ for a 50 arcminute diameter
FOV, although the wavelength coverage is relatively narrow (400-500 nm) by
today’s standards.

Fig. 2. Wynne’s triplet design for the Kitt Peak 150-inch telescope.

Fig. 3. Prime focus corrector design for Subaru Telescope. (a) FPL51 design for Subaru. (b) Final
design with the lateral shift ADC.


Even larger telescopes (8-10 m) typically have faster primary mirrors, which
makes the design of the corrector challenging. There are also increasing demands
on the wavelength coverage, which call for the introduction of an atmospheric
dispersion corrector (ADC).ᵃ Reference 3 discussed a prime focus corrector for the
10 m Keck telescope (f/1.75) that adds a pair of direct vision prisms as ADCs
between the second and third elements of the triplet. The design realized an 80 percent
encircled energy diameter (D80) of 0.25″ over a 30 arcminute diameter FOV for the
wide wavelength range of 0.33 to 1 µm. Unfortunately, the prime focus plan was
dropped from the Keck project and the corrector was not manufactured.

Reference 4 designed a corrector for the f/2 prime focus of the Subaru
telescope with a smaller image diameter of 0.2″, carefully determining
the positions of the three lenses to balance the chromatic variation of spherical
aberration against the image quality at the edge of the field (Fig. 3(a)). They proposed
several variants of the design to adapt to changes in the telescope design, and
ended up with a design that employed fluorine phosphor glass (OHARA FPL51)
for the triplet to minimize the chromatic variation of spherical aberration.⁵

ᵃSee Volume 2, Chapter 5 for a more detailed discussion of ADCs.

Although the image quality satisfied the specification of 0.2″, the manufacturability
of the large (D ~ 0.65 m) FPL51 lens was questionable. To reduce the risk,
Ref. 6 refined the design by replacing the first two FPL51 glasses with OHARA BSL7
(Schott BK7 equivalent), which is easier to manufacture. The obvious drawback
was an increase in chromatic aberration, but this was reduced by relocating the
first lens closer to the focal plane. This also reduced the size of the lens to
D = 0.5 m. In addition, Ref. 6 introduced a novel lateral-shift ADC (Fig. 3(b)).
It consists of two cemented lenses in which the first and the last surfaces are flat and
the second is spherical. The idea is that the decentered spherical surface acts as
an ADC. This additional surface is also used for image optimization, an advantage
that the traditional direct vision prism does not have. Because the
ADC itself generates chromatic aberration, a compensator made of a
pair of glass elements is inserted right after the ADC. As a result, the number of
glass elements is the same as in a design that adopts prisms for the ADC. The
corrector was manufactured by Canon and has been used at the Subaru
prime focus with Suprime-Cam since 1999 (see Sec. 4.1). The camera features a
unique combination of an 8 m aperture, a wide 30 arcminute FOV, and seeing-limited
imaging at Mauna Kea (median seeing of 0.65 arcseconds in the i-band).

In the early 2000s, the demand on the field size became even higher in order to
carry out large-scale imaging surveys (>1000 deg²) motivated by the discovery of
the accelerating expansion of the universe. Thus, as a successor of Suprime-Cam,
a new wider-field camera was planned, which is now called Hyper Suprime-Cam
(HSC). Reference 7 presented the initial design of a 2°-diameter HSC corrector,
which employed only fused silica and BSL7 because the manufacturability of these
glasses was secured. The design satisfied the image specification of D80 < 0.3″ over
the entire 2° FOV for wavelengths above 600 nm, where high image quality is crucial.
In the final design, the FOV became 1.5° in diameter and an ADC was incorporated.
The design is shown in Fig. 4, where we note that the layout is exactly the same
as that of the original Subaru prime focus corrector: i.e. Wynne triplet + lateral
shift ADC + chromatic compensator. It was also manufactured by Canon.⁸ The final
design specification was D80 < 0.2″, and the final specification including
manufacturing and assembly errors was D80 < 0.3″ (FWHM < 0.2″ in the Gaussian
approximation). Judging from the on-axis measurement at the factory, combined with
the off-axis images taken at the telescope, the specifications were met, which enables
seeing-limited imaging.
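
The conversion between D80 and FWHM quoted above can be checked with a short sketch. It assumes a circularly symmetric Gaussian image profile, for which the energy enclosed within radius r is 1 - exp(-r²/2σ²); the function name is illustrative and not taken from the cited references.

```python
import math


def fwhm_from_encircled_diameter(d_ee, fraction=0.8):
    """FWHM of a circular 2-D Gaussian whose encircled-energy diameter is d_ee.

    Assumes EE(r) = 1 - exp(-r**2 / (2 * sigma**2)), i.e. a pure Gaussian profile.
    """
    radius = d_ee / 2.0
    # Solve EE(radius) = fraction for sigma.
    sigma = radius / math.sqrt(-2.0 * math.log(1.0 - fraction))
    # FWHM of a Gaussian is 2 * sqrt(2 ln 2) * sigma.
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


if __name__ == "__main__":
    for d80 in (0.2, 0.3):
        fwhm = fwhm_from_encircled_diameter(d80)
        print(f"D80 = {d80:.2f} arcsec -> FWHM = {fwhm:.3f} arcsec")
```

Under this assumption FWHM is about 0.66 times D80, so D80 < 0.3″ corresponds to an FWHM just below 0.2″, consistent with the figure in the text. Real images have broader wings than a Gaussian, so the ratio is only indicative.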

Another interesting example of wide-field facilities is the SDSS telescope⁹ and
its camera.¹⁰ To make the survey efficiency high, they adopted Time-Delay-and-Integrate
(TDI) imaging, where the charge shifts on the CCDs are synchronized
with the sidereal motion of the sky and the image is read out continuously. This means
that the image distortion must be minimized, whereas distortion is usually of little
concern on standard astronomical cameras. They found that the
2.5 m diameter f/5 R-C telescope combined with two highly aspheric corrector lenses
could realize a small enough distortion of < 12 µm (0.2″). The overall aberration is
better than 0.6″ over the 3 degree FOV, which is sufficient for the site at Apache
Point Observatory.

Fig. 4. Layout of Hyper Suprime-Cam Corrector.

Fig. 5. Paul-Baker three-mirror telescope.
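
As a consistency check on the distortion figure above, the following sketch converts a linear displacement in the focal plane into an angle on the sky. It assumes an ideal, distortion-free plate scale derived only from the aperture and focal ratio given in the text; the helper names are illustrative.

```python
ARCSEC_PER_RADIAN = 206264.806


def plate_scale_arcsec_per_mm(aperture_m, focal_ratio):
    """Plate scale of an ideal telescope: angle on the sky per mm in the focal plane."""
    focal_length_mm = aperture_m * focal_ratio * 1000.0
    return ARCSEC_PER_RADIAN / focal_length_mm


def microns_to_arcsec(shift_um, aperture_m, focal_ratio):
    """Convert a focal-plane displacement in microns to arcseconds."""
    return (shift_um / 1000.0) * plate_scale_arcsec_per_mm(aperture_m, focal_ratio)


if __name__ == "__main__":
    # 2.5 m, f/5 telescope as described in the text.
    print(f"{plate_scale_arcsec_per_mm(2.5, 5.0):.2f} arcsec/mm")
    print(f"{microns_to_arcsec(12.0, 2.5, 5.0):.3f} arcsec for 12 um")
```

For a 2.5 m f/5 telescope the focal length is 12.5 m, so 12 µm corresponds to roughly 0.2″, in agreement with the value quoted in parentheses.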


2.3. Three Mirror Telescope 


When a third mirror is employed, the additional degree of freedom allows the
designer to reduce the aberrations of the system. Let us first consider an afocal
Cassegrain telescope consisting of a concave parabolic primary and a convex parabolic
secondary. The two mirrors are confocal. This system acts as a beam reducer with zero
spherical aberration, coma and astigmatism. When a third, spherical, mirror is added in the
collimated beam, an image is formed (Fig. 5), but with spherical aberration. The
secondary can be changed to a spherical mirror so that it generates spherical
aberration of the opposite sign, which eliminates the spherical aberration of the system.
Reference 11 first noted that the system is also free of coma and astigmatism, even
after altering the secondary from a parabola to a sphere, if one adopts a spherical
focal surface whose curvature is twice that of the primary. The three-mirror system
can be regarded as a kind of Schmidt camera in which the primary and the
secondary function as the corrector plate for the third (spherical) mirror. Because
Baker rediscovered the design independently, this three-mirror system is now called
the Paul-Baker (P-B) design.

Although the P-B telescope has excellent image quality, it did not become a
main trend in telescope design, partly because of the limited space for a camera. The
design also has relatively large vignetting. However, Ref. 12 reconsidered the P-B
telescope as the most promising design to realize a very wide (>3°) FOV on large
(~8 m) telescopes. By allowing asphericity for all three mirrors and adding a
three-lens corrector in front of the focal plane, they realized D80 < 0.3″ over a 3° FOV on
an 8 m telescope (6.5 m equivalent, considering the vignetting), which they called
the “Dark Matter Telescope”. A corrector lens was necessary to eliminate the chromatic
aberration generated by indispensable transmissive optical components, e.g. a
dewar window and a filter. The first lens and the last lens work as the dewar
window and a fixed dedicated filter, respectively.

This modified P-B telescope became the design starting point of the Large
Synoptic Survey Telescope (LSST). As shown in Fig. 6, the position of the third
mirror is adjusted so that the primary and the third mirror are fabricated on a
single substrate. The dewar window is now the third (and last) element of the
corrector and the filter is located outside the dewar, which enables its exchange. This
design realized a flat focal surface over a 3.5 degree (64 cm) FOV with a designed
image quality of D80 < 0.2″ in the i-band.
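
The two numbers given for the LSST focal surface (3.5 degrees and 64 cm) imply an effective focal length and focal ratio. The sketch below derives them under a small-angle, distortion-free assumption and uses the 8.4 m primary diameter quoted for LSST in this chapter; it is a rough check, not the project's optical prescription.

```python
import math


def effective_focal_length_m(field_diameter_deg, focal_plane_diameter_m):
    """Focal length implied by a field of given angular and linear diameter.

    Small-angle approximation; optical distortion is ignored.
    """
    return focal_plane_diameter_m / math.radians(field_diameter_deg)


def effective_focal_ratio(field_diameter_deg, focal_plane_diameter_m, aperture_m):
    """Focal ratio implied by the field geometry and the aperture diameter."""
    return effective_focal_length_m(field_diameter_deg, focal_plane_diameter_m) / aperture_m


if __name__ == "__main__":
    f_m = effective_focal_length_m(3.5, 0.64)
    n = effective_focal_ratio(3.5, 0.64, 8.4)
    print(f"focal length ~ {f_m:.1f} m, focal ratio ~ f/{n:.2f}")
```

The result, a focal length of about 10.5 m and a focal ratio near f/1.25, is consistent with the fast beam of about f/1.2 mentioned for the LSST camera in Sec. 4.2.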


Fig. 6. The optical design of LSST (left: telescope, right: camera). [Figure labels: M1 8.4 m
f/1.18; M2 3.4 m f/1.00; flat 3.5 deg FOV.] Figure credit: LSST Project/NSF/AURA.
(See electronic edition for a color version of this figure.)


3. CCDs for Astronomy


3.1. Large Format Devices 


Since the early 1980s, CCDs have been used for astronomical observations, replacing
the conventional photographic plates. The advantages of CCDs include high quantum
efficiency (QE) and a good linear response to the incident light. CCDs enabled
a strikingly deep exploration of the universe.¹³ Their drawback was the relatively
small size of the imaging area. The focal plane of a telescope is usually 10-50 cm
across, and specially ordered photographic plates could cover the entire focal plane.
Meanwhile, the sizes of early CCDs were ≲1 cm and thus constrained the
FOV significantly.

In step with the rapid improvement of the semiconductor technologies used to
manufacture CPUs and memories, the performance of CCDs improved dramatically
in many aspects, such as the readout noise, the dark current level, the charge
transfer efficiency (CTE) and the cosmetics. Although CCDs became popular in
commercial products such as camcorders, the direction of development was
toward even smaller imaging areas and finer pixel sizes to make the optical system of
the products as compact as possible. This was completely opposite to the direction
that astronomers wanted to pursue.

Thus, some astronomers began to design ideal CCDs for astronomy
themselves.ᵇ Reference 14 discusses the design of 2048 x 2048 (2k2k)
detectors with 15 µm pixels using the AutoCAD software running on a PC clone
and the production of a 22 wafer foundry run at Ford Aerospace. This is a “Frame
Transfer” (FT) device (also referred to as an “electronic shutter”), where all of the
pixel area is used both for charge integration and for transfer, and the image is
transferred in milliseconds to a storage region for readout during the next exposure.
The pixel structure of an FT CCD is simpler than that of other types such as “Interline”, where
the integration and transfer areas are separated by the transfer gate. Out of the 88
devices, 26 (30%) were unshorted and showed excellent cosmetics with good CTE
(>0.999997). Although the readout noise level was not shown, the results were quite
encouraging for astronomers. They then developed a 2k2k “edge-abuttable” CCD,
where bonding pads were placed only along two sides and the pixels were placed
very close to the other two sides. By mosaicing the devices in a 2 x 2 configuration,
one could obtain an even larger focal plane (Fig. 7). The final variant of this family
was the 2k4k, three-edge-abuttable CCD¹⁵ with 15 µm pixels that was manufactured
by Loral.ᶜ This format became a de facto standard of CCDs for astronomy for
decades.
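
To see why a CTE of 0.999997 is regarded as good for a 2k2k device, the sketch below estimates the fraction of charge that survives readout from the pixel farthest from the amplifier, together with the yield of the foundry run quoted above. It assumes a single corner amplifier, so that the worst-case charge packet undergoes 2048 parallel plus 2048 serial transfers, and the same CTE for both directions; the actual devices may be read out differently.

```python
def retained_charge_fraction(cte, n_transfers):
    """Fraction of a charge packet left after n_transfers with per-transfer efficiency cte."""
    return cte ** n_transfers


def device_yield(n_good, n_total):
    """Fraction of usable devices in a production run."""
    return n_good / n_total


if __name__ == "__main__":
    # Worst case for a 2048 x 2048 device read from one corner (assumption).
    transfers = 2048 + 2048
    frac = retained_charge_fraction(0.999997, transfers)
    print(f"charge retained after {transfers} transfers: {frac:.4f}")
    print(f"yield: {100.0 * device_yield(26, 88):.1f}%")
```

Under these assumptions about 98.8% of the charge is retained even for the farthest pixel, and 26 good devices out of 88 is a yield of about 30%, as stated.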


>See also Chapter 1 of Volume 2 of this Handbook. 
ᶜFormerly Ford Aerospace.

Fig. 7. 2 x 2 mosaic of 2048 x 2048 two-side edge-abuttable CCDs. Courtesy G. Luppino.


3.2. Back-Illuminated CCD 


One of the biggest reasons to build large telescopes is to increase the photon
collecting power. Considering the costs of building and operating large telescopes,
the demands on the QE of CCD detectors are high. On a conventional front-illuminated (FI)
CCD, most of the incident photons, especially at shorter wavelengths (<450 nm),
are absorbed or reflected by the polysilicon gates that cover the front side of the
frame transfer CCD. A photographic plate, on the other hand, is sensitive down
to these short wavelengths. In order to compare with the legacy data accumulated on
plates, better sensitivity at short wavelengths was required for CCD detectors.
To avoid the absorption, one flips the detector and illuminates it from
the backside. This is called a back-illuminated (BI) CCD.

The standard process to build a BI CCD is as follows. After the gate structures
and implants are formed on the front side, the wafer is flipped and epoxied to a
support substrate. In order to eliminate the bulk silicon, where no electric field exists
to collect charge, the backside of the wafer is mechanically lapped and chemically
thinned until the epitaxial layer of the wafer is exposed. After the thinning, the
backside is damaged and has charge traps. This is corrected by ion implantation
followed by thermal or laser annealing. Then, an anti-reflection (AR) coating is
formed by a single layer of SiO₂ or two layers of HfO₂ and MgF₂. Steward
Observatory, University of Arizona, has a CCD detector laboratory where Lesser et al.
pioneered the BI process for astronomical applications and have long served
the community by providing a small number of high quality devices.¹⁶


3.3. Deeply Depleted and Fully Depleted CCD


As shown in the previous section, QE at shorter wavelengths is dramatically
improved by BI processing. The QE in the red, on the other hand, drops because
the thin CCD membrane is transparent to longer wavelength photons. To improve
it, a thicker depletion layer is required, in which an electric field exists for
photo-charge collection. Reference 17 employed high resistivity p-type silicon and
realized 40 µm thickness on 2k4k 15 µm devices (CCID-20), which are significantly
thicker than conventional devices with typical depletion layers of <20 µm thickness.
The CCID-20 device was jointly developed by a consortium of astronomers led by
Gerry Luppino, University of Hawaii, and was used in various instruments for large
telescopes; e.g. Suprime-Cam on the Subaru Telescope (see Sec. 4.1), DEIMOS on
the Keck telescope and the CFH12K camera on CFHT. The device also featured high
responsivity amplifiers (15 µV/e⁻), which enabled low noise readout (2-3 e⁻ rms)
at slow readout rates (50 kHz).

Reference 18 first fabricated CCDs on even higher resistivity (>10 kΩ cm)
silicon (see also Ref. 19). An n-type silicon wafer was used because it is easier to
obtain high resistivity n-type material. In this device the signal carriers are not
conventional electrons but holes, whose mobility is lower than that of electrons. This does
not matter much because large format devices of this type are typically read out
very slowly (~150 kHz readout). They realized 300 µm thickness, which improved
the red response dramatically. This also suppressed the level of interference fringing
due to multiple reflections inside the silicon membrane.

Stimulated by these pioneering efforts, Hamamatsu Photonics developed CCDs
of 2k4k format with 15 µm pixels for the Subaru telescope.²⁰,²¹ In order to fully
deplete the silicon across the 200 µm thickness, a bias voltage of 50 V is applied,
which also reduces the rms charge diffusion to half a pixel. When
one chooses the thickness of the devices, one must consider the focal ratio (f) of the
optics. Faster cone beams, which have wider opening angles, tend to give more
dilute images in thicker devices, especially at longer wavelengths. In the case of the
Subaru prime focus, because the focal ratio of 2.0 is faster than that of other 4 m class
telescopes, a moderate thickness of 200 µm was chosen by balancing the image size
and QE.
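
The trade-off between device thickness and focal ratio can be illustrated with a purely geometric sketch. It assumes that the marginal ray of the converging cone refracts into silicon according to Snell's law with a refractive index of about 3.6 (an assumed round value for wavelengths near 1 µm), and that a photon may be absorbed anywhere through the full thickness. Charge diffusion and the wavelength dependence of the absorption depth are ignored, so the numbers only indicate the trend.

```python
import math


def cone_spread_um(focal_ratio, depth_um, n_silicon=3.6):
    """Geometric diameter of the light cone after travelling depth_um inside silicon.

    n_silicon is an assumed refractive index; the beam is taken to be focused
    on the entrance surface, so the spread grows linearly with depth.
    """
    # Half-angle of the marginal ray in air/vacuum for the given focal ratio.
    theta_air = math.atan(1.0 / (2.0 * focal_ratio))
    # Refraction at the silicon surface (Snell's law).
    theta_si = math.asin(math.sin(theta_air) / n_silicon)
    return 2.0 * depth_um * math.tan(theta_si)


if __name__ == "__main__":
    for f_ratio in (2.0, 4.0):
        for thickness in (200.0, 300.0):
            spread = cone_spread_um(f_ratio, thickness)
            print(f"f/{f_ratio:.1f}, {thickness:.0f} um thick: spread ~ {spread:.0f} um")
```

In this simplified picture an f/2 beam spreads over nearly two 15 µm pixels across 200 µm of silicon, about twice the spread of an f/4 beam, which illustrates why a faster prime focus favors a thinner device.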


3.4. Other Commercially Available CCDs 


SITe/Tektronix had heritage from a collaboration with NASA/JPL and has been
providing large format CCDs since the mid-1990s. Their “TK2048” had a
format of 2048 x 2048 with 24 µm pixels and a relatively low responsivity
amplifier of ~2 µV/e⁻, which forced longer readout times (more than one minute). However,
the large format of 50 mm square was perhaps the only (but a premier) option
for wide-field imaging. They later developed a 2k4k three-edge-abuttable device
with 15 µm pixels (ST-002A), which was adopted for the first phase mosaic of
Suprime-Cam, combined with MIT/LL’s CCID-20 (four ST-002A + two CCID-20).
The performance of the devices was moderate in every aspect, perhaps to avoid risks
because the company had firm commitments with clients for delivery on a commercial basis.
The delivered devices have contributed significantly to the advance of many
fields of astronomy. One such example is the SDSS photometric camera (Sec. 4.1),
which is regarded as one of the most productive facilities ever.

English Electric Valve (EEV), now called e2v, is another commercial vendor
of CCDs for astronomy. They produced a 2k4k three-edge-abuttable device with a
slightly smaller pixel of 13.5 µm in the late 1990s (CCD42-80). The amplifier has
a responsivity of ~5 µV/e⁻, which enables relatively low noise. It also features high
QE at shorter wavelengths, thanks to the finer backside treatment and AR coating
process. Since then, e2v has become the largest commercial vendor for astronomy
and their product line covers up to 9232 x 9216 arrays with 10 µm pixels. They
have a variant fabricated on high resistivity silicon whose depletion thickness reaches
40 µm. Most of its characteristics are similar to those of the CCD42-80.


4. Mosaic Imagers 


4.1. Examples 


Table 1 presents a list of visible imagers on large telescopes. All are mosaic CCD
cameras. The first three cameras in the table, built in the early days, employed CCDs
in nonabuttable packages that produce relatively large gaps in the images. All the other
cameras use edge-abuttable CCDs and the gaps are minimized to as little as 0.5 mm in
the case of MOCAM. Typical gaps between arrays are 1-5 mm (up to ~10 mm for
JPCam).
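
Table 1 lists, for each camera, the collecting area A, the field of view Ω and their product AΩ, which serves as a figure of merit for survey speed. The sketch below recomputes AΩ for four rows and compares the result with the tabulated value. The numbers are copied from Table 1; the 1% tolerance is an arbitrary choice that allows for rounding in the table.

```python
def etendue(area_m2, fov_deg2):
    """Survey figure of merit: collecting area times field of view (m^2 deg^2)."""
    return area_m2 * fov_deg2


# (A [m^2], Omega [deg^2], tabulated A*Omega) taken from Table 1.
TABLE1_ROWS = {
    "Suprime-Cam": (51.7, 0.256, 13.17),
    "DECam": (10.0, 3.0, 30.0),
    "Hyper Suprime-Cam": (51.7, 1.77, 91.3),
    "LSST": (37.4, 9.3, 347.8),
}


def max_relative_difference(rows):
    """Largest fractional difference between recomputed and tabulated values."""
    return max(abs(etendue(a, o) - listed) / listed for a, o, listed in rows.values())


if __name__ == "__main__":
    for name, (area, fov, listed) in TABLE1_ROWS.items():
        print(f"{name:18s} A*Omega = {etendue(area, fov):7.1f} (table: {listed})")
```

The recomputed products agree with the tabulated ones to better than 1%, and they show the gain of roughly a factor of seven from Suprime-Cam to Hyper Suprime-Cam on the same telescope.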

The MCCD-1 camera is one of the pioneering mosaic CCD cameras, built originally
for the 1.05 m Kiso Observatory Schmidt telescope.²² They used commercially
available and relatively small, low-cost CCDs (TI TC-215), avoiding the risk of
procuring large format CCDs. Each CCD was mounted approximately along
the spherical focal surface of the Schmidt telescope by inserting precisely machined
aluminum nitride (AlN) spacers. The deviation from the fiducial focal surface was
40 µm, which is smaller than the focal depth of 60 µm. In order to handle the signals
from “many” CCDs, they installed analog multiplexer boards inside the dewar to
reduce the number of driving and signal lines through the hermetic connectors. MCCD-1 had
the world’s widest focal-plane area at that time. The second version of the camera,
MCCD-2, was built with 40 CCDs and was used at the WHT with a reasonably wide
field of view of 0.12 square degrees.²³

Table 1. List of visible mosaic imagers.

Camera             Telescope       D [m]  A [m²]  CCD vendor  Typeᵃ   Ω [deg²]  AΩ        In operation  Ref.
MCCD1              Kiso (Schmidt)  1.05   0.9     TI          FI      0.72      0.65      1991          22
MCCD2              WHT             4.2    13.8    TI          FI      0.12      1.7       1996          23
BTC                Blanco          4.0    10.0    SITe        BI      0.24      2.4       1996          24
SDSS               SDSS            2.5    3.8     SITe        BI      6.0       23.0      1998          10
MOCAM              CFHT            3.6    9.6     Loral       FI      0.07      0.67      1994          25
UH8K               CFHT            3.6    9.6     Loral       FI      0.25      2.40      1995          15
NOAO Mosaic-1      Mayall          3.8    10.0    SITe        BI      0.36      3.59      1998
CFH12K             CFHT            3.6    9.6     MIT/LL      BI-DD   0.375     3.60      1999          26
Suprime-Cam        Subaru          8.2    51.7    MIT/LL      BI-DD   0.256     13.17     1999          27
WFI                MPG/ESO         2.2    3.2     e2v         BI      0.31      1.0       2002          28
MegaCam            CFHT            3.6    9.6     e2v         BI      1         9.59      2002          29
Pan-STARRS         Pan-STARRS      1.8    2.5     MIT/LL      BI-OT   7(x4)     15.0(x4)  2006          30
OmegaCAM           VST             2.6    4.6     e2v         BI      1.0       4.6       2011          31
URAT               URAT            0.23   0.04    STA-UA      BI      28        1.2       2011          32
ODI                WIYN            3.5    8.5     MIT/LL      BI-OT   0.47      4.0       2012          33
DECam              Blanco          4.0    10.0    LBNL        BI-FD   3.0       30.0      2012          34
Hyper Suprime-Cam  Subaru          8.2    51.7    Hamamatsu   BI-FD   1.77      91.3      2012
PAUCam             WHT             4.2    13.8    Hamamatsu   BI-FD   1.0       13.8      2015          35
NOAO Mosaic-3      Mayall          3.8    10.0    LBNL        BI-FD   0.36      3.59      2015
JPCam              OAJ T250        2.5    3.9     e2v         BI-DD   4.7       18.3      -             36
LSST               LSST(3MT)       8.4    37.4    e2v/STA-UA  BI-DD   9.3       347.8     -             37

ᵃFI: Front-side illuminated, BI: Backside illuminated, DD: Deeply depleted, FD: Fully depleted, OT: Orthogonal Transfer.
[The telescope focal ratio f, CCD format (pixel size) and number of CCDs (N_CCD) columns are not legible in this copy.]

BTC used four TK2048B detectors, which were then the largest commercially
available BI CCDs.²⁴ It was built for the CTIO 4 m Blanco telescope and featured
the “biggest throughput” at the time. Compared with the single RCA 312 x 512 (30 µm
pixel) CCDs that were extensively used by Tony Tyson and others to observe the
deep universe ten years before,¹³ a more than 50-fold increase of the FOV was
realized.
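
The gain in field of view quoted for BTC can be checked from the detector areas alone. The sketch below assumes that both detectors are used at the same plate scale, that the TK2048B has the 24 µm pixels quoted for the TK2048 in Sec. 3.4, and that the gaps between the four CCDs are ignored.

```python
def detector_area_mm2(n_columns, n_rows, pixel_um, n_devices=1):
    """Total light-sensitive area of n_devices identical CCDs, in mm^2."""
    pixel_mm = pixel_um / 1000.0
    return n_devices * (n_columns * pixel_mm) * (n_rows * pixel_mm)


def area_gain(new_area_mm2, old_area_mm2):
    """Ratio of focal-plane areas, equal to the FOV ratio at a fixed plate scale."""
    return new_area_mm2 / old_area_mm2


if __name__ == "__main__":
    btc = detector_area_mm2(2048, 2048, 24.0, n_devices=4)
    rca = detector_area_mm2(312, 512, 30.0)
    print(f"BTC: {btc:.0f} mm^2, RCA: {rca:.0f} mm^2, gain: {area_gain(btc, rca):.0f}x")
```

With these assumptions the four-CCD mosaic covers about 67 times the area of a single RCA device, consistent with the statement that the increase exceeds a factor of 50.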

The SDSS photometric camera employed 30 TK2048Bs, providing a field of view
of up to 6 square degrees, mounted on the modified R-C SDSS telescope (Sec. 2.2).
Five different filters (u, g, r, i, z) are fixed just in front of the CCDs and TDI observations
are made across six CCDs with the same filter. The device has a notorious convex
curvature (~200 µm) toward the incoming light, and they coped with it by cementing a
heavy Kovar stiffener on the back.¹⁰
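
In TDI mode the charge must be clocked at the rate at which the sky drifts across the rows, which also fixes the effective exposure time per CCD. The sketch below estimates both for a 2048-row device with 24 µm pixels on a 2.5 m f/5 telescope, values taken from this chapter. The sidereal rate of 15.041 arcsec per second at the celestial equator is a standard figure, and the ideal plate scale and function names are assumptions of this sketch.

```python
import math

ARCSEC_PER_RADIAN = 206264.806
SIDEREAL_ARCSEC_PER_S = 15.041  # drift rate of the sky at the celestial equator


def pixel_scale_arcsec(pixel_um, aperture_m, focal_ratio):
    """Angular size of one pixel for an ideal telescope."""
    focal_length_um = aperture_m * focal_ratio * 1.0e6
    return ARCSEC_PER_RADIAN * pixel_um / focal_length_um


def tdi_row_rate_hz(pixel_um, aperture_m, focal_ratio, declination_deg=0.0):
    """Rows per second that must be clocked to follow the sky drift."""
    drift = SIDEREAL_ARCSEC_PER_S * math.cos(math.radians(declination_deg))
    return drift / pixel_scale_arcsec(pixel_um, aperture_m, focal_ratio)


def tdi_exposure_s(n_rows, pixel_um, aperture_m, focal_ratio, declination_deg=0.0):
    """Time a source takes to cross all rows of one CCD."""
    return n_rows / tdi_row_rate_hz(pixel_um, aperture_m, focal_ratio, declination_deg)


if __name__ == "__main__":
    print(f"pixel scale: {pixel_scale_arcsec(24.0, 2.5, 5.0):.3f} arcsec")
    print(f"row rate:    {tdi_row_rate_hz(24.0, 2.5, 5.0):.1f} rows/s")
    print(f"exposure:    {tdi_exposure_s(2048, 24.0, 2.5, 5.0):.1f} s")
```

At the equator this gives a pixel scale of about 0.4″, a row rate of about 38 rows per second and an effective exposure of about 54 s per CCD. Any residual distortion changes the local drift rate across the field, which is why TDI imposes the tight distortion tolerance discussed in Sec. 2.2.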

The Canada-France-Hawaii Telescope has a wide-field prime focus, for which
a series of mosaic imagers have been built. The MOCAM²⁵ and UH8K¹⁵ cameras
employed Loral edge-abuttable CCDs designed by John Geary. The mosaic
configuration of UH8K is 4 x 2 three-edge-abuttable 2k4k CCDs, which became a model
for later mosaic imagers. CFH12K is a successor of UH8K that has a 6 x 2 array
of MIT/LL CCID-20 CCDs.²⁶ It was followed by MegaCam,²⁹ with 40 e2v 2k4.5k
CCDs, which realized a 1 square degree FOV.

ESO built the “Wide-Field Imager” (WFI) for the MPG/ESO 2.2 m telescope
with a 4 x 2 mosaic of e2v’s 2k4k CCDs.²⁸ An even larger mosaic of the same CCDs
is OmegaCAM, which has an 8 x 4 mosaic for the 2.6 m VLT Survey Telescope.³¹

The Subaru telescope uniquely features a wide-field prime focus on a large
telescope (D > 8 m). Suprime-Cam²⁷ was built using ten CCID-20s. Coupled with
the compact corrector with the lateral shift ADC (Sec. 2.2), the camera achieved
seeing-limited imaging at Mauna Kea over a 0.5 degree FOV. Thanks to the high QE at
longer wavelengths given by the high resistivity version of the CCID-20,
Suprime-Cam was effective at hunting high redshift galaxies in the 2000s by using very
narrowband filters tuned to the wavelengths where OH emission from the sky is
low. Hyper Suprime-Cam (HSC) is a successor of Suprime-Cam which also features
seeing-limited imaging, over a 1.5 degree FOV. The focal plane of HSC is shown in Fig. 8.
HSC adopts Hamamatsu’s fully depleted (FD) CCDs (2k4k four-edge-abuttable),
whose thickness (200 µm) doubles the QE at 1 micron compared with the CCID-20.
PAUCam also adopts Hamamatsu’s fully depleted CCDs.³⁵

LBNL’s FD CCDs (2k4k four-edge-abuttable) formed the Dark Energy Camera
(DECam), which is mounted on the 4 m Blanco telescope and realizes a very wide
FOV of 3 square degrees.³⁴ The northern Mayall telescope (4 m) originally had a
mosaic imager using a 4 x 2 array of SITe’s 2k4k three-edge-abuttable CCDs (NOAO
Mosaic-1), and it has now been upgraded to 500 µm thick LBNL FD CCDs (Mosaic-3). The
Pan-STARRS camera employs unique Orthogonal Transfer Arrays (OTA), which
allow charge transfer from the imaging area in the serial direction in addition to
the parallel direction.³⁰ This freedom can be used for shift-and-add integration
to cancel the lower order image motion due to the atmosphere and the telescope
drive. Sixty OTAs of 4k4k (four-edge-abuttable) cover 7 square degrees and provide
the world’s largest pixel-count camera as of late 2018. The OTA is also used in the
One Degree Imager (ODI) for the 3.5 m WIYN.³³

Fig. 8. Focal plane of Subaru Hyper Suprime-Cam, tiled with 116 Hamamatsu fully depleted
CCDs.


4.2. CCD Alignment Techniques 


Co-planarity of CCDs is crucial, especially when the instrument f-ratio is small 
(fast). Because the flatness between the light incident surface and the bottom sur- 
face of the package is not always guaranteed, mosaic builders have to deal with 
it. Typically, an alignment block is inserted between the CCD and the flat cold 
surfaces. Reference 39 preferred to machine the block after they fit the components 
together and measured the errors. 

Because the prime focus of Subaru has the fastest focal ratio, they had to cope
with the tightest tolerance.⁴⁰ They inserted metal foils of appropriate thickness
(5-300 µm) at the four corners of the block. The thickness of the foils is determined
based on height measurements by a laser displacement meter. After the selection
of foils, they lifted the CCD package slightly and inserted epoxy with a syringe to
fill the gap between the block and the CCD package made by the spacers at the
four corners. Appropriate force was applied by hooking the CCD down at the groove
along the side of the package. As a result, they realized a co-planarity error smaller
than 30 µm, which is roughly half of the focal depth. Even tighter co-planarity is
required for the LSST camera because of the smaller pixel size and faster f-ratio of
1.2. They plan to adopt “a differential screw assembly” to allow µm-level adjustment
by 5-degree rotation.
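
The relation between co-planarity tolerance, focal ratio and pixel size can be sketched geometrically. The model below assumes that a defocus of δ produces a geometric blur of diameter δ/f in a beam of focal ratio f, and that the blur is allowed to grow to one pixel on either side of best focus, so that the full focal depth is 2 × f × pixel. This one-pixel criterion is an assumption of the sketch, not a specification from the cited references, and the 10 µm pixel used for the second case is taken as an illustrative value.

```python
def focal_depth_um(focal_ratio, allowed_blur_um):
    """Full depth of focus (both sides of best focus) for a given blur budget.

    Geometric model: blur diameter = defocus / focal_ratio.
    """
    return 2.0 * focal_ratio * allowed_blur_um


if __name__ == "__main__":
    # Subaru prime focus: f/2 with 15 um pixels (values from the text).
    print(f"f/2.0, 15 um pixel: {focal_depth_um(2.0, 15.0):.0f} um")
    # Faster beam and smaller pixel, e.g. f/1.2 with an assumed 10 um pixel.
    print(f"f/1.2, 10 um pixel: {focal_depth_um(1.2, 10.0):.0f} um")
```

For f/2 and 15 µm pixels this gives a focal depth of 60 µm, so the achieved 30 µm co-planarity is indeed about half of it. A faster beam with smaller pixels shrinks the budget to a few tens of microns, which motivates the finer adjustment mechanism.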

The tolerance on the CCD displacement perpendicular to the optical axis is commonly
looser than the co-planarity tolerance in imaging applications, because pixel resampling
is required in the later data reduction stage in order to deal with the optical
distortion. When a simple alignment method using pins and holes is adopted, an alignment
error of 50-70 µm is realized, which is sufficient for most applications as long as it
is fixed. The SDSS photometric camera, however, had tighter tolerances because
of the TDI operation. They conceived a ball-and-cone socket design that allows
adjustment of the tilt and of the rotation about the optical axis.¹⁰ They
succeeded in locating all the chips within about 1 pixel (24 µm) of the designed
position.


4.3. Cryogenic Dewars 


CCDs must be cooled down to −100°C to minimize the dark current and thus
must be housed in a vacuum cryogenic dewar. Two types of cooling systems are 
commonly used. One is a conventional system using liquid nitrogen (LN2), which 
allows reliable and relatively inexpensive operation. It is still adopted in modern 
visible imagers.°4 3° 

The other system is a mechanical cooler such as a Stirling cycle cooler or a 
pulse tube cooler. Although these reduce the daily operation work significantly, the 
required maintenance cycle used to be very short in the past and it was hard to 
adopt the system in the long run. But recently, the maintenance cycle has become 
quite long. Suprime-Cam (5W at 80K) and HSC (two 8 W coolers at 80K) adopt 
pulse tube coolers built by Fuji Electric (Japan), whose maintenance cycle is longer 
than 50,000 hours. The temperature of the coldest part inside the dewar (usually 
the cooler head) is not so different from the CCD operation temperature, which is 
not sufficiently low to trap the out-gas. Therefore, one has to employ an ion pump 
to keep the vacuum level low. The life time of the ion pump is long enough if the 
vacuum level is kept as low as ~10~® Torr. 


4.4. Possible Future Direction of Visible Imagers 


The visible imagers reviewed in this chapter all employ CCDs. The CCD has been an ideal
detector in terms of readout noise and quantum efficiency over a wide
wavelength range. It takes, however, a relatively long time to read out because
signal charges have to be transferred to the output port for readout. Recent remarkable
advances in CMOS sensors make it possible to read them out fast (faster than
1 frame per second) while keeping the noise low (a few electrons). This is realized
by implementing thousands of analog-to-digital converters on-chip (column A/D
converters), which keeps the pixel rate per converter reasonably low.

The downside of commercially available CMOS sensors is their low sensitivity, especially at
longer wavelengths, due to the thin depletion layer (typically a few μm). Some special
structure is necessary to realize thicker CMOS sensors, such as implementing additional
implants under the p-well to minimize leakage current when a high back-bias voltage
is applied.41


In this chapter, we will summarize the development of wide-field imaging as
applied to observations at near-infrared wavelengths (0.9–2.5 μm). In this regime,
the background contamination from thermal emission of the instrument and tele- 
scope structures must be carefully controlled. These controls have, in turn, conse- 
quences for design details which are not normally encountered in instrumentation 
for the visible range. We will cover some of these developments from a relatively 
simple imager system through to an entire telescope designed from the ground 
up as a wide-field infrared imaging system. 


1. Introduction 


While there are many similarities between imaging in the visible and imaging in 
the near-infrared, the latter field is considerably more challenging (and expensive) 
due to a combination of factors which arise at all stages between the top of the 
atmosphere and the data archives. In this chapter, we will discuss these effects, and 
their consequent design implications, and describe a number of instruments that 
have successfully managed to address these challenges. 


2. Background Sources 


The fundamental difference between observations in the visible and the near-infrared 
is the nature and extent of the various background contributions, and instrument 
design is driven by the need to account and correct for these. There are three 
main sources of background contributions, each requiring different considerations 
in instrument design. 


2.1. The Infrared Sky


There are three main differences between the night sky in the near infrared and the
visible. First, observations between 1 μm and 2.5 μm enter the thermal regime where
the sky itself is emitting as a blackbody (with an effective temperature typically
around 230 K) and is subject to temperature changes. Second, the sky shortward
of about 2 microns is dominated by a forest of bright emission lines arising from
OH radicals in the atmosphere. These lines are known to group into a number of
distinct families, each of which varies independently, both temporally and spatially,
on scales of a few tens of seconds or arcseconds, respectively. Third, the absorption
bands due to atmospheric water vapor are much stronger at these wavelengths than
in the visible, to the point where the band at 1.375 μm saturates, giving rise to much
stronger variations in atmospheric transmission. Figures 1 and 2 show the synthetic
night sky spectrum (in emission and absorption, respectively) for the European
Southern Observatory site at Cerro Paranal.1,2
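
To give a feel for how steeply the thermal sky contribution rises across the near-infrared, the following minimal sketch evaluates the Planck function for an ideal blackbody at the 230 K effective temperature quoted above. The band-center wavelengths (1.25, 1.65 and 2.2 μm) and the function name are assumptions made for illustration; the real sky has an emissivity well below unity and is dominated by OH lines at the shorter wavelengths, so only the relative trend is meaningful.

```python
import math

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m/s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]


def photon_radiance(wavelength_um, temperature_k):
    """Ideal blackbody photon radiance in photons s^-1 m^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6                    # wavelength [m]
    x = H * C / (lam * K_B * temperature_k)       # h*nu / kT (dimensionless)
    per_metre = 2.0 * C / lam**4 / math.expm1(x)  # per metre of wavelength
    return per_metre * 1e-6                       # convert to per micron


T_SKY = 230.0  # effective sky temperature quoted in the text [K]

# Assumed, approximate band centers (not taken from the text).
for band, lam_um in (("J", 1.25), ("H", 1.65), ("K", 2.2)):
    print(f"{band} ({lam_um} um): {photon_radiance(lam_um, T_SKY):.3e}")

ratio = photon_radiance(2.2, T_SKY) / photon_radiance(1.25, T_SKY)
print(f"K/J blackbody ratio: {ratio:.2e}")
```

The ratio between the two ends of the range spans many orders of magnitude, which is why the thermal term only becomes important toward the long-wavelength end of the near-infrared window.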


2.2. Local Background 


In addition to thermal emission from the sky, infrared imagers must also deal with 
local thermal emission from the structure of the telescope and enclosure, as well 
as from the instrument itself. This can be minimized by careful design of the tele- 
scope structure and choice of low-emissivity materials, but must be addressed in 
instrument design with a combination of cooling all structures with sight-lines to 
the detector, and by appropriate masking within the instrument. 

Fig. 1. Example radiance sky spectrum for Cerro Paranal from the Skycalc simulator.1,2

Fig. 2. Example atmospheric absorption spectrum for Cerro Paranal from the Skycalc simulator.


2.3. Detector Background and Noise 


At wavelengths longer than 1.05 μm, the silicon band-gap is reached and silicon
detectors are no longer useful. Near-IR detectors have been developed using more
exotic materials, with HgCdTe now favored in the 1–2.5 μm range, and InSb or Si:As
now favored at longer wavelengths.

In the context of imaging systems, it must be noted that these detectors are 
photo-diode based, unlike CCDs, and so each pixel must be addressed individually. 
This has the benefit that the detector can be read out and reset very quickly, typi- 
cally on a timescale of 1 s, but the drawback is that there are distinct components in 
the readout channel for each pixel which can give rise to additional underlying image 
structure. At the wavelengths of interest, the detector temperature is also critical, 
since the devices will radiate in their own detection window, and such emission 
is detected as a dark current in the detector which must be corrected for. For a
detector sensitive to 2.5 μm this implies that the focal plane itself must be cooled
to around 70 K, although for a device which is only sensitive to 1.7 μm this can be
relaxed to around 140 K without significant additional dark current.


3. Optical Systems 


3.1. General Considerations for Design of Cryogenic Optical 
Systems 


As noted in Sec. 2.2, working in the thermal part of the spectrum means that
all components which could generate thermal emission in the passband of the
instrument must be cooled, or their paths to the detector blocked with cold baffles.
This has the following implications:


(1) Warm vs. cold dimensions: The optical design of the system must reflect the 
physical dimensions of the structure and lens assemblies at operating tempera- 
ture. These dimensions must be translated to room temperature for manufac- 
ture. For large instruments care must be taken to ensure that these changes 
are not so large as to generate interference between structures during warm 
assembly. 

(2) Warm vs. cold alignment: It may well be desirable to constrain the optical sys- 
tem to provide for an optical alignment test to be carried out prior to cooldown, 
at least for verification of gross performance. 

(3) Warm vs. cold optical properties: Designs must take account of expected vari- 
ation of the refractive indices of the components between manufacturing and 
operational temperatures. Interference filters may also exhibit small shifts in 
bandpass as a function of temperature, as the fractional variation in film thick- 
ness or cavity width can be significant. 

(4) Compliance of lens mounts: The difference in thermal expansion coefficients 
(CTEs) between the lens and the lens mount must be considered, either by 
careful choice of materials to match CTEs, or by incorporation
of compliant materials (e.g. RTV) or spring contacts. In the first approach,
the change in CTE as a function of temperature should also be
considered. Any adhesives used to bond glass to metal should be qualified for
use, as the stresses induced by the glass-transition phase change in the adhesive 
may cause damage to the glass. 

(5) For an NIR system there is one further consideration, since typical interferom- 
eters used for testing system alignments operate at visible wavelengths, which 
may have implications for both the wavefront performance and the throughput 
of the system for the purposes of optical testing. 
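
As a rough illustration of items (1) and (4) above, the sketch below converts between cold (operating) and warm (manufacturing) dimensions and estimates the differential shrinkage between a lens and its cell. The integrated contraction values are assumed round numbers chosen for illustration, not figures from the text, and the function names are hypothetical; a real design would use measured material data over the actual temperature range.

```python
# Illustrative integrated contractions (L_warm - L_cold) / L_warm between
# room temperature and roughly liquid-nitrogen temperature.
# These are assumed round numbers for this sketch, not values from the text.
CONTRACTION = {
    "aluminium": 3.9e-3,    # assumed, of the order typical for aluminium alloys
    "fused_silica": 0.0,    # assumed negligible for this sketch
}


def warm_length(cold_length_mm, contraction):
    """Room-temperature length that shrinks to cold_length_mm when cooled."""
    return cold_length_mm / (1.0 - contraction)


def cold_length(warm_length_mm, contraction):
    """Length at operating temperature of a part made to warm_length_mm."""
    return warm_length_mm * (1.0 - contraction)


def diametral_squeeze(warm_diameter_mm, mount_contraction, lens_contraction):
    """How much more the mount bore shrinks than the lens it holds [mm]."""
    return warm_diameter_mm * (mount_contraction - lens_contraction)


# A 1000 mm cold spacing in an aluminium structure must be machined longer.
print(f"{warm_length(1000.0, CONTRACTION['aluminium']):.3f} mm warm")

# A 120 mm lens in an aluminium cell: clearance the mount must absorb.
squeeze = diametral_squeeze(
    120.0, CONTRACTION["aluminium"], CONTRACTION["fused_silica"]
)
print(f"{squeeze:.3f} mm differential shrinkage on the diameter")
```

Under these assumptions the cell closes in on the lens by a few tenths of a millimeter, which is the scale of mismatch that compliant mounts or spring contacts have to take up.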


3.2. Classical NIR Camera with Cold-Stop 


As an introduction to designs, we consider a single-detector, modest-field broad- 
band camera located at the f/11 Nasmyth focus of a 4 m-class telescope (in this case 
a representation of INGRID at the William Herschel Telescope3,4). The nominal
layout for such a camera is shown in Fig. 3. The key features of the design are that 
the camera optics are cooled to minimize thermal emission, and that there is an 
intermediate image of the telescope pupil formed inside the camera where a cold 
mask can be placed that matches the primary mirror and effectively blocks all paths 
from sources of thermal emission (such as the dome and telescope structure) from 
reaching the detector (in this case a 1024 × 1024 pixel HgCdTe array).

In addition to two filter wheel mechanisms and a third wheel for pupil masks,
all located between the window and the first lens group, a further mechanism is
included that allows a pair of reimaging lenses to be introduced between the two
main lens groups of the camera. This allows direct imaging of the pupil mask by
the detector, and hence provides feedback to allow the mask position and focus to
be adjusted to ensure optimal blocking of the edges of the primary.

Fig. 3. A representation of the INGRID camera at the Nasmyth focus of the 4.2 m William
Herschel Telescope (see text). For scale, the diameter of the field lens here is 120 mm.

In this configuration, it is also worth noting that the acquisition and guiding 
functionality must be provided by the traditional auxiliary systems that naturally 
form part of the telescope infrastructure, and not part of the instrument. 

Such a design provides excellent performance in terms of image quality and 
control of the thermal background, but has some limitations. For an alt-az telescope, 
the pupil-stop should, ideally, be able to track the rotation of the secondary mirror 
support structure and mask the spider-vanes. It is also clear that while this works 
well for a modest field in a relatively slow beam, it is not likely to scale well to a 
system that would make use of a substantially larger focal plane. 

This concept was employed extremely successfully in the development of the
2 Micron All-Sky Survey,5 which employed a three-camera implementation of a
similar design with two dichroics, a common field lens inside the cryostat, and a
separate pupil stop in each channel. In this case, the design is useful for wide-field
imaging because the telescope is small (1.3 m primary), but dedicated, and the
projected pixel size is large (2″). This configuration enabled a relatively shallow
3-band all-sky survey from two telescopes (one in each hemisphere) in four years.


3.3. Schmidt-Camera Adaption of Existing Telescope — WFCAM 


Now we consider the steps that were taken to introduce a wide-field imaging system 
to the 3.8m United Kingdom Infrared Telescope (UKIRT). Built in the mid-1980s, 
UKIRT was designed for IR observations to make the best use of the Mauna Kea 
skies, and adopted an equatorial mount with an f/2.5 primary and a small secondary 
mirror at f/35. In principle, the f/2.5 primary would suggest that a prime-focus 
solution might prove feasible for wide-field imaging, but the lightweight telescope 
structure, designed for low thermal inertia, and the compact dome effectively rule 
out such an approach. A true Cassegrain instrument would also be somewhat com- 
promised if existing capabilities were to be retained. 

Fig. 4. Overall schematic layout of WFCAM on the 3.8 m UK InfraRed Telescope.


Instead, a more innovative route was adopted,6 with the introduction of a new
f/9 secondary mirror to produce a forward-Cassegrain focus around 6m in front 
of the primary mirror. A single lens placed at this focus then images the primary 
mirror onto a cold aperture stop inside the camera cryostat. A 0.8 m tertiary mir- 
ror, located inside the cryostat, forms an f/2.4 focus at the location of the central 
obstruction in the cold stop (Fig. 4). The camera entrance window sits roughly mid- 
way between the reimaging lens and M1. Aberrations arise in this design from the 
change of the secondary mirror from f/35 to f/9, and these are corrected by means 
of an aspheric plate (sixth order, even) mounted just inside the cryostat. Placing 
the corrector plate close to the cryostat window has the additional benefit that it 
acts as a partial screen between the cold internals of the cryostat and the window, 
thus reducing the risk of dewing on the front window surface. 

The focal plane delivered by this design is 0.9° diameter on the sky, and this
is populated by four Rockwell* HAWAII-II 2048 × 2048 pixel HgCdTe detectors with
18 μm pixels. The arrays are mounted in a sparse grid with a spacing set to 94% of
the detector size. This spacing was required to allow adequate space for the wiring 
around the detector mounts. A full image of the sky is therefore achieved with a 
4-point dither pattern, which provides a suitable overlap between each array in each 
dither to allow cross-calibration. Each bandpass filter is also then made of a mosaic 
of four elements, held in a common frame. 

In this concept, there is no space available for a filter wheel that could accom- 
modate a suitable number of large (130mm) filter panes, and so instead these are 
mounted in separate mechanisms (eight in total) which are placed into position in 
front of the field flattener from a storage position, such that the stored filter lies 
parallel to the optical axis in the shadow of the central obstruction (Fig. 5). 


*Now Teledyne Imaging Sensors.


Fig. 5. Details of the internal layout of the WFCAM cryostat. Labeled components:
focal plane, tertiary mirror, filter, filter stow position, cold stop, and field
flattener; the scale bar is 500 mm.


There is no provision for an auxiliary acquisition and guiding unit at this loca- 
tion. Instead, this functionality is achieved by means of a fixed CCD mounted in 
the center of the focal plane to give a 5 x 5 arcminute field of view. 


3.4. New Concept Wide Field NIR Telescope — VISTA 
and Beyond 


To overcome the complexities of retro-fitting wide-field systems to an existing
telescope, one must revisit the telescope design. VISTA7 (Fig. 6) is specifically designed
for wide-field NIR imaging. The initial choice for a wide field is clearly between an
instrument at prime focus or Cassegrain focus, and this choice is heavily constrained
by the pixel size of practically available detectors (typically ~20 μm) and the match
between this and the site seeing limit. For a 4 m-class telescope, this effectively 
means that an f/2.5 primary mirror is the practical limit for developing a system at 
prime focus, whereas the additional freedom provided by a secondary mirror gives a 
wider range of options for Cassegrain designs. Clearly the issue of instrument mass 
is critical if one is to consider placing a substantial cryogenic camera at the primary 
focus, but the adoption of a Cassegrain system with a fast primary provides for a 
substantial reduction in the overall size of the enclosure with benefits for both the 
overall cost and the impact of the enclosure on the site seeing characteristics. 
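
The match between pixel size and seeing mentioned above can be made concrete with the usual plate-scale relation, scale = 206265″ × (pixel size) / (focal length). The following is a minimal sketch under simple assumptions: the full aperture diameter and the final focal ratio set the focal length, and the 0.6″ seeing value used for the sampling estimate is only an example. The function name is hypothetical.

```python
ARCSEC_PER_RADIAN = 206264.806


def plate_scale_arcsec_per_pixel(aperture_m, focal_ratio, pixel_um):
    """Angle on the sky subtended by one pixel, in arcseconds."""
    focal_length_m = aperture_m * focal_ratio
    return ARCSEC_PER_RADIAN * pixel_um * 1e-6 / focal_length_m


cases = {
    "4 m, f/2.5 prime focus, 20 um pixels": (4.0, 2.5, 20.0),
    "4.1 m, f/1.0 prime focus, 20 um pixels": (4.1, 1.0, 20.0),
    "WFCAM: 3.8 m, f/2.4 focus, 18 um pixels": (3.8, 2.4, 18.0),
}

SEEING_FWHM_ARCSEC = 0.6  # example value for the sampling estimate

for label, (aperture, f_ratio, pixel) in cases.items():
    scale = plate_scale_arcsec_per_pixel(aperture, f_ratio, pixel)
    sampling = SEEING_FWHM_ARCSEC / scale
    print(f"{label}: {scale:.3f} arcsec/pixel, "
          f"{sampling:.1f} pixels across a {SEEING_FWHM_ARCSEC} arcsec FWHM")
```

With ~20 μm pixels, an f/2.5 beam on a 4 m aperture already gives pixels of about 0.4″, so anything faster at prime focus would undersample good seeing even more severely.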

A design was therefore adopted for VISTA with a very fast (f/1.0) 4.1 m primary
mirror with a convex hyperbolic 1.24 m secondary in a quasi-Ritchey–Chrétien
design. The Cassegrain focus is located 1.2 m below the pole of M1. Although
designed to be a general-purpose wide-field telescope with both infrared and visible
applications in mind, the telescope design was optimized with the infrared camera
design as a single system. As an interesting comparison, it is worth noting here
that the moving mass of the telescope for the f/1.0 design is around 90 metric tons,
compared to an estimated 250 metric tons required to support an f/2.5 system with
a large prime focus instrument.

Fig. 6. Overall layout of the VISTA telescope and camera.

Fig. 7. Detail of the VISTA cryostat internal optical configuration. All lenses and the window
are Infrasil. The window is 850 mm diameter and 72 mm thick.

The cryostat optical layout is shown in Fig. 7. The three lenses correct for 
the aberrations introduced by the telescope configuration, particularly astigmatism. 
The full field of view provided by this design is 1.57° and it delivers 0.6″ (FWHM)
images over the full field in the best seeing conditions experienced at Paranal. The 
optical design allows for a small chromatic focus term, which is accommodated by 
adjusting the thickness of the filters to compensate. 

The VISTA IR focal plane is populated with sixteen 2048 × 2048 VIRGO HgCdTe
detectors with 20 μm pixels. As in WFCAM, these are placed in an open grid pattern
to allow for mounting and cabling of the detectors, and to allow the filter to be
made from detector-sized sub-elements. At this scale of focal plane, the flatness of
the individual detectors becomes a significant issue for the design: for WFCAM the
constraint was to maintain all pixels within 50 μm peak-to-valley for four detectors
over a focal plane of 240 mm diameter. In the VISTA case this becomes 25 μm
peak-to-valley over a 400 mm diameter focal plane, where the worst individual detector
was 24 μm peak-to-valley. The detector mounting plate was machined from solid
molybdenum (for stiffness), and the mounting and referencing of the individual
detector pedestals was achieved in close partnership with the detector manufacturer.
Unlike WFCAM, the VISTA focal plane is rectangular, with a 90%/42.5% detector
spacing. This requires a six-point dither pattern to survey a filled image, in which
each pixel on the sky is observed twice.
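
The link between detector spacing and the number of dither positions can be checked with a simple one-dimensional coverage count. The sketch below is an illustration only: it treats each axis independently, works in integer units of 1/1000 of a detector width, and assumes evenly spaced offsets (0.97 detector widths for the WFCAM-like case, 0.95 and 0.475 for the two VISTA-like axes). Those step sizes are assumptions chosen to give a small overlap, not values quoted in the text.

```python
def min_coverage_1d(gap, n_steps, step, detector=1000):
    """Minimum number of exposures covering any sky position along one axis.

    All lengths are integers in units of 1/1000 of a detector width.
    Detectors of width `detector` repeat with period `detector + gap`;
    the pointing is offset by k * step for k = 0 .. n_steps - 1.
    """
    period = detector + gap
    return min(
        sum(((x - k * step) % period) < detector for k in range(n_steps))
        for x in range(period)
    )


# WFCAM-like grid: gaps of 94% of a detector in both axes, 2 x 2 dither.
wfcam_axis = min_coverage_1d(gap=940, n_steps=2, step=970)
print("WFCAM-like, 4 pointings, minimum coverage:", wfcam_axis * wfcam_axis)

# VISTA-like grid: gaps of 90% and 42.5%, 2 x 3 dither (six pointings).
vista_x = min_coverage_1d(gap=900, n_steps=2, step=950)
vista_y = min_coverage_1d(gap=425, n_steps=3, step=475)
print("VISTA-like, 6 pointings, minimum coverage:", vista_x * vista_y)

# A single pointing always leaves holes in a sparse grid.
print("Single pointing, minimum coverage:", min_coverage_1d(940, 1, 0))
```

Because the two axes are independent in this model, the minimum two-dimensional coverage is the product of the per-axis minima, which reproduces a filled image from four pointings in the first case and double coverage from six pointings in the second.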

The absence of a cold stop in this design is extremely beneficial for the overall
size and mass of the instrument, but has a number of consequences. Background
considerations mean that in operation the camera is effectively limited to the short
K-band (Ks). Control of out-of-band stray light and excess thermal emission was
achieved by manufacturing a set of seven nested baffles around the optical beam
between the window and L1 (Fig. 8). These baffles are elliptical surfaces, machined
from aluminum, such that one focus of each ellipse falls on the center of the cryostat
window. The baffles were coated with a dielectric layer that effectively absorbs below
3 μm and reflects from 3–10 μm. The reflected thermal radiation is largely absorbed
by the window, raising the central temperature by around 1.5°C, which assists in
minimizing condensation.

It is also important to take care of light that may reach the focal plane past
the secondary mirror: M2 is effectively undersized in VISTA, so that each pixel sees
a 3.7 m-diameter footprint on the primary. The area around the secondary must
therefore be baffled to ensure that all paths leaving the focal plane that would miss
M2 are reflected to the sky via M1. A 2-stage baffle was therefore added around M2
(Fig. 8).

Fig. 8. Left: A view into the VISTA camera cryostat during integration reveals the complex
nesting of stray-light baffles. Right: A view past the secondary mirror showing the reflected image
of the nested baffle arrangement around M2.

The camera cryostat is nearly 3m in length, has a mass of 2.9 metric tons, 
and mounts through the primary mirror to a point where the window surface is 
almost as close to M2 as to M1. Cooling is maintained by a set of three closed-cycle 
helium coolers, but the initial cool-down is accelerated by a liquid nitrogen flow 
system, requiring around 700 liters of LN2 over 2 days. The cryostat is maintained 
at operational temperature for at least a year between stand-downs. 


3.4.1. Guiding and Wavefront Sensing 


Like WFCAM, VISTA has no provision for a classical auxiliary guiding system.
Furthermore, the f/1 primary mirror configuration implies a tolerance on the position of
M2 of around 2 μm. Active control of M2 was implemented with a hexapod system
capable of controlling focus, tilt and translation. Feedback to drive the position of 
M2 is provided by a pair of curvature sensors mounted at either side of the focal 
plane. Each curvature sensor is a pair of CCDs arranged with a beam-splitter to 
acquire a pair of images of a selected star roughly 1.5mm ahead and behind the true 
focus position. Each curvature sensor is packaged with a third CCD operating in 
frame-transfer mode to provide a guiding signal for the telescope (only one guider is 
in use at any time, but both are required to provide sufficient field of view to be able 
to select a star). In operation, these sensors provide an update to the M2 position 
model roughly once per minute, but the model itself is built from an extensive set 
of calibration sequences generated from these sensors during commissioning.8 The
curvature sensor packages are mounted ahead of the 1.5m filter wheel, and must 
respect the same flatness constraint on the focal plane as the IR detectors. 


3.4.2. Other Considerations 


In pushing the frontiers of wide-field infrared imaging to the scale of VISTA, a 
number of other issues became apparent, which are not strictly limited to infrared 
observations, but which are worthy of note here: 


(1) Field Distortion: Although the image quality achieved by the VISTA design is 
excellent, there is a nontrivial amount of field distortion. At the edges of the 
field this implies that the amount of sky subtended by a pixel is a function of 
radius, and that the dithered images cannot be directly re-registered to give a 
full image without significant resampling. 

(2) Ensemble detector properties: For an ensemble of 16 arrays, the optimal choice 
of 16 arrays from a batch is not clear. It may be necessary to reject the best 
arrays in a batch to ensure that a common useful dynamic range is available 
for all detectors. 

(3) Laboratory vs. on-sky testing: The joint optimization of the telescope and cam- 
era designs yields very good performance, but the camera optics are correcting 
up to 7000 nm of astigmatism from the telescope. In practice this means that
the camera performance cannot be fully verified in the laboratory. (As an aside, 
this also means in practice that the telescope cannot be correctly aligned in the 
absence of the camera!) 

(4) VISTA breaks the classical model of a separate telescope and instrument. 
Instead, the wavefront sensors are key systems associated with the telescope 
performance and operation which must exist physically within the instrument. 
This gives rise to complexity in the systems engineering, but has the result that 
the telescope performance is tied directly to the science focal plane, which is 
considerably more effective. 


References 


1. S. Noll, W. Kausch, M. Barden et al., Astron. Astrophys. 543, A91 (2012). 

2. A. Jones, S. Noll, W. Kausch et al., Astron. Astrophys. 560, A91 (2013). 

3. C. Packham, K. L. Thompson, A. Zurita et al., Mon. Not. Roy. Astron. Soc. 345, 395
(2003).

4. S. Rees, P. Jolley, M. van der Hoeven et al., Proc. SPIE 5492, 1665 (2004).

5. M. F. Skrutskie, R. M. Cutri, R. Stiening et al., Astron. J. 131, 1163 (2006).

6. M. Casali, A. Adamson, C. Alves de Oliveira et al., Astron. Astrophys. 467, 777 (2007).

7. W. Sutherland, J. Emerson, G. Dalton et al., Astron. Astrophys. 575, 25 (2015).

8. G. Dalton, W. J. Sutherland, J. P. Emerson et al., Proc. SPIE 7735, 73351J (2010).


Part 3 


Spectrographs 


Chapter 11 


Low- and Medium-Resolution 
Spectrographs for Astronomy 


Andrew Sheinis 


Director of Engineering, Canada France Hawaii Telescope 
65-1238 Mamalahoa Highway, Kamuela, HI 96743, USA 


sheinis@cfht.hawaii.edu


What sets the resolving power of a spectrograph? What sets the sensitivity or 
speed of a spectrograph? What are the different components and the trade-offs
for different types of components? This chapter aims to answer these questions 
in a fundamental way that is useful to both the student and the beginning spec- 
trograph designer. 


1. Introduction 


This chapter covers low- and medium-resolution (LMR) spectrographs for astron- 
omy. We will look at a number of topics associated with the makeup and perfor- 
mance limitations of LMR spectrographs for astronomy, including a description of 
the major components of a spectrograph and the most popular design families of 
those components as well as the fundamental physical constraints affecting resolving
power and throughput. 


2. Spectrograph Components 


Most spectrographs contain a number of common elements. In some cases, multiple 
functions are combined into single elements and it is possible to build a one-element 
spectrograph.1 However, for most astronomical applications, these components have
historically been built as separate elements in order to allow them to be better 
optimized for the intended purpose. 

Fig. 1. A schematic layout of a generalized spectrograph: The telescope is represented by a single
equivalent lens, which forms an image of the sky in the slit plane; the collimator, also shown
by a single equivalent lens, projects the image of the sky to infinity, and reimages the entrance
pupil of the telescope; the grating then imparts a change in angle as a function of the change in
wavelength; finally the camera, also shown as a single equivalent lens, takes all the rays that have
been parallelized by the collimator and then dispersed by the dispersing element and focuses them
onto the detector.

Figure 1 shows a schematic layout of a generalized spectrograph. The telescope
is represented by a single equivalent lens, which forms an image of the sky
in the slit plane. The collimator, also shown by a single equivalent lens, projects
the image of the sky to infinity, i.e., it takes the rays from every field point and
parallelizes them. In addition, the collimator reimages the entrance pupil of the
telescope, generally onto the dispersing element. The next optical element in the
light path is the dispersing element, which is shown as a box labeled “grating”
in the figure. The dispersing element is characterized by its dispersion, which is
simply the change in angle as a function of the change in wavelength, dα/dλ, for
example in radians per Angstrom. The next optical element in the optical path is the
camera, which takes all the rays that have been parallelized by the collimator and
then dispersed by the dispersing element and focuses them onto the detector. The
last element in the optical path is the detector, which is typically a two-dimensional
array: either a CCD, EMCCD, or a CMOS sensor for the visible, or a mercury cadmium
telluride (HgCdTe) or indium gallium arsenide (InGaAs) array for the Near
InfraRed (NIR).
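
To make the idea of dispersion concrete, the following minimal sketch evaluates it for a plane diffraction grating, using the standard grating equation mλ = d(sin α + sin β) and its derivative dβ/dλ = m/(d cos β). The choice of a plane grating, the sign convention, the symbol names, and the example numbers (600 lines/mm, 10° incidence, 500 nm) are all assumptions made for illustration and are not taken from the text.

```python
import math


def diffraction_angle_rad(wavelength_nm, lines_per_mm, incidence_deg, order=1):
    """Diffraction angle beta from m*lambda = d*(sin(alpha) + sin(beta))."""
    groove_spacing_nm = 1.0e6 / lines_per_mm
    sin_beta = (order * wavelength_nm / groove_spacing_nm
                - math.sin(math.radians(incidence_deg)))
    if abs(sin_beta) > 1.0:
        raise ValueError("no propagating diffracted order for these inputs")
    return math.asin(sin_beta)


def angular_dispersion_rad_per_nm(wavelength_nm, lines_per_mm,
                                  incidence_deg, order=1):
    """Angular dispersion d(beta)/d(lambda) = m / (d * cos(beta))."""
    groove_spacing_nm = 1.0e6 / lines_per_mm
    beta = diffraction_angle_rad(wavelength_nm, lines_per_mm,
                                 incidence_deg, order)
    return order / (groove_spacing_nm * math.cos(beta))


beta = diffraction_angle_rad(500.0, 600.0, 10.0)
disp = angular_dispersion_rad_per_nm(500.0, 600.0, 10.0)
print(f"beta = {math.degrees(beta):.3f} deg")
print(f"dispersion = {disp:.3e} rad/nm = {disp / 10.0:.3e} rad/Angstrom")
```

Finer rulings or higher orders increase the angular dispersion, at the cost of a larger spread of angles that the camera must then accept.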

Another way to think about the function of a spectrograph is that the collimator,
combined with the camera, forms an image on the detector of
whatever is in the slit plane, be it a slit, a fiber, the output of an image slicer and/or
the image of the nighttime sky. These monochromatic images are shifted laterally
on the detector plane due to the angular deviation imparted by the dispersing
element. The end result is the spectrum of one or more spatial points in the slit plane.


2.1. Collimators


Following the path of the light from the stars through the atmosphere, through the 
telescope, and past the slit, the first spectrograph optical element encountered is the 
collimator. Generally speaking, collimators can be categorized as either reflective, 
refractive, or catadioptric. 

While this is not a course on optical design, it is useful to understand the
constraints and requirements facing the optical designer when designing a collimator or
camera for a spectrograph. Classical aberration theory shows the blur spot diameter,
β, of the image of a star as a function of the field angle, φ, and the normalized radial
coordinates within the exit pupil, ρ and ψ, is


$$\beta = S\rho^{4} + C\phi\rho^{3}\cos\psi + A\phi^{2}\rho^{2}\cos^{2}\psi + F\phi^{2}\rho^{2} + D\phi^{3}\rho\cos\psi, \tag{1}$$


where S is the spherical aberration coefficient, C is the coma aberration coefficient,
A is the astigmatism aberration coefficient, F is the field curvature aberration
coefficient, D is the distortion aberration coefficient, and the angles β, φ, ρ and ψ are
all in radians. As one can see from Eq. (1), the power of the field angle dependence
is 0 for spherical aberration, and increases for the other aberrations: 1 for coma,
2 for astigmatism and field curvature, and 3 for distortion. The goal of the designer
is to compensate for these aberrations.

Reflective optical elements have the advantage that they are achromatic, and 
well-corrected collimators can be designed using multiple mirrors. For small field- 
of-view applications, the most straightforward design is a single reflective parabola, 
which is corrected for spherical aberration only and hence is perfectly corrected on 
axis. It is useful however, only over a small field of view, as the off-axis image quality 
of a parabola is dominated by coma. The blur spot diameter, β, for a parabola is
given by


$$\beta = \frac{3\phi}{16\,(F/D)^{2}} \ \text{radians}, \tag{2}$$


where φ is the field angle in radians and F/D is the f-number of the optic of focal
length F and diameter D. In practice, the usable field of view of an f/8 parabola
is only of order 30 arcseconds as seen from the vertex of the parabola, so this type of
design is suitable for a single-field-point or single-fiber spectrograph. Figure 2 shows 
the optical layout for the Echellette Spectrograph and Imager (ESI) at the Keck 
telescope,” which contains an on-axis (relative to the collimator) parabolic mirror 
as a collimator used to collimate an off-axis field of view (relative to the telescope). 
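
As a numerical illustration of how off-axis coma grows for a single parabolic mirror, the sketch below evaluates the standard third-order expression for the angular size of the comatic image of a paraboloid, 3φ/(16 N²), where φ is the field angle and N = F/D is the focal ratio. The use of this particular expression and the function name are assumptions of the sketch; what counts as an acceptable blur depends on the rest of the spectrograph and is not evaluated here.

```python
def coma_blur(field_angle, focal_ratio):
    """Third-order comatic blur of a paraboloid, in the units of field_angle.

    Uses blur = 3 * phi / (16 * N**2); the relation is linear in the field
    angle, so arcseconds in give arcseconds out.
    """
    return 3.0 * field_angle / (16.0 * focal_ratio**2)


FOCAL_RATIO = 8.0  # the f/8 case discussed in the text

for field_arcsec in (10.0, 30.0, 60.0, 300.0):
    blur = coma_blur(field_arcsec, FOCAL_RATIO)
    print(f"field angle {field_arcsec:6.1f} arcsec -> coma blur {blur:.4f} arcsec")

# Faster mirrors are far less forgiving: the blur scales as 1 / N**2.
print(f"f/2 at 30 arcsec: {coma_blur(30.0, 2.0):.3f} arcsec")
```

The blur grows linearly with field angle and as the inverse square of the focal ratio, which is why a single parabola is normally restricted to a single field point or fiber.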

Another possible single mirror collimator design uses a spherical mirror as a 
collimator and corrects the spherical aberration further down in the optical system, 
such as in the camera. An example of this is the VIRUS spectrograph for the Hobby- 
Eberly Telescope Dark Energy eXperiment (HETDEX),® shown in Fig. 3. This is an 
example of using a Schmidt camera as a collimator. In general, this type of design 
is used in the catadioptric collimator, as shown below. 

Fig. 2. The Echellette Spectrograph and Imager, at Keck II, showing a parabolic collimator. This
is an example of a single reflective optical element used as a collimator.

Fig. 3. The VIRUS spectrographs for the Hobby-Eberly Telescope Dark Energy eXperiment
(HETDEX) use a spherical mirror as a collimator and correct the spherical aberration further 
down in the optical system, in this case in the camera. This is an example of Schmidt camera 
designs for the collimator and the camera. See electronic edition for a color version of this figure. 


In order to correct for a larger field of view with the on-axis system, multiple 
element refractive collimators are often used. Just as in the camera design process, 
care must be taken to correct for optical aberrations as well as to balance the color 
terms coming from the individual elements. The former criterion is an exercise in
optimizing the shape and power of the optical elements; the latter is an
exercise in material selection and optical power for the individual optical elements.

Catadioptric systems contain reflective as well as transmissive elements. The 
most common type of catadioptric system, as mentioned above, is the Schmidt 
camera or variations on the Schmidt such as the Houghton design, as used in the 
Hermes spectrograph* (Fig. 4) built for the GALAH (Galactic Archeology with 
Hermes) experiment.°? In the Schmidt family of designs,® the goal is to place an 
aperture at the center of curvature of the reflector in order to remove all off-axis 
aberrations and then design a transmissive corrector of one or more elements that 
corrects the spherical aberration over the usable wavelength range. 
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Fig. 4. The Hermes spectrograph, built for the GALAH (Galactic Archeology with Hermes) 
experiment, uses a reflective collimator of the Houghton design, which uses a single reflective 
optical element and two transmissive spherical aberration correctors. In addition, HERMES uses 
transmissive Volume Phase Holographic gratings (VPH), which allow the camera to be placed in 
close proximity to the grating, thereby minimizing the size of both the grating and the camera. 
See electronic edition for a color version of this figure. 


2.2. Cameras 


The next optical element in the path is the camera. The camera is harder to design 
than the collimator, primarily because the étendue of the camera is effectively larger 
than that of the collimator. The étendue is the product of the aperture A of the 
system and the solid angle Ω seen by the system. For any given wavelength the 
system étendue is constant or increases: AΩ ≥ constant. The reason that the effective 
étendue for the camera is larger than the collimator is that the dispersing element 
creates an effectively larger field of view for the camera to capture than the field 
of view that is seen by the collimator. In most modern telescope applications, the 
optical designer is required to demagnify the slit image, as modern telescopes form 
a large physical image of the seeing disk while camera pixels are kept small (for 
low dark current). The requirement to adequately sample the seeing disk with 2-3 
pixels on a side drives the design to demagnify the image scale onto the detector. 
Therefore, cameras generally have a shorter focal length and are faster than the 
collimator. 

As in the collimator, cameras can be reflective, refractive or catadioptric. Refrac- 
tive cameras can be all spherical or can contain aspherical elements. In the visi- 
ble, numerous material options exist that aid the optical designer with the color 
correction. In addition, single point diamond turning techniques exist that allow for 
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generalized aspheres to be turned on crystalline optics. There is a practical limit 
that exists on the size of the optic available to build into spectrograph cameras, 
which are dependent on optical blanks for the materials used, with maximum sizes 
of crystalline blanks being in the roughly 300-350 mm diameter range. 

Two-mirror designs include well-known telescope designs such as the 
Cassegrain, Ritchey—Chrétien and Dall-Kirkham designs.’ In general, two mirror 
designs have a larger field than single mirror designs, as they correct for spherical 
aberration as well as correcting for coma over some fraction of the field; hence, they 
are dominated by coma at the edge of the usable field. The Ritchey—Chrétien® is 
a subset of the Cassegrain design that formally corrects spherical aberration and 
coma over all fields, and hence it is dominated by astigmatism at the edge of the 
usable field. When used off-axis these designs allow for an unobscured beam. 

Multiple-mirror cameras like the three mirror anastigmat, such as that in 
NIRSPEC® (Fig. 5), have the advantage of having a well-corrected field and being 
achromatic. However, they are difficult to manufacture, difficult to assemble, and 
difficult to align, as the off-axis system has no centerline to boresight on. Using single 
point diamond turning techniques, it is possible to make these types of assemblies 
in metal, allowing for the mounting and alignment surfaces to be machined in. Care 
must be taken in this case to remove the diamond turning machine marks in order 
to reduce the scattered light in the system. 


[Figure 5: NIRSPEC Optical Design, with the TMA camera and the collimator (OAPC) labeled.]

Fig. 5. The NIRSPEC optical design showing the Three-Mirror Anastigmat (TMA) camera. 
Multiple-mirror cameras like the three-mirror anastigmat have the advantage of having a well- 
corrected field and being achromatic. See electronic edition for a color version of this figure. 
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2.3. Detectors 


CCDs are the most popular detectors used in visible light spectrographs,ᵃ while 
mercury cadmium telluride (HgCdTe) or indium gallium arsenide (InGaAs) are 
typically used for the near infrared. It is beyond the scope of this chapter to delve 
into the physics of solid-state detectors; however, it is useful to understand the 
signal-to-noise calculation based on detector noise as well as the rest of the system- 
level noise sources. The signal-to-noise ratio is given by 


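$$
\mathrm{SNR} = \frac{\mathrm{Signal}}{\mathrm{Noise}}
= \frac{R_{*}\,t}{\sqrt{R_{*}\,t + R_{\rm sky}\,t\,n_{\rm pix} + \left[\mathrm{RN}^{2} + (G/2)^{2}\right] n_{\rm pix} + \mathrm{Dark}\,t\,n_{\rm pix}}}
\qquad (3)
$$

Under the square root, the second term is the noise from sky electrons in the 
aperture, the third term is the read noise in the aperture, and the fourth term is 
the noise from dark current in the aperture. 

Here R_* t is the signal, t is the integration time, R_sky is the sky background per 
pixel, n_pix is the number of pixels in the analysis aperture, RN is the detector 
read noise, G is the detector gain, Dark is the dark current per pixel (so that 
Dark·t·n_pix is the dark signal in the analysis aperture), and all variables are 
measured in units of electrons. 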
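
As a minimal sketch of how Eq. (3) is used in practice, the short Python function below adds the source, sky, read-noise and dark terms in quadrature. The function name, the optional `gain` digitization term and all numerical values are illustrative assumptions, not values from this chapter.

```python
import math

def ccd_snr(r_star, t, r_sky, n_pix, read_noise, dark, gain=0.0):
    """Signal-to-noise ratio following the form of Eq. (3).

    r_star, r_sky and dark are rates in electrons per second (sky and dark
    per pixel); read_noise is in electrons per pixel; gain is an assumed,
    optional digitization term.
    """
    signal = r_star * t
    variance = (
        signal                                          # photon noise of the source
        + r_sky * t * n_pix                             # sky electrons in the aperture
        + (read_noise**2 + (gain / 2.0)**2) * n_pix     # read noise in the aperture
        + dark * t * n_pix                              # dark current in the aperture
    )
    return signal / math.sqrt(variance)

# Hypothetical observation: 10 e-/s source, 600 s exposure, 3x3 pixel aperture.
snr = ccd_snr(r_star=10.0, t=600.0, r_sky=1.0, n_pix=9, read_noise=3.0, dark=0.01)
print(f"SNR = {snr:.1f}")
```

With these assumed numbers the sky term is comparable to the source term, so the result falls well below the source-limited value of sqrt(6000).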


2.4. Dispersive Elements 


Dispersive elements are any kind of optical elements that impart an angular devia- 
tion to the beam as a function of wavelength. The most common dispersive element 
in low and medium resolution spectrographs is either a diffraction grating, a prism, 
or a combination of the two in the form of a grism!° (Fig. 6). 
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Fig. 6. Ray trace for a grism. Two offsetting wedge prisms sandwich a transmission grating 
(located at the center of the assembly) that disperses the light. Dispersion from the two prism 
surfaces cancels, and the overall beam path is undeflected (except for deflection caused by the 
grating dispersion). 


ᵃSee Chapter 1 of Volume 2 and Chapter 9 of this Volume for more details. 
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Gratings are either transmissive or reflective. Simple reflective gratings are 
either ruled directly or replicated from a master ruling. Transmissive gratings have 
the advantage that they allow the designer to put the camera as close as possible 
to the dispersing element, reducing the camera size and speed. This is because the 
designer will typically place the system stop at the grating in order to minimize the 
size and hence the cost of the grating. This is effective because the system stop is 
typically the smallest beam diameter in the optical system and the grating is the 
most expensive single optical element. In reflective grating designs, the collimator 
images the system stop onto the grating and the camera is placed far enough away 
from the grating so that there is room for the beam to physically pass around the 
collimator. This makes for a camera mouth that is larger than the telescope pupil. 
Having an external pupil in front of the camera also complicates the design and 
optimization process for the camera. In contrast, a transmissive design such as that 
for Hermes (Fig. 4) allows the camera to be placed in close proximity to the grating, 
thereby minimizing the size of both the grating and the camera. 

Gratings work by coherent addition of the optical radiation scattered by each of 
the facets. They are characterized by the grating equation (Fig. 7) and by the blaze 
angle. The blaze angle for a given wavelength is the angle at which the specular 
reflection off each facet and the coherent diffraction angle for a given wavelength are 
equal. Figure 7 shows a typical reflection grating geometry, along with the derivation 
(from the grating equation) of the formula for grating dispersion, δβ/δλ. It can be 
shown that the maximum resolving power provided by a dispersing element is given 
by the optical time delay in the beam produced by the optical element divided by 
the period of the light. 
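
The grating equation and the dispersion derived from it can be sketched numerically. The snippet below assumes the sign convention nλ = d(sin α + sin β) and differentiates at fixed α, which gives δβ/δλ = n/(d cos β); the function names and the 600 lines/mm example grating are hypothetical.

```python
import math

def diffraction_angle(order, wavelength, groove_spacing, alpha):
    """Solve n*lambda = d*(sin(alpha) + sin(beta)) for beta, in radians."""
    s = order * wavelength / groove_spacing - math.sin(alpha)
    if abs(s) > 1.0:
        raise ValueError("no propagating diffracted order for these inputs")
    return math.asin(s)

def angular_dispersion(order, groove_spacing, beta):
    """d(beta)/d(lambda) = n / (d * cos(beta)), in radians per metre."""
    return order / (groove_spacing * math.cos(beta))

# Hypothetical grating: 600 lines/mm, first order, 500 nm, 10 degree incidence.
d = 1e-3 / 600.0
alpha = math.radians(10.0)
beta = diffraction_angle(1, 500e-9, d, alpha)
disp = angular_dispersion(1, d, beta)
print(f"beta = {math.degrees(beta):.2f} deg, dispersion = {disp * 1e-9:.2e} rad/nm")
```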


3. Spectrograph Performance 


For most applications, the two most important performance criteria for a low- or 
medium-resolution spectrograph are resolving power and throughput or sensitivity. 
In the following section, we will address the fundamental constraints governing these 
performance indicators. 


3.1. Resolving Power 


Resolving power is the ability of the spectrograph system to resolve light of two 
different wavelengths, λ₁ and λ₂, separated by Δλ. Using a conservation of étendue 
argument, one can derive the following governing equation! that relates the spec- 
trograph resolving power to the spectrograph and telescope parameters. 


$$
R\,\phi = \frac{D_{\rm coll}\cdot 2\tan\theta_{\rm blaze}}{D_{\rm tel}}, \qquad (4)
$$
Grating equation: 

$$
n\lambda = d\,(\sin\alpha + \sin\beta),
$$
Fig. 7. Schematic diagram of a grating showing the derivation of the grating dispersion equation 
as well as the sign conventions for the grating angles. Light traveling from an optical element, d_1, 
arrives at the grating at an angle, α, relative to the grating normal, n. The light is diffracted at 
an angle, β, relative to the grating normal. Inset (a) shows the geometry relative to the grating 
face G and grating normal GN for a reflection grating (as shown in the main figure), while inset 
(b) shows the geometry for a transmission grating. The dispersion of the grating is defined as the 
change in angle as a function of the change in wavelength, δβ/δλ. Individual diffracting elements 
(grooves) are separated by a distance d. 


where R = λ/δλ is the spectral resolving power, φ is the slit width in radians, θ_blaze is the 
grating angle, D_tel is the telescope diameter and D_coll is the diameter of the collimated 
beam within the spectrograph. D_coll sets the scale size for the spectrograph, 
as the entire optical system essentially expands or contracts as the designer changes 
this parameter. 

Equation (4) shows that the product of the slit width and the resolving power 
is a dimensionless constant. In particular, in the Littrow condition (α = β = θ_blaze) 
that constant is given by the optical path length difference from the top of the 
optical beam to the bottom of the optical beam as it passes through and past the 
grating, divided by the telescope diameter. 

To understand the physical meaning of this equation let us replace the slit width 
with the two limiting cases for astronomical observations. Those two cases are the 
diffraction-limited case, in which the slit width is equal to λ/D_tel, and the 
seeing-limited case, in which the slit width is equal to λ/r_0, where r_0 is the well-known 
Fried parameter. 


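Diffraction-limited: 

$$
\phi = \frac{\lambda}{D_{\rm tel}} \;\Longrightarrow\; R = \frac{d_1}{\lambda}\,2\tan(\theta_b), \qquad (5)
$$

Seeing-limited: 

$$
\phi = \frac{\lambda}{r_0} \;\Longrightarrow\; R = \frac{r_0}{D_{\rm tel}}\,\frac{d_1}{\lambda}\,2\tan(\theta_b). \qquad (6)
$$

Here d_1 is the collimated beam diameter (D_coll in Eq. (4)) and θ_b is the blaze angle. 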


The Fried parameter is a measure of the coherent patch size in the atmosphere. 
It is the physical width, perpendicular to the direction of travel, over which the 
optical beam passing through the atmosphere is phase-coherent. 

In Fig. 8, we see a schematic diagram of the diffraction-limited case. In this 
situation, the spectrograph is operating at the grating resolving power, which means 
that the beam is fully phase-coherent across the face of the grating so that rays at 
the top of the grating are interfering with rays from the bottom of the grating in 
a coherent way. Equation (5) shows that the grating resolving power in this case 
can be physically interpreted as equal to the optical path length difference available 
to the coherent beam divided by the wavelength of light. The resolving power is 
essentially the number of wavelengths that fit in the interfering optical path. 

[Figure 8 labels: diffraction-limited case; diameter = d_1; OPD = d_1(sin(α) + sin(β)) = d_1 2 tan(θ_b)]

Fig. 8. The optical path length difference (OPD) available in a fully coherent, diffraction-limited 
beam. Light traveling from an optical element, d_1, arrives at the grating at an angle, α, relative to 
the grating normal, n. The light is diffracted at an angle, β, relative to the grating normal. In this 
case, the light is fully phase-coherent over the full aperture of the grating. A physical interpretation 
of this situation is that if one measures the phase of the beam at the topmost ray one can predict 
absolutely the phase of the bottommost ray at the point defined by a perpendicular dropped from 
the topmost ray to the bottommost ray. 

[Figure 9 label: seeing-limited case]

Fig. 9. The optical path length available in a partially coherent, seeing-limited beam. Light 
traveling from the right, passing through a phase-coherent patch size, r_0 d_1/D_tel, arrives at the 
grating at an angle, α, relative to the grating normal, n. The light is diffracted at an angle, β, 
relative to the grating normal. In this case, the light is fully phase-coherent over the sub-aperture of 
the grating given by r_0 d_1/D_tel. A physical interpretation of this situation is that if one measures the 
phase of the beam at the topmost ray one cannot predict absolutely the phase of the bottommost 
ray at the point defined by a perpendicular dropped from the topmost ray to the bottommost ray. 
However, if one measures the phase of the ray passing through the top of the coherent patch, one 
can fully predict the phase of the ray passing through the bottom of the coherent patch. 

bSee Chapter 13 of Volume 2 for more details. 

Figure 9 shows the case for a seeing-limited observation. In this case, the resolv- 
ing power available to the system has been scaled down by the Fried parameter 
divided by the diameter of the telescope (Eq. (6)). The way to physically interpret 
this result is that the optical beam is only coherent over a small fraction of the 
pupil. In this case, the coherence patch size in the pupil is r_0 D_coll/D_tel. The resolving 
power is now set by the optical path difference in the coherent patch and is 
effectively the number of wavelengths that fit into this path difference. 

We have shown this result for a reflection grating, but it can be generalized to 
any dispersive element, including transmission gratings, prisms, and grisms. 
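
The two limiting cases can be compared numerically by inserting the corresponding slit widths into the relation R φ = 2 tan(θ_blaze) D_coll/D_tel. The sketch below is illustrative only: the function name is ours, and the 10 cm Fried parameter and the 20 cm beam on a 10 m telescope are assumed values.

```python
def resolving_power(d_coll, d_tel, slit_width, tan_blaze):
    """R = 2*tan(theta_blaze)*D_coll / (D_tel*phi), with phi in radians."""
    return 2.0 * tan_blaze * d_coll / (d_tel * slit_width)

wavelength = 500e-9                        # metres
d_tel, d_coll, tan_blaze = 10.0, 0.20, 1.0
r0 = 0.10                                  # assumed Fried parameter, metres

# Diffraction-limited slit: phi = lambda / D_tel
r_diff = resolving_power(d_coll, d_tel, wavelength / d_tel, tan_blaze)
# Seeing-limited slit: phi = lambda / r0
r_seeing = resolving_power(d_coll, d_tel, wavelength / r0, tan_blaze)

print(f"diffraction-limited R = {r_diff:.0f}")
print(f"seeing-limited R      = {r_seeing:.0f}")
print(f"ratio                 = {r_diff / r_seeing:.0f}")
```

The ratio of the two results equals D_tel/r_0, which is the factor by which the coherent patch is smaller than the full pupil.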

With this result in mind, one way to think about the function of the spectro- 
graph and the job of the spectrograph designer is to provide in the optical system 
as much optical path length difference as possible to be available for the coherent 
portion of the beam. The way that this is done is by having a long optical path 
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length available in the spectrograph. This can be achieved by: 


• Having a large spectrograph with a large collimated beam diameter. In the seeing- 
limited case, as shown by Eq. (6), a larger telescope at fixed resolving power or 
a larger resolving power at fixed telescope diameter requires a proportionally 
larger beam diameter in the spectrograph. This is why giant telescopes require giant 
spectrographs! 

• Having a steep grating angle (high blaze angle) provides higher resolving power. 
Gratings with the highest blaze angle are echelle gratings. They are characterized 
by their R number (not to be confused with resolving power!). The R number is 
simply the tangent of the blaze angle, so an R2 grating has tan(θ_blaze) = 2. 

• Since the determining factor is optical path length difference inside the spectro- 
graph and not the physical path length difference, another way to increase the 
resolving power is to increase the index of refraction in the region where the light 
interferes. Originally this was done by filling the grating area with a high index 
interfered. Originally this was done by filling the grating area with a high index 
oil, i.e. immersing the grating in oil, hence the name immersion grating. Current 
efforts at immersion gratings include using silicon in the infrared as the grating 
material and producing the interference inside of the grating.!? 


3.2. Throughput/Sensitivity 


Sensitivity in an optical spectrograph, as in any optical system, is set by the amount 
of energy transmitted through the system divided by the total noise of the system. 
Primary signal losses in an astronomical spectrograph are due to reflection at the 
Fresnel interfaces of the optics, bulk absorption by the optics, and diffractive losses 
at the grating, with the latter being typically the greatest source of loss. Noise 
sources in the spectrograph are primarily due to stray light, scattered light, and 
detector noise. 

In order to estimate the throughput of a spectrograph one typically multi- 
plies the Fresnel reflection losses at all interfaces based on calculated or modeled 
estimates of the anti-reflective coating’s performance. For example, a good anti- 
reflective coating will transmit of order 99.5% of the light. A modern state-of-the- 
art medium resolution spectrograph can have as many as 10 optical elements in the 
camera and six optical elements in the collimator, so the transmission due to the 
Fresnel losses at the optical surfaces, considering two Fresnel surfaces per optic, is 
equal to T = (0.995)^32 ≈ 85%. 
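
The Fresnel-loss estimate is simply a product over surfaces, as the short sketch below shows. The function name and its default arguments are illustrative assumptions; only the 99.5% per-surface transmission and the 10 + 6 element count come from the text.

```python
def fresnel_throughput(n_elements, surface_transmission=0.995, surfaces_per_element=2):
    """Transmission after all air-glass surfaces, ignoring bulk absorption."""
    return surface_transmission ** (n_elements * surfaces_per_element)

# 10 camera elements plus 6 collimator elements, two coated surfaces each.
throughput = fresnel_throughput(10 + 6)
print(f"T = {throughput:.3f}")
```

Because the losses compound, a coating that is only slightly worse (for example 99% per surface) would lower the same 32-surface system to roughly 72%.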

The diffractive loss of the grating is difficult to model. To first order, Fourier 
analysis shows that the transmission of the diffraction grating is the Fourier trans- 
form of a single groove, so for a ruled grating the transmission function of a single 
groove is a top-hat function, whose Fourier Transform is a sinc function (sin(x)/x). 
Therefore, the theoretical envelope for this type of grating is a sinc function. Real 
grating performance departs significantly from the theoretical models due to many 
factors, such as imperfections in the groove replication, dirt and dust on the grating, 
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surface irregularities and index inhomogeneities (in the case of transmission grat- 
ings). For this reason, different replicated gratings will show different performance, 
with some being a “hot” grating and some performing much more poorly. In order 
to get the full performance, the grating must be modeled using Rigorous Coupled- 
Wave Analysis (RCWA). This can be done using a commercial program such as 
GSolver (http://www.gsolver.com) or by writing one’s own code. Nonetheless, the 
best method for an existing replicated grating is using measured data for previous 
replications. 

The most popular type of transmissive grating currently used in astronomy is 
the volume-phase holographic (VPH) grating.? Transmission of VPH gratings can 
be modeled by the Kogelnik approximation,!? which is valid for shallow angles of 
incidence. In order to estimate the full grating performance, one must also apply 
RCWA using a modeling program such as GSolver.!° 

The Kogelnik approximation is valid under certain conditions: at large angles of 
incidence, or when 
. 10An 
sin(ag) > (7) 


where α_B is the angle of incidence under the Bragg condition, n is the index of 
refraction of the VPH grating, and Δn is the modulation range of the index of 
refraction of the VPH grating. Under those conditions the diffraction efficiency, η, 
given by Kogelnik is 


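$$
\eta = \frac{1}{2}\sin^{2}\!\left(\frac{\pi\,\Delta n\,d}{\lambda\cos\alpha_B}\right)
+ \frac{1}{2}\sin^{2}\!\left(\frac{\pi\,\Delta n\,d}{\lambda\cos\alpha_B}\cos(2\alpha_B)\right), \qquad (8)
$$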


where d is the thickness of the VPH modulation layer. 
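
A minimal numerical sketch of the Kogelnik efficiency is given below, assuming the standard two-term form for unpolarized light, η = ½ sin²[πΔn d/(λ cos α_B)] + ½ sin²[πΔn d cos(2α_B)/(λ cos α_B)]. The function name and the example layer parameters are hypothetical.

```python
import math

def kogelnik_efficiency(delta_n, thickness, wavelength, alpha_bragg):
    """Unpolarized first-order efficiency at the Bragg condition (Kogelnik form).

    thickness and wavelength share the same length unit; alpha_bragg is in radians.
    """
    arg = math.pi * delta_n * thickness / (wavelength * math.cos(alpha_bragg))
    eta_s = math.sin(arg) ** 2                                   # first polarization term
    eta_p = math.sin(arg * math.cos(2.0 * alpha_bragg)) ** 2     # second polarization term
    return 0.5 * eta_s + 0.5 * eta_p

# Hypothetical layer: delta_n = 0.05, 5 micron thick, used at 500 nm.
for angle_deg in (0.0, 15.0, 30.0, 45.0):
    eta = kogelnik_efficiency(0.05, 5e-6, 500e-9, math.radians(angle_deg))
    print(f"alpha_B = {angle_deg:4.1f} deg  ->  eta = {eta:.3f}")
```

In this form the second term vanishes at α_B = 45 degrees, where cos(2α_B) = 0, so the unpolarized efficiency cannot exceed 50% there.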
VPH gratings offer several advantages over reflection (ruled or replicated) grat- 
ings, including: 


(1) Close to 100% diffraction efficiency near the design wavelength. 

(2) Transmission gratings allow the designer to place the camera closer to the grat- 
ing and hence the pupil. This reduces the demands on the camera and typically 
allows for a slower camera, easier fabrication and better camera performance. 

(3) By adjusting the incidence angle, the wavelength at the maximum efficiency 
can be adjusted. 

(4) Since the gratings are produced via optical interference they can generally have 
a line density significantly higher than the maximum for ruled gratings, allowing 
more options for the spectrograph designer. 

(5) The grating is more robust than other types, as it is located between glass 
substrates, which can be cleaned and have anti-reflection (AR) coatings. Fur- 
thermore, in some cases the glass substrates themselves can be post-polished 
after the gratings have been manufactured, in order to remove optical inhomo- 


geneities that contribute to wavefront errors.4 
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(6) Large grating sizes are feasible, with gratings over 500mm in length having 
been fabricated.* 


4. Design Example 


The final section of this chapter is a short design example. This design example will 
take a set of specific science requirements and engineering constraints, and demonstrate 
the process to produce a first-order optical design for the spectrograph. 

The output of the first-order design will be the focal lengths and fields-of-view 
for the collimator and the camera; the collimated beam diameter, setting the effec- 
tive scale size for the spectrograph; and the grating first-order parameters. It can 
be used to constrain the detailed design of the collimator and camera and provide 
the detailed requirements to the optical designer to optimize those components 
using a computer optimization/design program. In addition, it is a starting point 
for selecting an off-the-shelf or custom grating. 

For our example, we will consider that we have the following design require- 
ments: 


• An existing telescope at a known site with a known expected seeing disk size. 
This includes the aperture size and f-number of the telescope, which we take to 
be: 10-meter aperture operating at f/15, with 1.0 arcsecond worst quartile seeing. 
(Which quartile you design for can be somewhat contentious, but we will design 
for the 4th quartile.) 

• An existing choice of detector, including pixel size and number of pixels. We will 
design for: a 2K by 4K CCD with 15 μm square pixels. 

• The resolving power required by the spectrometer: The spectrograph will operate 
at a minimum resolving power of R = 8,000. 

• Operating wavelength will be: centered on 500 nm. 


When designing a spectrograph many different constraints exist and the prob- 
lem is often attacked in different ways. In our case, we have selected a pixel size 
and the telescope; therefore, we know that we must image the seeing disk onto the 
detector. Normally the image will be (de)magnified onto 2-3 pixels (at or below 
the Nyquist limit). In our case, we will image onto three pixels. This criterion sets 
the ratio of the focal length of the collimator to the camera, as the two of them 
make an imaging pair. 

Ratio of camera to collimator focal length: 


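$$
M = \frac{F_{\rm camera}}{F_{\rm collimator}}
= \frac{3 \cdot 15\,\mu{\rm m} \cdot 10^{-6}\,{\rm m}/\mu{\rm m}}
{10\,{\rm m} \cdot 15 \cdot 5\times10^{-6}\,\frac{\rm radians}{\rm arcsec} \cdot 1\,{\rm arcsec}}
= 0.06. \qquad (9)
$$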


We need to (de)magnify the image by a factor of 0.06. 
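
The same arithmetic can be written as a small helper, which makes it easy to try other pixel sizes or seeing values. The function and argument names are ours, and the conversion of 5 × 10⁻⁶ radians per arcsecond is the rounded value used in the text.

```python
def required_magnification(pixel_size, n_pixels, d_tel, f_ratio,
                           seeing_arcsec, rad_per_arcsec=5e-6):
    """Ratio of camera to collimator focal length (all lengths in metres)."""
    image_on_detector = n_pixels * pixel_size                           # desired image size
    image_at_slit = d_tel * f_ratio * rad_per_arcsec * seeing_arcsec    # seeing disk at focus
    return image_on_detector / image_at_slit

# 15 micron pixels, 3-pixel sampling, 10 m f/15 telescope, 1 arcsec seeing.
m = required_magnification(15e-6, 3, 10.0, 15.0, 1.0)
print(f"M = {m:.3f}")
```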
In addition, we already know the f-number of the collimator, as it must match 
that of the telescope: in our case f/15. In the case of fiber spectroscopy, one would 
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condition the output beam from the telescope to match the Numerical Aperture 
(NA) of the fiber and then use a collimator that matches the NA of the fiber with 
some amount of overhead for focal ratio degradation.ᶜ Even in this case the ratio of 
the camera to collimator focal length would be set by the magnification requirement. 

In order to determine the actual focal length of the collimator and camera, one 
needs to determine the beam diameter within the spectrograph. Recall from the 
section above that the resolving power is set by the optical path length difference 
available within the beam. This gives the choice of either a large diameter beam 
and a grating that is used at a shallow blaze angle, or a smaller diameter beam 
used at a steeper grating blaze angle. Increasing the diameter of the beam increases 
the cost of the grating as well as all of the optics inside the spectrograph, so care 
should be taken to keep it reasonably small. For that reason, we will start with an 
R2 grating for our spectrograph. 

Using the resolving power requirements and seeing disk diameter from above 
and setting the tangent of the blaze angle equal to 2 we can rearrange Eq. (4) to 
calculate the beam diameter: 


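$$
D_{\rm coll} = \frac{D_{\rm tel}\,R\,\phi}{2\tan\theta_{\rm blaze}}
= \frac{10\,{\rm m}\cdot 8{,}000\cdot 5\times10^{-6}\,{\rm radians}}{4}
= 0.10\,{\rm m} = 10\,{\rm cm}. \qquad (10)
$$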
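
As a quick check of this rearrangement of Eq. (4), the sketch below evaluates the beam diameter for the R2 grating and, for comparison, for a shallower R1 grating. The function name is an illustrative assumption.

```python
def beam_diameter(d_tel, resolving_power, slit_width, tan_blaze):
    """D_coll = D_tel * R * phi / (2 * tan(theta_blaze)); lengths in metres."""
    return d_tel * resolving_power * slit_width / (2.0 * tan_blaze)

d_r2 = beam_diameter(10.0, 8000, 5e-6, tan_blaze=2.0)   # R2 grating
d_r1 = beam_diameter(10.0, 8000, 5e-6, tan_blaze=1.0)   # R1 grating, for comparison
print(f"R2: {d_r2 * 100:.0f} cm, R1: {d_r1 * 100:.0f} cm")
```

Halving the tangent of the blaze angle doubles the required beam diameter, which is the trade between grating angle and spectrograph size described above.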
With the beam diameter we can now determine the first-order parameters for 
the optics: 


The collimator: The collimator is f/15 (to match the telescope), so its focal length 
is 10cm x 15 = 150cm. 


The camera: The camera focal length is M × F_coll = 0.06 × 150 = 9 cm. The camera 
f/# is the focal length divided by the beam diameter, 9/10 = 0.9, which will be 
challenging. The camera half angle (field of view) is 

$$
\theta_{\rm camera,1/2} = \arctan\left(\frac{(0.5)\,15\,\mu{\rm m}\,\sqrt{4000^{2} + 2000^{2}}}{90{,}000\,\mu{\rm m}}\right) = 20.4\ {\rm degrees}, \qquad (11)
$$

very challenging with an f/0.9 camera! 


Our first attempt, with an R2 grating, produced a camera that will be very 
challenging, but with a beam diameter of only 10 cm. We can probably go 
with a larger diameter beam, making the camera design easier. Thus, we repeat the 
exercise for an R1 grating, tan(θ_blaze) = 1 or θ_blaze = 45 degrees. 


$$
D_{\rm coll} = \frac{D_{\rm tel}\,R\,\phi}{2\tan\theta_{\rm blaze}}
= \frac{10\,{\rm m}\cdot 8{,}000\cdot 5\times10^{-6}\,{\rm radians}}{2}
= 0.20\,{\rm m} = 20\,{\rm cm}. \qquad (12)
$$


ᶜSee Chapter 15 of this Volume for more on focal ratio degradation. 
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With the beam diameter we can now determine the first-order parameters for 
the optics: 


The collimator: Again, the collimator is f/15 (to match the telescope) so its focal 
length is 20cm x 15 = 300cm. 


The camera: The camera focal length is M × F_coll = 0.06 × 300 = 18 cm. The camera 
f/# is again the focal length divided by the beam diameter, 18/20 = 0.9, which will 
be challenging. The camera half angle is 

$$
\theta_{\rm camera,1/2} = \arctan\left(\frac{(0.5)\,15\,\mu{\rm m}\,\sqrt{4000^{2} + 2000^{2}}}{180{,}000\,\mu{\rm m}}\right) = 10.5\ {\rm degrees}, \qquad (13)
$$

still challenging with an f/0.9 camera, but likely doable with multiple lenses and 
multiple aspheres. 
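
The focal lengths and camera half angle for both trial beam diameters can be reproduced with a few lines of Python. This is a sketch under the assumptions of the example (f/15 collimator, M = 0.06, 15 μm pixels on a 4000 × 2000 detector); the function name and argument names are ours.

```python
import math

def camera_first_order(d_coll_cm, collimator_f_ratio=15.0, magnification=0.06,
                       pixel_um=15.0, n_x=4000, n_y=2000):
    """Return (collimator focal length [cm], camera focal length [cm],
    camera half angle [deg]) for a given collimated beam diameter."""
    f_coll = d_coll_cm * collimator_f_ratio
    f_cam = magnification * f_coll
    # Half of the detector diagonal, in microns.
    half_diag_um = 0.5 * pixel_um * math.hypot(n_x, n_y)
    # 1 cm = 1e4 microns.
    half_angle = math.degrees(math.atan(half_diag_um / (f_cam * 1e4)))
    return f_coll, f_cam, half_angle

for beam_cm in (10.0, 20.0):
    f_coll, f_cam, half_angle = camera_first_order(beam_cm)
    print(f"beam {beam_cm:4.0f} cm: F_coll = {f_coll:5.0f} cm, "
          f"F_cam = {f_cam:4.1f} cm, half angle = {half_angle:4.1f} deg")
```

Doubling the beam diameter doubles both focal lengths and roughly halves the field angle the camera must accept, which is the gain sought by moving from the R2 to the R1 grating.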

We then use the grating equation from above to calculate the grating parameters 
that allow us to diffract the central wavelength to the center of the detector using 
our grating in first order. This results in the following parameters: 


$$
n\lambda = 2d\sin(\theta_{\rm blaze}). \qquad (14)
$$

For n = 1, λ = 500 nm, we obtain 1/d ≈ 2800 lines/mm blazed at 45 degrees. 
At this point we can attempt to find a grating in the catalog that matches these 
parameters, or develop a specification for a custom grating. In parallel, the next 
step would be to use a rigorous coupled-wave analysis program to calculate the 
expected diffraction efficiency for the grating. 
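
The groove density follows directly from the Littrow form of the grating equation, nλ = 2d sin(θ_blaze). The helper below is a sketch with a name of our choosing; it returns the unrounded value that the text quotes as about 2800 lines/mm.

```python
import math

def littrow_groove_density(wavelength_nm, blaze_deg, order=1):
    """Lines per mm from n*lambda = 2*d*sin(theta_blaze)."""
    d_nm = order * wavelength_nm / (2.0 * math.sin(math.radians(blaze_deg)))
    return 1e6 / d_nm   # 1 mm = 1e6 nm

density = littrow_groove_density(500.0, 45.0)
print(f"1/d = {density:.0f} lines/mm")
```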

We now have the first-order design for our spectrograph. This information can 
be provided to a lens design program such as Zemax to optimize individual components 
and then start the process of vendor discussions to see if they can actually be 
made. 
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What is the difference between an “ordinary” spectrograph and a Doppler spec- 
trograph? What makes a Doppler spectrograph become “ultra” precise? The goal 
of this chapter is to answer these two questions, although we don’t claim to pro- 
vide the ultimate solutions. We shall just discuss basic principles and some design 
concepts that must be followed to achieve the best possible Doppler precision. We 
will not miss the occasion to describe current limitations and future perspectives 
of Doppler velocimetry. 


1. Introduction 


The Doppler measurement consists in determining the wavelength of an identified 
spectral line and comparing it with the theoretical value it would have when trans- 
ferred into the solar-system’s rest frame. The Doppler equation links the measure- 
ment to the theoretical wavelength via the relative velocity vector, finally delivering 
the projection of this vector in the direction of the line of sight (radial velocity). In 
order to increase the precision, an average over several thousands of spectral lines 
is computed. It should be noted, however, that the radial-velocity measurement is 
affected by several potential error sources that have been discussed extensively.” 
The main error sources are: photon noise, instrumental errors, illumination 
effects, spectral contamination,⁴,¹² and stellar “noise”, commonly referred to as 
stellar jitter. The term stellar jitter masks various stellar causes 
that produce radial-velocity effects at all timescales and of different magnitude. 
The discussion of all these effects lies beyond the scope of our review. Nevertheless, 
we point out that stellar jitter is probably the strongest limitation for Doppler 
velocimetry when aiming at sub-meter-per-second precision. In this chapter, we 
shall instead focus on all the aspects that make a Doppler spectrograph capable of 
“ultra-high precision”, i.e. we do not consider all the effects linked to the imperfect 
source, but only those dictated by the measurement. For a list of present and future 
spectro-velocimeters, we refer to Ref. 21. 


2. The Fellgett (Dis-)advantage Applied to Spectrographs 


For the sake of simplicity, we will restrict our discussion to dispersive spectro- 
graphs. This choice is motivated by two arguments. Firstly, nearly all known radial- 
velocity exoplanets have been discovered or characterized by using echelle spec- 
trographs. Secondly, dispersive spectrographs have, at given spectral resolution, a 
unique advantage compared to multiplexing systems (e.g. Fourier Transform Spec- 
trographs (FTS) or monochromators (e.g. Fabry—Pérots): the Fellgett advantage.?** 
This principle is commonly used to explain the advantage of an FTS with respect to 
a monochromator, but it applies only if the measurement is detector noise limited 
and is particularly effective for emission spectra. For shot-noise (photon-noise)- 
limited measurements, however, and in particular in the case of absorption spectra, 
this principle turns actually into a disadvantage. The real and fundamental advan- 
tage of a spectrograph with respect to both the FTS and monochromator is the 
number of simultaneous detectors (pixels): while with a monochromator (or an FTS 
in the photon-noise limit) with N spectral channels we will need a total integration 
time T_mc = N·t to obtain a given signal-to-noise ratio (SNR) per spectral channel, 
with a spectrograph having the same number of pixels N as spectral channels, the 
total integration time will be identical to the integration time for each individual 
channel, i.e. T_sp = t. Considering the number of spectral channels used (typically 
300,000 channels for a spectrograph of resolving power R = 100,000 covering the 
visible wavelength domain), we can now appreciate the reasons that make the echelle 
spectrograph so time-efficient and so suitable for the stellar Doppler measurements. 
In the era of very low-noise CCD and CMOS array detectors, echelle spectrographs 
remain without competition. The conceptual layout of a simple fiber-fed, cross dis- 
persed spectrograph is shown in Fig. 1. 
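
The time argument above can be made concrete with a toy calculation. The sketch assumes photon-noise-limited measurements in which every spectral channel needs the same exposure time t; the function name and the one-second exposure are hypothetical.

```python
import math

def total_integration_time(n_channels, t_per_channel, simultaneous_pixels=1):
    """Total time to reach the target SNR in every spectral channel when
    `simultaneous_pixels` channels are recorded at once."""
    n_exposures = math.ceil(n_channels / simultaneous_pixels)
    return n_exposures * t_per_channel

n = 300_000          # spectral channels, as quoted for R = 100,000 over the visible
t = 1.0              # assumed exposure per channel, in seconds

t_scanning = total_integration_time(n, t)                              # one channel at a time
t_spectrograph = total_integration_time(n, t, simultaneous_pixels=n)   # all channels at once
print(f"scanning: {t_scanning:.0f} s, spectrograph: {t_spectrograph:.0f} s")
```

With these numbers the scanning instrument needs N times longer than the spectrograph, which is the factor of 300,000 implied by the channel count in the text.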


3. Stability, Repeatability, Precision and Accuracy 


There is quite some confusion when talking about the ability of measuring radial 
velocities, and its understanding has a nonnegligible subjective aspect. For this 
reason, we shall define below some terms as they will be used hereafter, in the 
awareness that their definition is not unique. 


Also known as the multiplex advantage. 
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Fig. 1. Conceptual layout of a cross-dispersed echelle spectrograph with echelle grating in Littrow 
condition and (asymmetric) “white pupil” mount. The shorter focal length of the transfer colli- 
mator compared to the main collimator produces a collimated beam of reduced diameter, which 
allows in turn a reduction in the sizes of the cross disperser and the camera lens. 


3.1. Stability 


The word “stability” is probably the least exact term in this context. It is not related 
to a physical measurement. It shall be understood hereafter as a characteristic of 
the measuring instrument, and describes at what level the signal of a measuring 
instrument remains constant over time. In the case of a spectrograph, the stability 
refers to the recorded position of spectral reference lines projected on the detector 
as a function of time. Of course, the stability will depend on the instrument (tem- 
perature, pressure, materials, aging, radiation effects, etc.), the detector and its 
pixels, the read-out electronics and the data-reduction software. What the observer
finally sees is the digitized signal strength (e.g. the number of photo-electrons)
as a function of position (e.g. the pixel number), from which one deduces by eye or by
software whether the spectrum has changed, although even the “change” criterion might be
somewhat subjective.

One may ask why the term stability is used and considered important. The 
reason is simple: As long as the measurement setup is stable, it must be possible to 
make the instrument repeatable and calibratable. 


3.2. Repeatability 


Repeatability is clearly a measurement characteristic. It denotes the ability of mea- 
suring exactly the same signal (value, spectrum, etc.) every time the experiment 
is repeated under identical conditions. If a measurement is perfectly repeatable, all 
the measurements will deliver the same results within the measurement precision, 
i.e. the measurements are identical in statistical terms, where the statistics is given 
by fundamental physical limits, such as the shot noise. Repeatability is a measure
of the stability of the instrumental setup and can directly be compared with the
measurement precision.


b Known and stable spectral lines of a reference (calibration) source.


3.3. Precision 


Since the word precision is commonly used in everyday language, its meaning can be 
ambiguous. What is proposed hereafter is to restrict it to the 1-sigma value of the 
distribution of a measurement, for the case that this latter is limited by fundamental 
physics. In other terms, while the repeatability can be considered as the inverse of 
the “external” measurement error, the precision rather represents the inverse of 
the “internal” measurement error. The former contains any error source, including 
instrumental; the latter, on the other hand, is free from instrumental errors. In the 
limit of a perfect instrument, repeatability will equal precision. 


3.4. Accuracy 


While repeatability and precision are both related to the width of the distribution 
of a measured value, the accuracy refers to its (absolute) mean value in the limit 
of infinite precision. The measurement is supposed to provide an absolute number 
that is close to reality, where the reality can only be given by an absolute reference 
(standard) and is obtained through the calibration process. 


3.5. Application to Doppler Velocimeters 


In the case of exoplanet searches, Doppler velocimetry aims at obtaining the most 
repeatable measurement over the longest possible time span, or a time span covering 
at least a few orbital periods of the observed planets. Since the orbital period is not 
known a priori, the second part of the statement is meaningless, in the sense that 
one has to ensure repeatability over any timescale. How can this be achieved? 


1. First, the instrument must be calibratable and calibrated. This means that at any 
moment we must be able to assign to any spectral bin (in our case an extracted 
detector pixel) a wavelength/frequency and a signal conversion function (digital 
to energy/power). This is obtained by calibration, i.e. observing spectral and 
flux standards with the instrument. The accuracy will be determined by biases 
of the standards or the measurement, while the repeatability will be determined 
by measurement errors. The obtained calibration applies then until the next 
calibration is performed. At this point, the stability aspect of the instrument 
enters: as long as the instrument is stable, the calibration remains valid, and the 
repeatability is not compromised. 

2. However, the instrumental stability can by definition not be guaranteed at any 
level and over any timescale. The instrument will, as a matter of fact, evolve due 
to thermal, mechanical, optical and many other reasons. To cope with this, either 
the calibration is repeated on timescales that allow only for instrumental drifts 
smaller than the calibration repeatability, or there is a means to measure the 
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instrumental drifts, i.e. the change in sensitivity and wavelength of each pixel 
between the last calibration and the scientific observation, and to correct the 
calibration accordingly. Again, the drift measurement must be at least as precise 
and repeatable as the wavelength calibration, if no additional error is to be added 
to the measurement. 

3. Finally, all the steps, i.e. calibration, drift measurement and scientific obser- 
vation, are ideally limited by fundamental physics. Since we are dealing with 
photons, the fundamental reference for the observable’s measurement error is set 
by shot noise, i.e. the statistical noise related to the fact that photon emission
and detection are quantized.


3.6. Wavelength Calibration 
3.6.1. The Wavelength Solution 


Wavelength calibration is a fundamental step of spectroscopy in general and astro- 
nomical spectroscopy in particular. It provides the wavelength scale for the recorded 
astrophysical spectrum. The wavelength scale, hereafter also called wavelength solu- 
tion, is obtained by feeding the spectrograph with a spectral reference, i.e. a light 
source providing spectral lines of well-known wavelength. 

On a raw frame the wavelength solution would be described by a function
$\lambda(x, m)$ that provides the wavelength $\lambda$ as a function of the extracted pixel $x$ in
the main-dispersion direction and the diffraction order $m$ of the echelle grating.
Figure 2 shows typical traces of the raw echelle spectrum on the detector. They are
curved and are several pixels wide in the cross-dispersion direction, with all of these
“spatial” pixels carrying the same wavelength. Therefore, typical data reduction
processes first “extract” the spectra by converting them into several 1D spectra,
$I(x, m)$, where $I$ is the number of photoelectrons per resolution bin (extracted
pixel). For a general discussion of the wavelength-calibration concept and known
issues and limitations, see Ref. 22 and references therein.

Optical spectrographs must be calibrated using standard light sources with 
accurately known wavelengths. The emission lines from hollow cathode lamps (HCL) 
or simple gas discharge tubes are reliable standards for wavelength calibration, as 
these emission lines are intrinsic to the source and arise from atomic transitions.
Thorium-Argon (Th-Ar) lamps, for instance, have a large number of these lines.
They are narrow, dense, cover the entire visible wavelength range and occur, last 
but not least, at well-known laboratory wavelengths.??- 7° Once the spectra have 
been extracted, the position of each spectral line $(x_i, m)$ in all the orders $m$ can
be determined, e.g. by fitting the individual lines with a Gaussian profile. One
obtains a list of lines $i$ with known wavelength $\lambda_i$, to which a position $(x_i, m)$ can
be associated. The wavelength solution is then the function $f$ which links the line
positions with the wavelengths: $\lambda_i = f(x_i, m)$. Evidently, both the wavelengths
and the line positions are affected by measurement errors or systematics. The best
calibration is usually obtained by solving this set of equations in the sense of $\chi^2$
minimization, while making sure that a minimum number of free parameters is
used.28-30


Fig. 2. Geometry of the orders in an echelle spectrograph. The nearly horizontal lines trace the
diffraction orders to their free spectral range (FSR), whereas the red dots represent the position of
thorium lines used for calibration. The box shows the physical size of the CCD. Figure extracted
from Ref. 24.
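A minimal sketch of such a fit is given below. It is not the procedure of any specific pipeline: we assume, for illustration only, that the product $m \cdot \lambda$ is a low-order polynomial in the (rescaled) pixel position shared by all orders, that all lines have equal weight (so that the $\chi^2$ reduces to a sum of squared residuals), and that the detector is 4096 pixels wide. The line list is synthetic, and all function names are hypothetical.

```python
X_HALF = 2048.0  # half-width of a hypothetical 4096-pixel detector


def scaled(x):
    """Map the pixel position to roughly [-1, 1] to keep the fit well conditioned."""
    return (x - X_HALF) / X_HALF


def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_wavelength_solution(lines, degree=2):
    """Least-squares fit of m * lambda = sum_k a_k * u**k, with u = scaled(x).

    `lines` is a list of (x_pixel, order_m, lambda_lab) tuples.
    Keeping `degree` small limits the number of free parameters.
    """
    n = degree + 1
    ata = [[0.0] * n for _ in range(n)]
    aty = [0.0] * n
    for x, m, lam in lines:
        basis = [scaled(x) ** k for k in range(n)]
        y = m * lam
        for i in range(n):
            aty[i] += basis[i] * y
            for j in range(n):
                ata[i][j] += basis[i] * basis[j]
    return solve_linear(ata, aty)


def wavelength(coeffs, x, m):
    """Evaluate the fitted solution lambda = f(x, m)."""
    u = scaled(x)
    return sum(a * u ** k for k, a in enumerate(coeffs)) / m


# Synthetic, noise-free line list (wavelengths in nm) for three orders.
true_coeffs = [60000.0, 300.0, -5.0]
lines = []
for m in (100, 101, 102):
    for x in range(100, 4096, 500):
        u = scaled(x)
        lam = sum(a * u ** k for k, a in enumerate(true_coeffs)) / m
        lines.append((x, m, lam))

coeffs = fit_wavelength_solution(lines)
residuals = [wavelength(coeffs, x, m) - lam for x, m, lam in lines]
```

With real lamp spectra the residuals would of course not vanish; their scatter, compared with the expected line-position errors, indicates whether the chosen number of free parameters is adequate.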


3.6.2. Spectral Sources 


Stable, well-characterized spectral references with a large number of spectral lines
over the largest possible wavelength range are mandatory for obtaining stellar spectra
with a precise wavelength calibration. The calibrator determines the precision and
the accuracy of the measured shifts of the stellar features.

Thorium-Argon hollow cathode lamps show a high density of lines, which cover a
large wavelength range. The precise measurement of their line positions is impacted
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by numerous blends and by the large dynamic range in line intensities that the 
Thorium—Argon spectra show. In addition, they are affected by ageing effects, e.g. 
changes in line intensities and small wavelength shifts produced by the slow pressure 
variations in the lamps.” The most sensitive to these effects are the Argon lines, 
which could drift by several tens of m s$^{-1}$ during the lifetime of a lamp. During
the wavelength calibration process, it is mandatory to avoid these lines. Thorium lines
drift by only a few m s$^{-1}$, instead, and they can be corrected if we know the drift of
Argon lines with respect to Thorium lines and the sensitivity ratio between them.?’ 
HARPS has demonstrated that the best stability achieved with the Thorium—Argon 
lamps as a wavelength calibrator is at a level of 30 cm s$^{-1}$.?” However, for several
years such lamps have no longer been manufactured, because pure metallic Thorium
is no longer available to the manufacturers of hollow-cathode lamps. Instead,
pure Thorium oxide is used. Unfortunately, Thorium oxide introduces impurities 
into current Thorium hollow-cathode lamps.*! Thorium oxide spectra show unde- 
sirable spectral features, a sort of “grass” of many unidentified, blended emission 
lines. Wavelength calibration of high-accuracy radial-velocity spectrographs is com- 
promised by these features. Reference 32 explored Uranium as an alternative for 
the established Thorium cathodes in the wavelength range from 500 to 1000 nm. 
The Uranium cathode provides a factor of about two more lines than the Thorium 
cathode in this wavelength range. The spectrum of the Neon filling gas shows fewer 
strong lines and is therefore preferred with respect to Argon. The analysis shows 
that the Uranium—Neon and Uranium—Argon lamps have performance for wave- 
length calibration over the short term comparable to Thorium—Argon. However, no 
long-term, high-precision RV result has yet been reported. 

Reference 33 proposed for the first time to use a laser frequency comb*+ *6 
for the accurate calibration of astronomical spectrographs. The spectrum of such 
laser frequency combs is made of equally spaced emission lines whose frequencies 
$f_n$ are determined by the formula $f_n = f_0 + n f_{\rm rep}$, where $f_0$ is the so-called carrier-envelope
offset frequency and $f_{\rm rep}$ the repetition frequency of the ultra-short pulsed
laser emission, both related directly to an atomic clock using well-established electronic
phase locking techniques. Eventually, by determining and fixing accurately
$f_0$ and $f_{\rm rep}$, all optical frequencies of the LFC will have the accuracy and long-term
repeatability of the atomic clock, i.e. a relative error smaller than $10^{-11}$ (equivalent
to 1 cm s$^{-1}$ or less when expressed in terms of radial velocities). Laser combs
represent clearly the best choice in the near future for the calibration of spec- 
trographs dedicated to ultra-high-precision radial velocity measurements. Tens of 
groups internationally are developing so-called astrocombs.?”°! The rapid evolu- 
tion of the technology makes us confident that these systems can reach a level of 
reliability required by astronomical observatories. In the meantime, reliable, cheaper 
and bandwidth covering alternatives are still required. 
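As a simple numerical illustration of the comb formula, the sketch below lists a few comb modes and converts a radial-velocity error into the corresponding relative frequency error through the non-relativistic Doppler relation. The offset frequency, the repetition frequency and the mode numbers are hypothetical values chosen only to place the modes in the visible; they do not describe any particular astrocomb.

```python
C_M_PER_S = 299_792_458.0  # speed of light in m/s (exact by SI definition)


def comb_mode_frequency(n, f0_hz, frep_hz):
    """Frequency of comb mode n: f_n = f_0 + n * f_rep."""
    return f0_hz + n * frep_hz


def relative_error_for_velocity(dv_m_per_s):
    """Relative frequency error equivalent to a Doppler velocity error dv."""
    return dv_m_per_s / C_M_PER_S


f0_hz = 100e6    # carrier-envelope offset frequency (assumed)
frep_hz = 18e9   # repetition frequency (assumed)

modes = [comb_mode_frequency(n, f0_hz, frep_hz) for n in range(30_000, 30_003)]
mode_spacing = modes[1] - modes[0]

# Relative frequency accuracy needed to reach 1 cm/s in radial velocity.
rel_1cms = relative_error_for_velocity(0.01)
print(f"mode spacing: {mode_spacing:.3e} Hz, relative error for 1 cm/s: {rel_1cms:.2e}")
```

The mode spacing equals the repetition frequency by construction, and a velocity error of 1 cm s$^{-1}$ corresponds to a relative frequency error of a few times $10^{-11}$.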

The spectrum of a Fabry—Pérot (FP) is made of quasi-equally spaced (in fre- 
quency) peaks that cover a large wavelength domain with homogeneous intensi- 
ties, and is characterized by a dispersion extremely constant over time. In recent 
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years, many Fabry—Pérot calibration systems were developed for instruments like 
CORALIE and HARPS,°? HERMES,°? SPIROU®™* and CARMENES.®°°° The 
system currently in use on the HARPS spectrograph has performed well and has 
completely replaced the Thorium-Argon spectral lamp for instrumental drift measurements.
Indeed, the use of FP systems helps in prolonging the lifetime of
“good” thorium lamps that are no longer on the market, which can be used for 
daily wavelength calibration only (operational time of the order of 15 minutes per 
day). Given the “opto-mechanical” and passive nature of the Fabry—Pérot cavity, 
the spectral lines are not perfectly equally spaced and cannot be guaranteed to 
remain perfectly stable with time, although their relative wavelengths, which are 
perfectly linked together by one single physical parameter, the cavity spacing, are 
not supposed to change with time. Therefore, despite a few drawbacks, a Fabry-Pérot
etalon can also be used as a wavelength calibrator, reaching a precision at the level of
cm s$^{-1}$ if it is actively referenced to another source of stable lines.°® °° Alternatively,
the HARPS experience demonstrates how a passive Fabry—Pérot can efficiently be 
locked to a Thorium—Argon lamp spectrum.?*3° As we will see, the Fabry—Pérot 
(FP) etalons are good alternatives that produce regularly spaced calibration lines 
covering the entire spectral range of the spectrograph. As long as inexpensive
laser-frequency combs that cover the entire spectral range are not available, this
approach can remain a focus for the near future.
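The statement that all Fabry-Pérot lines are tied together by the cavity spacing can be made concrete with the textbook resonance condition of an ideal etalon at normal incidence, $m \lambda_m = 2 n d$. The sketch below is an idealization, not a model of any of the systems cited above: the 5 mm gap, the refractive index of 1 (no dispersion) and the relative gap change of $10^{-9}$ are assumed values.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in m/s


def fp_peak_wavelength(m, gap_m, n_refr=1.0):
    """Wavelength of interference order m for an ideal etalon: m * lambda = 2 n d."""
    return 2.0 * n_refr * gap_m / m


def fp_orders_in_range(gap_m, lam_min_m, lam_max_m, n_refr=1.0):
    """Interference orders whose transmission peaks fall inside [lam_min, lam_max]."""
    two_nd = 2.0 * n_refr * gap_m
    m_lo = math.ceil(two_nd / lam_max_m)
    m_hi = math.floor(two_nd / lam_min_m)
    return list(range(m_lo, m_hi + 1))


gap_m = 5.0e-3  # cavity spacing (assumed)
orders = fp_orders_in_range(gap_m, 500.01e-9, 501.0e-9)
peaks = [fp_peak_wavelength(m, gap_m) for m in orders]

# A relative change of the gap by 1e-9 shifts every peak by the same relative
# amount, i.e. by the same Doppler-equivalent velocity.
drifted = [fp_peak_wavelength(m, gap_m * (1.0 + 1e-9)) for m in orders]
velocity_shifts = [C_M_PER_S * (d - p) / p for p, d in zip(peaks, drifted)]
```

In this idealized picture a gap change appears as a common velocity drift of about 0.3 m s$^{-1}$ for all lines, which is why referencing the etalon to a single stable source is, in principle, enough to fix the whole line pattern.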


3.7. Tracking Instrumental Drifts 


Two methods of tracking the instrumental-profile changes have successfully been 
applied in the past. The first is to superimpose an absorption spectrum of a refer- 
ence gas cell”: °9:©° on the stellar spectrum, such that the instrumental profile (IP) 
is continuously measured. This so-called self-calibration technique is particularly 
useful and effective in spectrographs with varying instrument profile, as in the case 
of slit spectrographs. The disadvantage of this technique is the restricted bandwidth 
of the gas-cell spectrum, the loss of efficiency due to absorption in the light path, 
and the necessity for a sophisticated deconvolution process in order to recover the 
stellar spectrum and thus the radial velocity. This latter step requires the intro- 
duction of many additional parameters for spectral modeling. In order to obtain a 
given precision, higher signal-to-noise spectra must be acquired. 

The second method, the so-called simultaneous reference technique, is conceptually
opposite. It assumes a stabilized IP that does not change between two
wavelength calibrations of the spectrograph, such that the determined relation 
between the detector pixel and the wavelength remains valid over these timescales 
(typically a night). A second channel carrying a spectral reference is continuously 
fed to the spectrograph to monitor and correct for potential instrumental drifts or 
IP changes (Fig. 3). It must be guaranteed, however, that the changes suffered by 
the scientific and the reference channels are identical over timescales of one observing
night. Therefore, the whole design of the instrument must be optimized for
stability, requiring fiber feed and light scrambling, as well as pressure, mechanical,
thermal and optical stability. The effort is compensated by an unrestricted spectral
bandwidth and the acquisition of an “uncontaminated” scientific spectrum.


Fig. 3. Portions of two neighboring echelle orders of a scientific frame recorded by the ESPRESSO
spectrograph. Each order shows the spectra of two fibers, the target fiber (stellar spectrum) and
the reference fiber (spectral reference source, Fabry-Pérot). Each spectrum is split into two “slices”
produced by the pupil slicer in the ESPRESSO design. The reference fiber is always illuminated
with the spectral reference, such that potential drifts of the spectrograph can be measured and
eventually corrected for.

Although, in the case of the self-calibration technique, the instrument profile is 
supposed to be recoverable by deconvolution, there seems to be general agreement 
that low-order instrument profile changes must be avoided in any case and that a 
stable instrument will eventually deliver more precise measurements. There is also 
agreement on the need for better calibration sources. The laser-frequency comb, 
when available at full potential, will provide the desired tracking precision. In the 
meantime, alternative sources for simultaneous reference are being developed, as for 
example the previously mentioned passive Fabry—Pérot cavities or actively stabilized 
Fabry—Pérot systems. 
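The bookkeeping behind the simultaneous reference technique can be sketched as follows. The example assumes, as the method itself requires, that the science and reference channels drift identically; the line positions, the pixel scale of 800 m s$^{-1}$ per pixel and the sign convention are hypothetical, and real pipelines measure the drift with far more elaborate estimators than a plain mean.

```python
def mean_drift_pixels(ref_cal, ref_obs):
    """Mean shift of the reference lines between calibration and observation."""
    shifts = [obs - cal for cal, obs in zip(ref_cal, ref_obs)]
    return sum(shifts) / len(shifts)


def drift_corrected_rv(rv_measured_ms, drift_pixels, ms_per_pixel):
    """Subtract the instrumental drift, converted to velocity, from the stellar RV."""
    return rv_measured_ms - drift_pixels * ms_per_pixel


# Reference-fiber line positions (pixels) at calibration time and during the
# science exposure; here every line has moved by 0.002 pixel.
ref_cal = [512.300, 1024.750, 2048.125, 3500.500]
ref_obs = [p + 0.002 for p in ref_cal]

drift = mean_drift_pixels(ref_cal, ref_obs)
rv = drift_corrected_rv(rv_measured_ms=12.4, drift_pixels=drift, ms_per_pixel=800.0)
print(f"drift = {drift * 800.0:.2f} m/s, corrected RV = {rv:.2f} m/s")
```

With these numbers a drift of 0.002 pixel corresponds to 1.6 m s$^{-1}$, which would dominate the error budget of a sub-meter-per-second measurement if left uncorrected.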


4. Fundamental Precision Limits 


The radial velocity observable is the measurement of the line center expressed in 
pixels, wavelength or radial-velocity units. Let us assume a typical spectral absorption
line of Gaussian shape, as sketched in Fig. 4, with the flux per unit wavelength,
$I(\lambda)$, being $I_0$ in the local continuum. The line center, $\lambda_l$, is then the barycenter
measurement of the line; i.e. $\lambda_l = \int \lambda \cdot (I_0 - I(\lambda))\, d\lambda$, from which we could derive
through a full error analysis the measurement error as a function of the various
error sources. In order to simplify our task, we instead follow another path, which
leads nevertheless to an equivalent result. In fact, we can say that the line center is
determined by the wavelength for which $\Delta S = S_B - S_R = 0$, $\Delta S$ being the difference
of the integrated flux enclosed by the line on the blue and the red side of the line
center, respectively. Since this difference $\Delta S$ is affected by various measurement
errors (e.g. photon noise, read-out noise), the wavelength for which it is zero will
be different from the “real” line center by a quantity $\epsilon_\lambda = \lambda_m - \lambda_l$ determined by
the equation:


$$N_{\Delta S} = 2 \cdot \epsilon_\lambda \cdot c \cdot I_0, \qquad (1)$$


where $N_{\Delta S}$ is the error on the measurement $\Delta S$ and $c$ is the line contrast (or relative
depth). The right-hand side of the relation is obtained from computing the variation
of the flux difference $\Delta S$ at the line center for an equivalent displacement $\epsilon_\lambda$ (see
Fig. 4).


Fig. 4. Conceptual description of an absorption line.

For symmetric lines, we have $S_B \approx S_R = S/2$, where $S$ represents the full integral
over the absorption line, so that we can write


$$N_{\Delta S}^2 = N_{S_B}^2 + N_{S_R}^2 \approx 2 N_{S_B}^2 = N_S^2 \qquad (2)$$


if we assume a symmetric line. We finally obtain


$$N_S^2 = (2 \cdot \epsilon_\lambda \cdot c \cdot I_0)^2. \qquad (3)$$


In practice, $\epsilon_\lambda$ will be determined by the measurement noise, $N_S$, given by the total
noise inside the integration zone over which we effectively measure $S$. For practical
reasons, the observable cannot be integrated to infinity, but must be restricted to
the zone in which the line signal is still significant. We choose this range to be
$-\sigma_\lambda$ to $+\sigma_\lambda$, where $\sigma_\lambda$ is the full-width at half maximum (FWHM) of the line, and
is simply referred to as line width hereafter. All the noise contributions shall be
computed for this range. The total noise is the square root of the quadratic sum of
all contributions, supposing that they all follow Poisson statistics, i.e.:


$$N_S^2 = \sum_i N_i^2. \qquad (4)$$


In the following, we will provide the formulas for the various noise components
in order to determine the full error $\epsilon_\lambda$ on the measured line center.


4.1. Photon Noise 


The photon noise is simply defined by the square root of the total number of photo- 
electrons in the integration range, such that we obtain 


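$$N_{\rm ph}^2 = \int_{-\sigma_\lambda}^{+\sigma_\lambda} I(\lambda)\, d\lambda = 2 I_0 \sigma_\lambda - S_{EW} = I_0 \cdot \sigma_\lambda \cdot (2 - c), \qquad (5)$$


where $S_{EW} = \int_{-\sigma_\lambda}^{+\sigma_\lambda} (I_0 - I(\lambda))\, d\lambda$ is a quantity related to the equivalent line width,
EW, and can be approximated by $c \cdot I_0 \cdot \sigma_\lambda$ for a Gaussian-like line shape.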


4.2. Dark Noise 


The dark current produces a dark noise component that is simply the square root 
of the total number of thermal electrons produced (during the integration time) in 
the detector pixels covering the wavelength integration range. In other words, we 
obtain for the variance of the dark noise contribution 


$$N_D^2 = 2 \cdot n_c \cdot n_d \cdot I_D \cdot t, \qquad (6)$$


where $t$ is the exposure time, $n_d$ and $n_c$ are the pixel sampling in dispersion and
cross-dispersion direction, respectively, and $I_D$ is the dark current per individual
pixel. The factor of 2 originates from the fact that $n_d$ is defined as the number of
pixels per spectral bin, i.e. the FWHM of the Gaussian spectral line, thus being
equal to $\sigma_\lambda$, i.e. half the integration range.
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4.3. Read-Out Noise 


Similarly to the dark noise, we can compute the total read-out noise by quadratically 
adding up the read-out noise per pixel, RON, over all the pixels in the integration 
range. However, the total read-out noise is reduced by the binning factors $b_d$ and $b_c$
in the dispersion and cross-dispersion directions, respectively, if binning is adopted.
The variance of the read-out noise then becomes


$$N_{\rm RON}^2 = 2 \cdot \frac{n_c \cdot n_d}{b_c \cdot b_d} \cdot {\rm RON}^2. \qquad (7)$$


Again, the factor of 2 originates from the fact that $n_d$ is defined as the number of
pixels per spectral bin, i.e. half the integration range.


4.4. Background Noise 


Finally, we also have to take into account possible background noise that is produced 
by sky or instrument background, $I_B$. This contribution behaves in a similar way
as the photon noise, with the difference, however, that it does not scale with the
magnitude of the observed astronomical source and that it is assumed to be a continuous
emission $I_B$ (no spectral features in the range of integration) for simplicity.
Then, we obtain


$$N_B^2 = \int_{-\sigma_\lambda}^{+\sigma_\lambda} I_B(\lambda)\, d\lambda = 2 I_B \cdot \sigma_\lambda. \qquad (8)$$


4.5. Total Noise 


The total variance of the line-barycenter measurement expressed in photo-electrons 
is the quadratic sum of all components mentioned above: 


$$N_S^2 = N_{\rm ph}^2 + N_D^2 + N_{\rm RON}^2 + N_B^2 = I_0 \sigma_\lambda (2 - c) + 2 n_c n_d I_D t + 2 \frac{n_c n_d}{b_c b_d} {\rm RON}^2 + 2 I_B \sigma_\lambda. \qquad (9)$$

At this stage, all the signals $I_0$ and $I_B$ are expressed in photoelectrons per
wavelength unit. Without loss of generality, they are easily converted to $I_0'$ and $I_B'$,
denoting photoelectrons per resolution element (= extracted pixel). In this case, the
value $n_d/(\sigma_\lambda I_0)$ becomes $1/I_0'$, since we had selected $n_d$ to be the number of pixels
per FWHM $\sigma_\lambda$ of the line. After this substitution, the variance of the measurement
becomes


$$N_S^2 = 2 I_0' \sigma_x \cdot \left[\left(1 - \frac{c}{2}\right) + n_c \cdot \frac{I_D \cdot t}{I_0'} + \frac{n_c}{b_c b_d} \cdot \frac{{\rm RON}^2}{I_0'} + \frac{I_B'}{I_0'}\right], \qquad (10)$$


where $\sigma_x$ now represents the FWHM of the extracted line in pixels, and $\sigma_l$ and
$\sigma_R$ are the intrinsic line width and the spectral resolution, respectively.

It is easily recognized that the expression in the square bracket becomes
$1 > (1 - c/2) > 0.5$ when the measurement is photon-noise limited, i.e. when


$$I_0' \gg n_c \cdot I_D \cdot t \quad {\rm and} \quad I_0' \gg \frac{n_c}{b_c b_d} \cdot {\rm RON}^2 \quad {\rm and} \quad I_0' \gg I_B'. \qquad (11)$$

From these conditions, it is possible to deduce a “limiting magnitude” starting
from which the measurement precision will be limited by the choice of the spatial
sampling, $n_c$.
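The noise budget of Eqs. (5) to (9) is easily evaluated numerically. The sketch below assumes that the pixel is used as the wavelength unit, so that the line width equals the dispersion sampling, and all numerical inputs (continuum level, sampling, dark current, exposure time, read-out noise) are hypothetical values chosen only to show the relative weight of the terms.

```python
import math


def noise_budget(i0, sigma, c, n_c, n_d, i_dark, t, ron, b_c=1, b_d=1, i_bkg=0.0):
    """Variance contributions (in photoelectrons) within the range -sigma..+sigma."""
    photon = i0 * sigma * (2.0 - c)                       # Eq. (5)
    dark = 2.0 * n_c * n_d * i_dark * t                   # Eq. (6)
    readout = 2.0 * n_c * n_d / (b_c * b_d) * ron ** 2    # Eq. (7)
    background = 2.0 * i_bkg * sigma                      # Eq. (8)
    total = photon + dark + readout + background          # Eq. (9)
    return {
        "photon": photon,
        "dark": dark,
        "readout": readout,
        "background": background,
        "total": total,
    }


budget = noise_budget(
    i0=1000.0,     # continuum, photoelectrons per pixel (assumed)
    sigma=3.0,     # line FWHM in pixels, equal to n_d here (assumed)
    c=0.5,         # line contrast (assumed)
    n_c=4, n_d=3,  # cross-dispersion and dispersion sampling (assumed)
    i_dark=0.001,  # dark current, electrons per pixel per second (assumed)
    t=900.0,       # exposure time in seconds (assumed)
    ron=3.0,       # read-out noise, electrons per pixel (assumed)
)
n_s = math.sqrt(budget["total"])
photon_fraction = budget["photon"] / budget["total"]
```

In this example the photon term contributes about 95% of the variance, so the measurement is close to photon-noise limited; lowering the continuum level by a factor of 100 would leave the read-out term dominant.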


4.6. Error on the Radial- Velocity Measurement 


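In order to determine the radial-velocity error $\epsilon_{RV} = \alpha \cdot \epsilon_\lambda = \beta \cdot \epsilon_x$ ($\alpha$ and $\beta$
being scale conversion factors), we shall start from Eq. (3) and solve it for the
error. For the sake of simplicity and generality, we remain in pixel space, which is
easily converted into radial velocity once the factor $\beta$ is determined for the specific
spectrograph. Thus, we obtain


$$\epsilon_x = \frac{N_S}{2 \cdot c \cdot I_0'} = \frac{\sigma_x^{3/2}}{EW_x \sqrt{2 I_0'}} \cdot \left[\left(1 - \frac{c}{2}\right) + n_c \cdot \frac{I_D \cdot t}{I_0'} + \frac{n_c}{b_c b_d} \cdot \frac{{\rm RON}^2}{I_0'} + \frac{I_B'}{I_0'}\right]^{1/2}, \qquad (12)$$


where we have used the relation for the equivalent width in pixels $EW_x = c \cdot \sigma_x$, a
value that does not depend on the resolving power of the spectrograph.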
From the last formula several conclusions can be drawn: 


(1) In the photon-limited regime, the formula reduces to 


$$\epsilon_x = \frac{\sigma_x^{3/2}}{EW_x \sqrt{2 I_0'}} \cdot \sqrt{1 - \frac{c}{2}}, \qquad (13)$$


which does not depend on instrumental parameters apart from the spectral
resolution or resolving power. Above the limiting magnitude, the stellar flux
has to be compared with the total dark current and the total read-out noise in
an integration element $2\sigma_x$.
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At a fixed instrumental configuration and for a given flux, Jj, the error decreases 
with increasing spectral resolution. In the limit of an unresolved spectral line, 
the error actually decreases as the three-halves power of the resolving power, R, 
since $R \propto 1/\sigma_R$. For a given science case (spectral type, stellar rotation velocity,
magnitude), a trade-off analysis must be performed to determine whether
slit efficiency or resolving power must be preferred to obtain more precise mea- 
surements (in terms of “photon noise” ). 

The results are summarized in Fig. 5. The right-hand side of the figure shows 
how fast the RV error decreases with resolving power, as long as the spectral 
line is not resolved. In this regime, the exposure time needed to obtain a given 
RV precision decreases with the cube of the resolving power! It is important to 
note here that the curves for slowly rotating solar- or later-type stars only start 
to flatten out at resolving power above R = 100,000. 

For a fixed spectral resolution (and slit width), Eq. (12) explicitly depends on
$n_c$, the number of spatial pixels. This fact is not without consequences when
choosing an image or pupil slicer to increase resolving power for a given instru- 
ment design (and size). In fact, since the etendue is conserved, a slicer will result 
in a higher number of spatial pixels, with the consequences described above for 
the limiting magnitude. 
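The single-line error can be evaluated with a short function. The sketch below implements the pixel-space error as it follows from solving Eq. (3) for the error and inserting the noise budget per resolution element; this reading of Eqs. (12) and (13), the function name and all numerical inputs are assumptions made for illustration.

```python
import math


def rv_error_pixels(i0p, sigma_x, c, n_c=1, i_dark=0.0, t=0.0, ron=0.0,
                    b_c=1, b_d=1, i_bkg_p=0.0):
    """Line-center error in pixels for one line of FWHM sigma_x and contrast c.

    i0p and i_bkg_p are the continuum and background levels in photoelectrons
    per extracted pixel.
    """
    bracket = ((1.0 - c / 2.0)
               + n_c * i_dark * t / i0p
               + n_c / (b_c * b_d) * ron ** 2 / i0p
               + i_bkg_p / i0p)
    ew_x = c * sigma_x  # equivalent width in pixels
    return sigma_x ** 1.5 / (ew_x * math.sqrt(2.0 * i0p)) * math.sqrt(bracket)


# Photon-limited case (all detector and background terms set to zero).
eps_photon = rv_error_pixels(i0p=10_000.0, sigma_x=3.0, c=0.5)

# Cross-check against Eq. (3): eps = N_S / (2 c I_0').
n_s = math.sqrt(2.0 * 10_000.0 * 3.0 * (1.0 - 0.5 / 2.0))
eps_from_eq3 = n_s / (2.0 * 0.5 * 10_000.0)

# Adding read-out noise over four spatial pixels increases the error.
eps_with_ron = rv_error_pixels(i0p=10_000.0, sigma_x=3.0, c=0.5, n_c=4, ron=5.0)
```

For a hypothetical scale of 800 m s$^{-1}$ per pixel, the photon-limited value of about 0.021 pixel corresponds to roughly 17 m s$^{-1}$ for this single line, which illustrates why thousands of lines must be combined to reach the meter-per-second regime.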


We have so far only analyzed the precision obtained for a single spectral line. In
reality, the radial velocity is computed over the whole spectral range of the spectrograph,
involving thousands of lines of different width and contrast. Independently
of the method used to compute the stellar radial velocity (cross-correlation,
autocorrelation, spectral modelling, line-by-line fitting, etc.), the final value will always
be a photon-weighted average of the radial velocity of each single line. If done
correctly, the main difference between the various methods will be their sensitivity
to systematic errors, which are not described in this chapter. However, the radial-velocity
precision will depend on the spectral energy distribution, the number of
lines, and their average line width and contrast, i.e. the spectral type and the projected
stellar rotation velocity. The discussion of these effects is beyond the scope
of this chapter. We refer therefore to Ref. 14 for a detailed discussion.


Fig. 5. Radial-velocity error $\epsilon_{RV}$ (exposure time fixed) and exposure time (radial-velocity error
$\epsilon_{RV}$ fixed) as a function of resolving power of the spectrograph for various intrinsic line widths of
the stellar spectrum. The plots are shown in arbitrary units since the absolute scale will depend
on telescope size and instrumental throughput.

Many future projects for radial-velocity spectrographs*! aim at detecting rocky 
planets in the habitable zone (HZ)®? of a solar-like or low-mass star (a distance 
to the star at which liquid water can persist on the surface of the planet). In 
order to attain this objective, they must be photon-efficient and precise to the 
sub-meter-per-second level. Photon efficiency is obtained with optimized designs 
and high spectral resolution. 


5. “Secrets” of the Precise Doppler Spectrograph 


5.1. Conservation of Etendue 


Resolving power is defined as $R = \lambda/\Delta\lambda$, where $\Delta\lambda$ is the FWHM of a nonresolved
spectral line. The linear or angular width of $\Delta\lambda$ is defined by the finite image size of
the entrance slit on the detector. In most astronomical echelle spectrographs,
the image of the slit is much larger than the diffraction limit. Therefore, the wavelength
bin covered by an unresolved spectral line is $\Delta\lambda = s/D$, where $s$ is the angular
slit size seen by the dispersive element and $D = \partial\beta/\partial\lambda$ its angular dispersion. In
Littrow and at blaze, the angular dispersion of an echelle grating is $D = 2 \cdot \tan\beta/\lambda$
and is therefore exactly inversely proportional to $\lambda$. As a result, for all blazed wavelengths
across the spectral range the resolving power is $R = \lambda/\Delta\lambda = 2 \cdot \tan\beta/s$ and
is thus perfectly constant with wavelength.

Using the conservation of etendue, $A \times \Omega$ (the beam cross-section area times
the solid angle), and geometrical optics, it can be shown that


$$\frac{R \cdot {\rm FOV} \cdot D_{\rm tel}}{D_{\rm col} \cdot \tan\beta} = {\rm const}, \qquad (14)$$
where FOV is the field-of-view of the slit/fiber, $D_{\rm tel}$ the telescope’s primary mirror
diameter, $D_{\rm col}$ the diameter of the collimated beam on the echelle grating, and
finally $\beta$ the blaze angle of the echelle grating when operated in Littrow condition.
For seeing-limited instruments, the optical etendue increases with the telescope size, 
and so does the instrument size if the spectral resolution is kept fixed.! In the era of 
8-m class and extremely large telescopes (ELTs), this aspect has become a technical 
and managerial challenge. Reducing the FOV will help in controlling $D_{\rm col}$, but only
at the expense of slit efficiency. The blaze angle of the echelle grating should be
made as large as possible for highest dispersion, but practical limitations such as
manufacturability and optical efficiency will set the optimum to about R4 gratings
($\tan\beta = 4$, $\beta = 76°$).
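The scaling expressed by Eq. (14) can be turned into numbers with a short calculation. The sketch below assumes that the constant equals 2, which follows from combining the Littrow relation $R = 2 \cdot \tan\beta/s$ with an angular slit size at the grating of $s = {\rm FOV} \cdot D_{\rm tel}/D_{\rm col}$; the 1 arcsec aperture and the telescope diameters are hypothetical inputs.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def collimated_beam_diameter(resolving_power, fov_arcsec, d_tel_m, tan_blaze):
    """Beam diameter on the echelle needed to reach a given resolving power.

    Assumes R = 2 * tan(beta) * D_col / (FOV * D_tel), with the FOV in radians.
    """
    fov_rad = fov_arcsec * ARCSEC_TO_RAD
    return resolving_power * fov_rad * d_tel_m / (2.0 * tan_blaze)


# R = 100,000 with a 1 arcsec aperture and an R4 echelle (tan(beta) = 4).
d_col_8m = collimated_beam_diameter(100_000, 1.0, 8.0, 4.0)
d_col_16m = collimated_beam_diameter(100_000, 1.0, 16.0, 4.0)
print(f"8 m telescope: {d_col_8m:.2f} m, 16 m telescope: {d_col_16m:.2f} m")
```

Under these assumptions an 8 m telescope already requires a collimated beam of almost half a meter, and the beam diameter doubles when the telescope diameter doubles at fixed resolving power and aperture on the sky.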
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A way to overcome the problem of instrumental size, represented by the size of 
the collimated beam $D_{\rm col}$, is to adopt image® or pupil slicing.®* In both cases, the
goal is to reduce the size of the input slit/fiber in the main dispersion direction in 
order to reduce the size of the spectral bin represented by this image on the detector. 
This increases the resolving power, assuming constant angular dispersion. This goal 
could also be achieved simply by “masking” the slit, but this reduces throughput. 
Instead, image or pupil slicing reduces the size of the slit in the main-dispersion 
direction and rearranges the multiple slit/fiber images along the cross-dispersion 
direction, again to respect conservation of etendue. This trick will, however, result 
in an increase of used detector pixels, of which the number per spectral element will
grow with the square of the number of slices $N_s$, compared to an unsliced solution
with identical resolving power. 

Let us suppose a spectrograph of spectral range $\Lambda$, fixed resolving power $R$
and spectral sampling $s$ in dispersion direction, where $s$ is the number of spectral
pixels per FWHM of a nonresolved line (called spectral element hereafter).
Let’s also suppose that no anamorphosis of the light beam is performed inside the
spectrograph. The total number of used detector pixels, $N_{\rm pix}$, will in this case be
approximately:


$$N_{\rm pix} \approx \frac{\Lambda}{\Delta\lambda} \cdot s^2 \cdot N_s^2 = \frac{\Lambda}{\lambda} \cdot R \cdot s^2 \cdot N_s^2. \qquad (15)$$


Equation (15) tells us that the number of pixels necessary to sample the spectrum 
with a certain quality depends linearly on the resolving power and the bandwidth. 
Sampling the spectral elements with sufficient pixels is important but increases 
the number of used pixels by $s^2$. Finally, reducing the size of the spectrograph by
implementing slicing is indeed effective, but introduces additional detector costs, 
which increase with the square of the number of slices or, by considering Eq. (14), 
with the square of the factor by which the collimated-beam diameter can be reduced 
by slicing, while keeping the same resolving power. 
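The scaling expressed by Eq. (15) can be illustrated with a short numerical sketch. It assumes the reading $N_{\rm pix} \approx (\Lambda/\Delta\lambda)\, s^2 N_s^2$ with $\Delta\lambda = \lambda/R$; the function name and the input values (a 300 nm range centered near 550 nm, $R = 100{,}000$, three-pixel sampling) are purely illustrative and are not taken from any specific instrument.

```python
def used_pixels(bandwidth_nm, wavelength_nm, resolving_power, sampling, n_slices=1):
    """Approximate number of used detector pixels, following Eq. (15).

    Illustrative helper: assumes no anamorphosis, so that each spectral
    element occupies sampling**2 pixels before slicing.
    """
    delta_lambda_nm = wavelength_nm / resolving_power   # width of one spectral element
    n_elements = bandwidth_nm / delta_lambda_nm         # number of spectral elements
    return n_elements * sampling**2 * n_slices**2


# Hypothetical example values, chosen only to show the scaling.
unsliced = used_pixels(300.0, 550.0, 100_000, sampling=3)
sliced = used_pixels(300.0, 550.0, 100_000, sampling=3, n_slices=2)

print(f"unsliced:   {unsliced:.3g} pixels")
print(f"two slices: {sliced:.3g} pixels (ratio {sliced / unsliced:.1f})")
```

Doubling the number of slices at fixed resolving power quadruples the pixel count, which is the detector cost discussed above.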

Equations (14) and (15) govern the size and the costs of a spectrograph. They 
can and should be used to make trade-offs between the various design options, 
provided, however, that the merit function is well defined. For instance, aiming at 
high throughput will lead us to the choice of a large slit for smaller slit losses; 
however, this is likely to be done at the expense of resolving power. As seen in 
the previous section, resolving power is an important contributor to radial-velocity 
precision. The trade-off should therefore consider the fundamental error on the 
observable, not only the SNR. 

One of the main reasons for building larger telescopes is the observation of faint 
targets. For photon-noise limited observations, however, if the product of collecting 
area and integration time is constant, this leads to identical SNR and measurement 
errors. Therefore, there is only a gain in using larger telescopes if the detector- 
noise limit can be pushed toward larger magnitudes. Let’s thus suppose that we are 
interested in putting a spectrograph on a telescope of twice the size, while keeping 
resolving power, spectral sampling and wavelength range unchanged. Let’s further
compare two cases: 


(1) The spectrograph size scales with the telescope size, i.e. the collimated beam 
diameter is multiplied by a factor of two. In this case, the spectral format and 
the total number of pixels used per spectral element remain unchanged. As a 
consequence, carrying out four observations with the smaller telescope, which 
gives the same product of collecting area and integration time, will result in 
using four times more pixels than a single exposure of the same
exposure time with the larger telescope. With the larger telescope, the read-out-noise
and the dark-noise limit will therefore be shifted by about 1.5 magnitudes towards fainter
objects.

(2) The spectrograph size is kept unchanged, and, in order to achieve the same 
resolving power, slicing is introduced. Equation (15) tells us that in this case the 
number of pixels per spectral element is increased by a factor of four. Observing 
with a telescope twice as large but performing one instead of four exposures, 
will result in the use of exactly the same number of pixels. Read-out-noise and 
the dark-noise limit will therefore remain unchanged. 
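The figure of 1.5 magnitudes in case (1) can be checked with a minimal sketch. It assumes that the detector-noise limit is the flux at which the photon-noise variance equals the detector-noise variance, so that the limiting flux scales linearly with the number of pixels (and read-outs) involved; the function name is hypothetical.

```python
import math


def noise_limit_shift_mag(pixel_ratio):
    """Shift of the detector-noise limit, in magnitudes.

    Assumption: the limiting flux is proportional to the total number of
    pixels read, so a ratio of pixel counts maps onto a flux ratio.
    """
    return 2.5 * math.log10(pixel_ratio)


# Case (1): four exposures on the small telescope use 4x more pixels
# than one exposure on the telescope twice as large.
case_1 = noise_limit_shift_mag(4.0)

# Case (2): slicing makes the pixel counts identical, so there is no gain.
case_2 = noise_limit_shift_mag(1.0)

print(f"case (1): {case_1:.2f} mag, case (2): {case_2:.2f} mag")
```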


In summary, slicing is an adequate solution to control the size of the instrument 
and thus its costs. However, this translates to an increased detector cost, which, 
in the case of infrared detectors, may be a nonnegligible fraction of the total costs.
Furthermore, slicing will a priori add detector noise. While read-out noise can
be controlled by on-chip pixel binning, at least to some extent, the dark noise 
component cannot be avoided. Science cases that move into the detector noise regime 
must consider this aspect when driving the design of new instruments. 


5.2. Optical Design Choices 
5.2.1. General Considerations 


Every science case, every telescope, and every instrument deserves a specific study, 
trade-off analysis and optimization. In the previous section, I have given “recom- 
mendations” when looking for highest RV-precision. They may apply in some cases, 
but not in all of them. Nevertheless, I would like to make a few other (more philo- 
sophical) considerations: 


(1) When designing an instrument, one should aim at the simplest solution that 
solves “the problem”. “Simple” means that no complexity, no other function, no 
other elements must be added that are not strictly necessary to solve the prob- 
lem. Of course, the “simplest solution” may only be defined once “the problem” 
is well defined. It is the Instrument Scientist’s task to define the smallest and 
most pertinent set of requirements, and to monitor the simplest solution, which 
complies with these (and only these) requirements! In other words: “There is 
no reason to do something just because it is feasible!” 
Precision Radial-Velocity Spectrographs (PRVS) look for very tiny effects that
can hardly be modeled or measured. A step-wise approach, building on good 
experience, is the most promising. One of the most important experiences 
acquired during the past years is that an intrinsically stable spectrograph is 
more likely to provide repeatable measurements. The reason is that zeroth- and 
first-order measurement errors can be removed at the root, such that one can 
actually identify residual effects and possibly remove them as well. 

High resolving power brings several advantages, not least in terms of RV preci- 
sion. It shouldn’t be forgotten, however, that we are looking at effects that are 
tiny compared to the width of the spectral lines we are using to compute the 
Doppler velocity. Moreover, the spectral lines are sampled by only a few pixels. 
Increasing the dispersion or the resolving power will increase the sampling and 
consequently reduce the impact of pixel defects or response variation, instru- 
mental drifts, IP distortion, etc. on the measured radial velocity. These effects 
scale with the inverse of the size (in wavelength) of the resolution element. If we 
consider a spectral and spatial sampling of four pixels in each direction, and the
use of about 1000 stellar absorption lines for the RV computation, then we can
consider that random pixel defects will be reduced by a factor $\sqrt{4 \cdot 4 \cdot 1000} \approx
127$. At a resolving power of $R = 100{,}000$, each pixel covers about 750 m s$^{-1}$. If
we assume that the photo-response center of a pixel could be affected by about
1/10th of its size, then we will end up with a statistical error on the stellar
radial velocity induced by pixel-geometry errors of about $\epsilon_{\rm RV} \approx 75/127$ m s$^{-1}$ $\approx$
0.5 m s$^{-1}$. This value can be used as a reference for HARPS-like spectrographs.
In the case of ESPRESSO, resolving power (factor 1.4), spectral sampling
(factor 1.5) and spatial sampling (factor 4) are much higher, and the newer
detectors have improved pixel geometry by about a factor of 10! For this reason,
we expect that even without calibration of the pixel geometry an overall
RV repeatability of 0.1 m s$^{-1}$ can be achieved when adequate choices are made.
On the other hand, high sampling will considerably increase the detector costs, 
read-out noise and dark current. Therefore, a “compromise” must be found. By 
Fourier analysis, it can be shown that with a sampling of three spectral pixels 
per FWHM of an unresolved line one will recover more than 80% of the informa- 
tion content. Depending on the required measurement repeatability, either spec- 
tral or spatial sampling must be increased until a balance is found with detector 
noise. Alternatively, tunable laser-frequency combs can be used to characterize 
every pixel and its response in detail, in order to avoid increasing the sampling.
Finally, it should be noted that excellent image quality of the spectrograph will
lead to a symmetric and “clean” IP. In particular, good image quality, ideally
diffraction-limited, will contribute to making the spectrograph insensitive to
illumination changes. An excellent optical design remains therefore the best 
starting condition for a high-precision radial-velocity spectrograph. 
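The pixel-geometry estimate made above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: it assumes uncorrelated pixel errors that average down with the square root of the number of pixels involved, and all function and parameter names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light


def pixel_velocity_width(resolving_power, spectral_sampling):
    """Velocity width of one pixel: resolution element c/R divided by the sampling."""
    return C_M_PER_S / (resolving_power * spectral_sampling)


def pixel_geometry_rv_error(resolving_power, spectral_sampling, spatial_sampling,
                            n_lines, centroid_error_fraction):
    """Statistical RV error from random pixel-geometry errors (uncorrelated pixels)."""
    per_pixel = centroid_error_fraction * pixel_velocity_width(
        resolving_power, spectral_sampling)
    n_pixels = spectral_sampling * spatial_sampling * n_lines
    return per_pixel / math.sqrt(n_pixels)


# Values quoted in the text for a HARPS-like spectrograph.
width = pixel_velocity_width(100_000, 4)
error = pixel_geometry_rv_error(100_000, 4, 4, 1000, 0.1)

print(f"pixel width: {width:.0f} m/s, RV error: {error:.2f} m/s")
```

The unrounded arithmetic gives roughly 0.6 m s$^{-1}$, in line with the rounded estimate quoted in the text.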
In the following, we will give a nonexhaustive list of guidelines for the optical
design of high-precision radial-velocity spectrographs. Their implementation is not 
necessarily mandatory and should be considered only as advisable. Their contribu- 
tion to the radial-velocity error budget can hardly be estimated individually, since 
the effects are not independent from other factors in the design. As an example, 
let’s just recall the data-reduction software, which may or may not correct for some 
of those causes if treated correctly. 


5.2.2. Atmospheric Dispersion Compensator 


The Atmospheric Dispersion Compensator (ADC)^c is an important component,
in particular when considering wide spectral ranges. It will avoid chromatic slit- 
efficiency losses. It is important to introduce it on the optical path before the beams 
for the spectrograph and the guiding camera are separated, so that the guiding cam- 
era will see the same image that is injected into the spectrograph. Better centering 
at all colors will be the result. 


5.2.3. Slit/Fiber Viewer, Guiding and Tip–Tilt


Because of slit efficiency and illumination stability, it is critical to perfectly center 
the stellar image on the slit or the fiber. The use of a slit or fiber viewer is therefore 
strongly preferred over the use of a beam-splitter or dichroic. From experience,
guiding on a 14th-magnitude star with a 1-m class telescope, a 17th-magnitude star with a 4-m
class telescope, or a 20th-magnitude star with an 8-m class telescope can be achieved under
nominal astro-climatic conditions.

In order to avoid slit losses and centering errors, it is strongly recommended 
to make sure that the guiding camera observes in the same spectral band as the 
spectrograph. A tip–tilt system is not mandatory but certainly useful in case of old
telescopes or insufficient tracking. On the other hand, the tip–tilt will not significantly
contribute to efficiency, since on 4-m class telescopes or larger the seeing is
highly dominated by high-order turbulence.


5.2.4. Light Injection and Fiber Feed 


Despite some additional losses (depending however on spectrograph location and
wavelength), the use of optical fibers has great advantages over slit spectrographs.
First, they allow us to place the spectrograph in a gravity-invariant, thermally
stable location, which will in turn significantly improve the spectrograph’s
stability. Second, the optical fibers will critically contribute to the image
scrambling (illumination stability). Third, and given the two previous points, the
stabilized spectrograph allows us to avoid the use of an absorption cell.


^c See Chapter 5 of Volume 2 for more information on ADCs.
5.2.5. Scrambling


A stable illumination of the spectrograph for both near (image-plane) and far 
(pupil-plane) fields is of fundamental importance for precision radial-velocity 
spectrographs. The illumination stability ensures that the photocenter of a spectral 
line imaged on the detector does not change as a function of input illumination, i.e. 
it is independent of the position of the stellar image on the fiber tip or the angular 
light distribution received from the telescope. Standard (step-index, circular) fibers 
do a good job, at least in terms of near-field scrambling. The scrambling gain,
usually measured as the reduction factor $g_s = s_{\rm in}/s_{\rm out}$ relating the residual near-field
photocenter shift $s_{\rm out}$ at the output of the fiber to the shift $s_{\rm in}$ of the
input near field that causes it, is however no larger than about 200. Noncircular fibers (e.g. with
octagonal, hexagonal or square cross-section^{70-73}) have a much better scrambling
factor, say 1000 or higher, and are therefore being used in a standard way in recent
PRVS.^d

Even noncircular fibers, however, show weak far-field scrambling. Variation of 
the far-field can induce quite significant IP changes in the spectrograph. A complete 
image scrambling by the fiber becomes only effective if both near- and far-field are 
well scrambled (Fig. 6). This is achieved by using two sections of optical fibers in 
series, while exchanging near- and far-field in between them. Effective improvements
have already been demonstrated on operational instruments. There are various
implementations of the so-called double scrambler,^{1,67,74-84} all with the same goal,


[Figure 6 schematic: optical fiber, lens (pupil -> image), lens (image -> pupil), optical fiber, together forming the double scrambler.]


Fig. 6. Principles of the double scrambler. The upper part of the figure shows the tip of various 
types of noncircular fibers illuminated from the back. The homogeneity of the illumination demon- 
strates the high level of near-field scrambling. When combined into a double scrambler (lower part 
of the figure), that exchanges near- and far-field in a simple symmetric mount, these fibers produce 
a very homogeneous and stable illumination of the spectrograph both in near- and far-field. 


^d See Chapter 6 of Volume 2 for further discussion of structured fibers and related topics.
namely to act like an integrating sphere without increasing the beam etendue. Any
other form of scrambling, e.g. by bending the fiber, will introduce significant focal-ratio
degradation (FRD) and thus efficiency losses.

It must be noted, however, that the described image scrambling is only effective 
in the limit of geometrical optics, i.e. when the number of modes of the fiber — a 
wave guide — is large, or when the wavelength is small compared to the physical 
dimensions of the fiber. Distributing the energy across the modes by exchanging the 
geometrical near- and far-field becomes less effective as soon as the number of modes 
is of the order of 100 or below. This situation may arise on small telescopes or when 
slicing the beam, and will become more relevant at longer (infrared) wavelengths. 
Unless a mono-mode solution is achieved (e.g. by using adaptive optics while accepting
heavy efficiency losses), a two-fold strategy must be adopted to overcome this
difficulty:


(1) Since scrambling is less efficient, it is advisable to make sure that the source 
“fills” the etendue of the spectrograph in the most uniform way. It must be 
noted that some wavelength-calibration sources, for instance the laser-frequency 
comb, are spatially coherent and mono-mode, and would basically populate 
only one mode on the multi-mode fiber of the spectrograph. The resulting IP 
of the spectrograph would be quite different from that of the stellar spectrum, 
since the stellar image is naturally scrambled by atmospheric turbulence and 
populates all the fiber modes. In case of excellent seeing, a possible tip-tilt 
system used for guiding could be used to actively illuminate the fiber tip in a 
homogeneous way. In the case of the calibration sources, however, spatial and 
temporal scrambling devices must be utilized to populate all the modes in a 
statistically homogeneous way. 

(2) However, even if all the fiber modes within the spectrograph’s etendue are 
populated homogeneously, their number remains small. If they evolve with time, 
they will produce modal noise that manifests itself as reduced SNR and as IP
changes along the spectral direction of the echelle orders. Both stellar and
calibration spectra will be affected, but depending on the timescale of the mode
evolution, the impact will be different. It is therefore recommended that the fiber 
modes are forced to evolve at timescales faster than the shortest exposure, such 
that effects will be averaged out. This modal scrambling must be done without 
photon losses, however. Various techniques like bending, stretching, twisting 
and heating of the fiber have been investigated, and, although the last word 
has not been spoken, the stretching technique appears as the most promising 
to date. 
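To get a feeling for when the number of modes drops to the order of 100 or below, one can use the commonly quoted textbook approximation for a step-index multimode fiber, $M \approx V^2/2$ with $V = \pi d\,{\rm NA}/\lambda$. The sketch below is only indicative: the approximation holds for $V \gg 1$, it counts the modes supported by the fiber rather than those actually populated within the spectrograph's etendue, and the fiber parameters are hypothetical examples.

```python
import math


def v_number(core_diameter_um, numerical_aperture, wavelength_um):
    """Normalized frequency V of a step-index fiber."""
    return math.pi * core_diameter_um * numerical_aperture / wavelength_um


def approx_mode_count(core_diameter_um, numerical_aperture, wavelength_um):
    """Approximate number of guided modes, M ~ V^2 / 2 (valid for V >> 1)."""
    return v_number(core_diameter_um, numerical_aperture, wavelength_um) ** 2 / 2.0


# Hypothetical cases: a large visible-light fiber and a small, slow-beam infrared fiber.
visible = approx_mode_count(core_diameter_um=100.0, numerical_aperture=0.22,
                            wavelength_um=0.55)
infrared = approx_mode_count(core_diameter_um=30.0, numerical_aperture=0.10,
                             wavelength_um=1.6)

print(f"visible: ~{visible:.0f} modes, infrared: ~{infrared:.0f} modes")
```

With these assumed values the visible case supports thousands of modes, whereas the infrared case falls well below 100, which is the regime where the geometrical scrambling described above loses effectiveness.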


5.2.6. Slicing 


If slicing is necessary, it should be ideally performed before the light is scrambled, i.e. 
before it enters into the spectrograph fibers. A lenslet array feeding a fiber bundle 
can be used for this purpose. In this case, each fiber feeding the spectrograph should
be scrambled individually, and the spectra produced by each fiber must be extracted 
separately, since their relative fluxes are subject to variations. If slicing is performed 
after the spectrograph fiber, pupil slicing should be preferred over image slicing. 
There are two reasons for this recommendation: (a) The image slicer acts in the 
image plane. Any optical effect caused by possible thermo-mechanical instabilities 
of the slicer or the preceding relay optics will affect one-to-one the photocenter of 
a spectral absorption line on the CCD. If we consider that 10 cm s$^{-1}$ corresponds
to nanometer-level motions in the image slicer plane, then we understand that
very stringent stability requirements would have to be placed on the opto-mechanics.
(b) The image slicer can act, by definition, only on an individual (fiber) image. If a 
simultaneous reference fiber is used, possible image motions suffered by the target 
fiber won’t be seen by the reference fiber. The pupil slicer, instead, acts where the 
beam is large, typically of several centimeters in diameter, and where the beams 
of the reference and the target fibers are perfectly superimposed. Consequently,
thermo-mechanical effects will have almost no impact on the spectral
line, but even if they are present, the effect is likely to be identical for both the 
target and the simultaneous reference. 
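The order of magnitude quoted in reason (a) can be verified with a simple conversion from a velocity shift to a physical image displacement. The sketch assumes a pixel covering about 750 m s$^{-1}$ (as for a HARPS-like spectrograph), a 15 µm pixel and unit magnification between the slicer plane and the detector; the pixel size, the magnification and the function name are assumptions made for illustration only.

```python
def image_shift_nm(rv_shift_m_s, pixel_velocity_m_s, pixel_size_um, magnification=1.0):
    """Image displacement (in nm) in the slicer plane equivalent to an RV shift.

    magnification is the (assumed) ratio of image size on the detector to
    image size in the slicer plane.
    """
    detector_shift_nm = rv_shift_m_s / pixel_velocity_m_s * pixel_size_um * 1000.0
    return detector_shift_nm / magnification


# 10 cm/s expressed as a displacement, under the assumptions stated above.
shift = image_shift_nm(rv_shift_m_s=0.1, pixel_velocity_m_s=750.0, pixel_size_um=15.0)
print(f"10 cm/s corresponds to about {shift:.1f} nm")
```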


5.2.7. Echelle Grating in Littrow 


Echelle spectrographs used in (quasi-) Littrow mount have exceptional charac- 
teristics: 


(1) Since the grating is used at a high incidence angle, the angular dispersion $D = 2\tan\beta/\lambda$ is
very high. Furthermore, $D$ does not depend on the groove density and thus the
diffraction order $m$.

(2) The echelle grating produces a number of wavelength chunks. Being reflective, 
the echelle grating can cover a large wavelength range. By choosing the groove 
density (or order m), the free spectral range, FSR, and actual lengths of each 
chunk can be customized. 

(3) As shown above, resolving power does not depend on wavelength and is thus 
perfectly constant along the entire spectral range of the spectrograph, at least 
for all the blaze wavelengths in the center of the echelle order. 

(4) It can be shown that both the angular dispersion and the anamorphic effect on
the slit/fiber image evolve with $\cos\beta$, where $\beta$ is the output angle of the beam
on the echelle grating measured from the normal to the grating. For this reason,
even along one single order, the resolving power remains constant within a few 
percent. 


All these aspects allow the cross-dispersed echelle spectrograph to cover a wide 
spectral range with high and uniform resolving power. Furthermore, by choosing 
the optimum groove density, the spectral format can be adapted to the available 
detectors to make an almost optimum use of the available pixels (cost optimization). 
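The Littrow relations behind points (1) and (2) can be sketched numerically. The example assumes the standard Littrow grating equation $m\lambda = 2d\sin\beta$ and the free spectral range $\lambda/m$; the grating parameters (an R4 echelle, i.e. $\tan\beta = 4$, with 31.6 grooves per mm) are chosen as a plausible example and are not prescribed by the text, and the function names are hypothetical.

```python
import math


def littrow_angular_dispersion(tan_beta, wavelength_nm):
    """Angular dispersion D = 2 tan(beta) / lambda, in rad per nm."""
    return 2.0 * tan_beta / wavelength_nm


def littrow_order(groove_density_per_mm, tan_beta, wavelength_nm):
    """Diffraction order from the Littrow grating equation m * lambda = 2 d sin(beta)."""
    groove_spacing_nm = 1.0e6 / groove_density_per_mm
    sin_beta = tan_beta / math.sqrt(1.0 + tan_beta**2)
    return 2.0 * groove_spacing_nm * sin_beta / wavelength_nm


def free_spectral_range_nm(wavelength_nm, order):
    """Free spectral range of one echelle order."""
    return wavelength_nm / order


# Assumed example: R4 echelle with 31.6 grooves/mm, evaluated at 550 nm.
dispersion = littrow_angular_dispersion(4.0, 550.0)
order = round(littrow_order(31.6, 4.0, 550.0))
fsr = free_spectral_range_nm(550.0, order)

print(f"D = {dispersion:.4f} rad/nm, order m = {order}, FSR = {fsr:.2f} nm")
```

Note that the groove density enters only the order number and the free spectral range, not the angular dispersion, which is the property stated in point (1).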
5.2.8. White Pupil Mount


The “white pupil”^e mount was first proposed by Baranne (e.g. Ref. 85) and later
implemented in most of the modern echelle spectrographs. By reimaging the echelle 
grating via a “white” focal plane and a second collimator, it is possible to control 
the beam size after dispersion by the echelle grating (see also Fig. 1). If a collimator 
of identical focal length is used (e.g. a parabola in triple pass like in HARPS), the 
size of the white pupil will be identical to the projected size of the echelle grating. 
Alternatively, the focal length of the second parabola can be made shorter in order 
to reduce the beam size and the white pupil. In fact, the cross-disperser will be 
located close to the white pupil and the camera lens will follow immediately after 
in order to make both as small as possible. 

The white-pupil mount has however a second advantage: since the beam diver- 
gence is under control, the risk for stray light is very low. Even using nominal sizes 
for the optical elements, all the beams will fall within their clear aperture. The 
only mandatory baffle in the entire spectrograph must be located in front of the 
collimator recollecting the light from the echelle to avoid stray light from the very 
edge (large off-axis angle) of the diffraction orders. 


5.2.9. Cross-dispersion 


As mentioned above, the cross-disperser is ideally located close to the “white pupil” 
(see previous section for the definition of the “white pupil”) in order to minimize its 
size, no matter what type of cross-disperser is used. All dispersers, prisms, grisms, 
volume-phase holographic gratings (VPHs), or reflective diffraction gratings, have 
their own advantages and disadvantages. In terms of efficiency, prisms are unsur- 
passed. However, their dispersion power is limited and their size might become sig- 
nificant when employed in very high-resolution spectrographs. VPHs have become 
more commonly employed during the past years thanks to the improved efficiency 
(up to 90% at blaze), but their usage, in analogy to any diffraction grating, is lim- 
ited to one free-spectral range (FSR, one octave in first order). Since spectrographs 
for large telescopes will anyway need more detector space, separation into several 
“colored” arms becomes necessary and the use of diffraction gratings more optimal. 

In all cases it must be verified that variations in temperature will not provoke 
a significant motion of the spectrum in the cross-dispersion direction, since most 


^e The “white pupil” describes an optical concept that aims at keeping the (dispersed and thus
diverging) collimated beam diameter of the spectrograph under control. The echelle grating is
usually placed at the location of the spectrograph’s pupil in the collimated beam. When the beam
falls on the echelle grating, it is still undispersed and therefore “white”. In the “white pupil” mount,
the echelle grating is reimaged by mirrors or lenses to form an image of the white pupil with a
magnification equal to or smaller than one. At the location of the white-pupil image, it will then be
possible to insert a cross-disperser and a camera lens that will have a reasonable size. Without 
the “white-pupil” mount, the parallel beam would rapidly diverge after the echelle grating due to 
dispersion, requiring the cross-disperser and the camera lens to have a clear diameter much larger 
than the echelle grating. 
of the data reduction pipelines use order localization techniques from spectral flats
taken several hours before the scientific exposure. Any shift of the orders won’t
be tracked and may result in extraction errors. When using transmissive solutions,
materials with low coefficients of thermal expansion and low
refractive-index variations should be preferred. Ideally, a combination should be chosen in which the
two effects compensate each other. Finally, the thermal stability requirements
must be dimensioned accordingly.

It should be mentioned that both prisms and gratings produce uneven separa- 
tion of the orders across the detector, resulting in a nonoptimal use of the detector 
space. In fact, the FSR of a single order increases toward the red part of the spectrum 
(lower orders). Since the angular dispersion of a diffraction grating used in first order 
is approximately constant in wavelength, the order separation will increase toward 
the red. Conversely, the dispersion of a prism is approximately proportional to
$1/\lambda^2$, and the order separation will be larger on the blue side of the spectrum.
Since a minimum separation of the orders is required to avoid cross-contamination, 
especially in the presence of a simultaneous reference spectrum, the use of a prism 
or a grating only will result in significant waste of detector space. A very elegant but 
technically more demanding solution is the combined use of a prism and a grating 
(e.g. grism or VPH), as proposed for the ELODIE spectrograph. If half of the
needed dispersion is produced by each element, the resulting dispersion will evolve
proportionally to $1/\lambda$, making the orders perfectly equidistant. Such a solution will
allow the spectrum to be compressed significantly in the cross-dispersion direction 
and will reduce the needed detector space (or increase the covered wavelength range) 
by a factor from 1.5 to 2. 


5.2.10. Detectors 


Much can be said about detectors but a complete description goes beyond the 
purpose of our generic discussion and would deserve a dedicated chapter or even 
a book.^f From our discussion regarding fundamental precision limits it becomes
evident that one should seek detectors with high quantum efficiency (QE), low
read-out noise and low dark current. 

One aspect not to be neglected is the detector cosmetics. Besides the fact that 
good cosmetics reduces losses and significantly eases data reduction, it is also a sign 
of good quality, in particular with respect to charge-transfer inefficiency (CTI). 
Often, high CTI is correlated with (or even generated by) defects. Even worse, 
CTI can vary with flux, increasing for instance at low flux level and producing 
line shapes that depend on flux level. In terms of radial-velocity precision, this 
might have dramatic effects, which can be corrected only to some extent by data 
reduction software. Since CTI can hardly be measured as a function of flux at the 
required precision (and the corresponding information is actually never delivered 


^f See, for example, Chapter 9 of this Volume and Chapter 1 of Volume 2 of this Handbook.
by the CCD manufacturer), it is recommended to procure CCDs with the best possible
cosmetics and charge-transfer efficiency (CTE), possibly better than 0.999999 (six 
nines). 
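The benefit of a CTE of “six nines” becomes apparent when the loss per transfer is compounded over a full detector. The sketch below assumes the simplest possible model, in which a constant fraction of the charge is retained at every transfer, and a hypothetical detector requiring 4096 transfers for the most distant pixel; it ignores the flux dependence of CTI discussed above.

```python
def retained_fraction(cte, n_transfers):
    """Fraction of a charge packet retained after n_transfers (constant-CTE model)."""
    return cte ** n_transfers


# Assumed example: 4096 transfers for the pixel farthest from the read-out register.
six_nines = retained_fraction(0.999999, 4096)
five_nines = retained_fraction(0.99999, 4096)

print(f"CTE 0.999999: {100 * (1 - six_nines):.2f}% of the charge deferred")
print(f"CTE 0.99999:  {100 * (1 - five_nines):.2f}% of the charge deferred")
```

In this simple model, one additional “nine” reduces the deferred charge, and hence the associated line-shape distortion, by roughly a factor of ten.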

It can be debated whether the CCD columns (parallel transfer direction) should 
be placed parallel to the echelle orders or perpendicular to them. Since “blooming” 
occurs along the columns, one would intuitively argue that the echelle orders should 
be perpendicular, in order to avoid “damaging” large portions of the spectrum by 
local saturation. However, this argument does not hold when observing absorption 
spectra, for which saturation should never occur. Furthermore, it is questionable 
whether it should be preferred that possible blooming contaminates one single 
(larger) chunk of a spectrum or various portions of spectra over many orders. More 
importantly, it should be considered which direction presents the better CTE. Since
no readout is involved in the parallel transfer, its CTE is often better than that of the serial direction.
Placing the orders parallel to the columns has another advantage: the read-out
of each pixel along the spectrum is completely decorrelated from the preceding and 
the following pixel, since in between their read-out a full line is read. Therefore, 
neighboring spectral channels along the echelle order are perfectly independent and 
decorrelated. 

The most difficult problems to deal with, in any kind of detector, are nonlin- 
earities and temporal changes of the amplification gain and offset. CCDs have an 
advantage over CMOS, since at least all the pixels of one read-out port are supposed 
to be exposed to the same electronic gain and bias level. Also, Photo-Response Non- 
Uniformity (PRNU) is perfectly measured and corrected for, provided that it does 
not evolve too quickly with time. It is impossible to give more precise guidelines and 
recommendations, since often there is not much choice in technology, manufacturer, 
models, and finally, in buyable products. Most of the time the situation is just as it 
is. Nevertheless, if a choice can be made, not all the parameters should be sacrificed 
for the sake of quantum efficiency. When talking about radial velocities, cosmetics, 
CTE and stability do often pay off much more than a, say, 10% improvement in 
quantum efficiency. 


5.3. Stable Environment and Stable Opto-Mechanics 


As described in previous chapters, a stable instrument is not a guarantee for a 
repeatable and accurate RV measurement. Nevertheless, a stable instrument will 
dramatically help in avoiding low-order effects that will impact the radial-velocity 
if not properly corrected. For this reason, it is important to make the instrument as 
stable as possible, while of course adopting reasonable and cost-effective solutions. 
Whenever possible, the following choices must be adopted: 


• The spectrograph must be installed in a gravity-invariant location.

• The spectrograph should be operated in vacuum (the simplest form of extremely
constant air-density and air index).

• The spectrograph and all its subsystems must be controlled in temperature.


232 F. A. Pepe 


These are three measures that are feasible and very reasonable in terms of costs 
if provisioned in the early design phases. In addition to these provisions, engineers 
should be advised to follow five simple “commandments” of a stable opto-mechanical 
design when conceiving a PRVS: 


(1) The instrument must be installed such that the grooves of the echelle grating 
are parallel to the gravity vector. This solution will allow best symmetry and 
stability in the spectral (radial-velocity) direction. 

(2) The opto-mechanical setup must be symmetric with respect to a vertical plane 
in all its elements: bench, optics, supports, fixations, adjustments, etc. The only
element that cannot be mounted fully symmetrically is the echelle grating. Also, 
the cooling system and the cold plate must be symmetric with respect to the 
same vertical plane. 

(3) The optics must be referenced by a direct metal-glass contact on the front 
surface of the optical element. The optical elements must all be manufactured 
with mechanical references defining the optical axis and their orientation. 

(4) Alignment within the optical mounts and of the optical mounts on the bench 
must be done by shimming or by machining only. Adjustment screws, spring- 
systems or moving parts must not be used, apart from pre-loading purposes 
(push optical components against their mechanical references). “Everything 
that can be aligned, can be misaligned.” 

(5) The optical mounts must have only degrees of freedom that are required by the 
alignment procedure, which is in turn defined by the tolerance analysis. 


6. Data Reduction and End-to-End Modelling 


Although the discussion about data-reduction techniques and software is out of 
the scope of this chapter, I should not fail to mention its importance and its
deep entanglement with the spectrograph and the measurement. At the required
level of precision, data reduction should actually be considered as a part of the
instrument rather than as a separate post-processing step. The implications of this
approach extend up to the project definition and systems engineering, since there 
is no possible RV error budget without considering the data reduction processes. It 
is strongly recommended to start the development of the data reduction software 
as early as the instrument design, to involve instrument and data scientists right 
from the beginning, and to “close the loop” (continuous cross-verification) with the 
top-level requirements, in the same way as for the hardware part of the instrument 
and through all phases of the project. 

Once the data reduction software is available, the “temptation” of verifying 
it on simulated data is huge. In fact, almost all the projects foresee, at their outset,
the development of an end-to-end model. Although this goal is certainly
highly commendable, and sometimes even requested by review committees, several 
cautionary considerations should be made: (1) The effort of building a valuable 
end-to-end model turns out to be significant and the related costs huge. If the objectives
are not clearly defined and managed as an independent project, this effort will
turn out to be of no use. (2) Although it may sound trivial, it should be recalled
that only effects that have been modeled can be investigated and analyzed. In order 
to model an effect, it has to be known and understood, and this alone will require a 
lot of effort. Only after this step will the end-to-end model be able to deliver infor- 
mation about the magnitude of the effect (i.e. its relative contribution to the error 
budget). It is my personal opinion that the analytical (qualitative) understanding 
of the problem ultimately has a much higher value than its numerical (quantitative) 
understanding. Nevertheless, the former does not exclude the latter, and if resources 
are available, both can be carried out. (3) Because of the two preceding points, one 
should consider testing the data-reduction software on real rather than on simulated 
data. When data of the target instrument are not (yet) available, raw data from 
similar, existing instruments can be used. (4) Finally, I would like to provide one 
piece of very personal advice to all people developing software (and instruments): 
Do not try to solve as-yet-unknown problems! Aim at finding the simplest solution
that solves a specific (and known) set of problems, rather than foreseeing a system
that can deal with everything.


7. Open Issues and Future Strategies 


Doppler measurements have not stopped evolving during the past decades. Still 
today, and on the best spectrographs in operation, the technique and the meth- 
ods are continuously improving. The goal of performing measurements of precision 
higher than ever done before cannot have a linear and programmable path. Never- 
theless, we can and must adopt clear strategies that may lead to success. 

The first step is to recognize that radial velocities are extremely valuable. The 
need for improving the radial-velocity precision has been questioned in the past by 
the argument that the measurement precision is limited by stellar effects. At the 
beginning of the 21st century, many specialists believed that the limit of 3 m s⁻¹
would not be surpassed because of stellar jitter. The HARPS instrument not only
demonstrated the opposite, but with its m s⁻¹ precision led to the discovery of a
large number of super-Earths, a previously unknown category of planets, down to
planets of Earth mass.

During the past years many other instruments have been built or are being
developed that aim at m s⁻¹ precision, but again it is being questioned whether 10
cm s⁻¹ instrumental repeatability and precision, such as that targeted by ESPRESSO, is
needed and actually useful. As in the case of HARPS, only the future will tell us
the answer. From today’s perspective, we can state for sure that we will not get the
answer if we do not make the attempt. There is, however, a more objective argument
to aim at higher precision: until recently, stellar jitter was treated as “noise” for
the simple reason that it was poorly understood and unresolved. Several attempts
are today being made to actually model and understand stellar effects, or, in other
words, to treat them as signals rather than noise. In order to do so, this signal
must be measured with the same precision at which it is modeled and corrected for.
For radial-velocity measurement at 10 cm s⁻¹, one therefore needs to measure the
radial velocity and the stellar signals at the same level, requiring in turn at least
equivalent “photonic” and instrumental precision.

Furthermore, we should not forget that precise radial-velocity spectrographs 
provide a new parameter space of knowledge when combined with other obser- 
vational techniques like photometry and transit observations, transit spectroscopy 
or high spatial resolution imaging. Radial-velocity follow-up of transit missions like
CoRoT, Kepler or TESS has already demonstrated its potential, e.g. by delivering
precise mass and density measurements of low-mass planets, or by enabling transit 
spectroscopy of transiting candidates. 

Finally, I should recall that PRVS are not only for radial-velocity and exoplanet 
search and characterization. Their exquisite precision and spectral fidelity allow 
them to contribute to different areas of astrophysics, such as stellar physics, the 
investigation of the possible variability of physical constants, the direct measurement 
of the acceleration of the Universe, and many more. As a single and of course 
nonexhaustive example of the new domains of research enabled by high-fidelity 
spectrographs, I refer to the Phase A study of ELT’s high-resolution spectrograph 
HIReS.*6 

Of course, cm s⁻¹ precision measurements have not yet been obtained, and
new instrumental issues must be addressed. Since we are continuously entering 
unexplored territory, no detailed and specific solution can be given at this stage. It 
is advisable, however, to build on existing experience and proceed by small steps, 
solving one issue after the other, the most limiting first. A nonexhaustive list of 
open issues is given hereafter: 


• Illumination and modal noise: While for visible spectrographs the illumination
stability has been improved by the noncircular fibers and the double scrambler
to a satisfactory level, problems still exist with the light injection of coherent
calibration sources. It must be ensured that the full etendue of the spectrograph
is illuminated in the most uniform and stable way. The infrared domain represents
an outstanding challenge in this respect, since the number of modes decreases
significantly.

• Calibration sources: Despite the common feeling that laser-frequency combs have
already solved all the problems, reality demonstrates that this is not yet the case.
Commercially available sources still show at least one of the following limitations:
poor reliability in continuous operation; limited wavelength range; limited lifetime;
high price. It seems that for at least several years one will have to live
with more conventional calibration sources and exploit the combination of several
sources to obtain the desired results. Nevertheless, it appears that only
laser-frequency combs (or equivalent devices locked on atomic clocks) will be able
to deliver long-term cm s⁻¹ accuracy.
• Wavelength solution and extraction algorithms: The availability of a suitable
calibration source will, however, not solve all the problems alone. In fact, the 
frequency or wavelength information has to be translated into the wavelength 
space of the measured astronomical object, a task called wavelength solution, 
and it is not as simple as it may appear at first glance. Let us recall that the IP 
is defined as the profile produced by an unresolved emission line. Most spectral 
calibration sources indeed produce unresolved emission lines. On the other hand, 
the spectral lines of astronomical targets are mostly in absorption and are at 
least partially resolved by the high-resolution spectrograph. The continuum flux 
is furthermore dominant and variable, given the nature of the blazed echellogram. 
Considering all this, we conclude that the convolution of a stellar spectrum with 
the IP may produce a spectral line shift in wavelength that is different from that 
of the narrow emission lines of the calibration source. A systematic wavelength 
offset (reduced accuracy) will then be measured. If furthermore we consider the 
fact that the IP may vary across the spectrum and may evolve with time, then we 
will conclude that not only may the local accuracy be compromised, but also the 
repeatability of the radial-velocity measurement. Such effects have been indeed 
observed on the HARPS spectrograph after the change of fiber in 2015 with the 
consequent re-alignment of the spectrograph and the resulting IP change.®° The 
radial-velocity offset observed on RV standards was not only significantly different 
from zero, but was strongly dependent on the projected stellar rotational speed 
(i.e. the average line width). 

• Detectors: In the whole optical system of a PRVS, the detector is probably the
component on which the instrument builders have the least influence and design
margins. On the one hand, this is certainly due to the technological challenges
and development costs; on the other hand, one can hardly deny a certain level of
monopoly by the existing manufacturers. Most of the negotiations will end with
a “take it or leave it” statement and the project team will have the only option of
selecting the least inconvenient device. Nevertheless, and in general terms,
astronomical CCD and CMOS detectors have reached exceptional quality with regard
to many aspects. The radial velocity business has some further requirements that 
are not always covered by the standard description and selection procedures. It 
will therefore be up to the Instrument Scientist and the Project Team to make a 
trade-off and select the most convenient device. As discussed in a previous chap- 
ter, this trade-off should consider not only QE, but all the parameters relevant 
for precision RV measurements. 
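
The wavelength-solution item in the list above can be made concrete with a toy calculation. The sketch below is not taken from any reduction pipeline: it assumes a hypothetical asymmetric IP (a Gaussian core plus a weaker, displaced wing, in pixel units) and takes the extremum of the convolved profile as the measured line position. Under these assumptions, an unresolved calibration line and a resolved stellar line are displaced by different amounts. An absorption line behaves like the emission profile used here, since only the sign of the line changes.

```python
import math

def gaussian(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def asymmetric_ip(x):
    # Hypothetical IP (pixel units): symmetric core plus a weaker, displaced wing.
    return gaussian(x, 0.0, 1.0) + 0.3 * gaussian(x, 1.5, 1.0)

def measured_position(line_sigma, step=0.01):
    """Extremum of a Gaussian line centred on 0 after convolution with the IP."""
    us = [j * step for j in range(-600, 801)]      # support of the assumed IP
    ip = [asymmetric_ip(u) for u in us]
    xs = [i * step for i in range(-100, 101)]      # search window around the line
    ys = [sum(w * gaussian(x - u, 0.0, line_sigma) for u, w in zip(us, ip))
          for x in xs]
    k = max(range(1, len(ys) - 1), key=ys.__getitem__)
    # Parabolic refinement of the sampled maximum.
    offset = 0.5 * (ys[k - 1] - ys[k + 1]) / (ys[k - 1] - 2.0 * ys[k] + ys[k + 1])
    return xs[k] + offset * step

narrow = measured_position(0.05)   # unresolved calibration line
broad = measured_position(5.0)     # resolved (broad) stellar line
print(f"narrow: {narrow:.3f} px, broad: {broad:.3f} px, "
      f"difference: {broad - narrow:.3f} px")
```

In this toy model the narrow line follows the peak of the IP, whereas the broad line follows its centre of gravity. The difference between the two is the kind of line-width-dependent offset described above.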
Spectropolarimetry, the measurement of polarization as a function of wavelength,
provides great diagnostic value in astronomy. For instance, the polarization
of spectral lines reveals magnetic fields, and the properties of particles
that scatter light are revealed by the continuum polarization. We describe
polarization observables and how the quality of the observations and the performance
of a spectropolarimeter can be quantified. We continue with a summary of components
for polarimetry and an extensive discussion of how polarization modulation
schemes can be implemented to optimize the observing capabilities. In addition,
we summarize the instrumental challenges faced by sensitive and accurate
spectropolarimetry. We conclude with an overview of approaches to measuring spectral
line polarization and continuum polarization as well as a few examples of modern
spectropolarimeters.


1. Introduction 


The polarization of light that reaches our telescopes, optical instruments and 
detectors often carries unique and unambiguous information about the physi- 
cal environment that emitted or interacted with the light. When recorded as a 
function of wavelength in the form of spectropolarimetry, polarization can pro- 
vide particularly crucial information about any astrophysical object that is not 
perfectly isotropic, e.g. due to scattering or the presence of magnetic fields. 
Science cases for spectropolarimetry therefore range from characterizing solar- 
system objects to detecting life on exoplanets, from measuring solar and stel- 
lar magnetic fields to constraining the explosion geometries of supernovae and 
gamma-ray bursts (GRBs), from observing the central engines of active galactic
nuclei (AGNs) to determining the source of Lyα radiation in high-redshift




240 C. U. Keller & F. Snik 


galaxies, and from observing the X-ray polarization of pulsars to constraining 
cosmological models through the cosmic microwave background (CMB) radia- 
tion. This chapter is focused on spectropolarimetric approaches and instrumen- 
tation in the optical range where classical optics and photon-counting detectors 
are used, i.e. from the near-UV to the infrared part of the spectrum (300 nm 
to 20 µm). For the wide range of science cases for spectropolarimetric astronomical
instrumentation and for in-depth discussions of their performance, we
refer the reader to the overviews provided in monographs¹,² and extensive review
papers.³⁻⁶


1.1. Polarimetric Observables 


The design of a spectropolarimetric instrument depends greatly on the requirements 
for the observables: 


• spectral range: UV and/or visible and/or IR;

• spectral bandwidth;

• spectral resolution: slowly spectrally varying continuum polarization or spectral
line polarization;

• linear polarization, circular polarization, or both;

• imaging capabilities: none, (scanning) long slit, or integral-field unit;

• temporal sampling.


The spectral resolution is largely determined by the physics that the observations 
should probe. Medium-resolution spectropolarimetry is often targeted towards mea- 
suring the microphysical properties (size, shape, refractive index) of dust particles 
that polarize starlight by scattering, or by differential absorption of nonspherical 
grains that are aligned by a magnetic field. High-resolution spectropolarimetry is 
mostly geared towards measuring polarization signals in spectral lines, e.g. through 
the Zeeman effect or other atomic physics mechanisms, although the broad spectral 
lines of massive stars and white dwarfs can also be accessed by medium spectral 
resolution. For such spectral-line measurements, the continuum polarization is often 
irrelevant and may not even be accurately measurable. 

Most astronomical polarization is linear, as that is readily produced by any 
scattering/reflection of starlight. Broadband circular polarization is only created by
multiple scattering or an otherwise secondary breaking of symmetry, and is therefore 
usually much weaker than linear polarization. However, circular polarization is a 
signature of life itself: complex biological molecules such as amino acids and sugars, 
but also DNA and chlorophyll occur in one handedness, whereas chemistry would 
not favor one handedness over the other. The circular polarization signals produced 
by the symmetry breaking due to this homochirality are also weak,’ and typically 
orders of magnitude smaller than any linear polarization. On the other hand, the 
spectral line polarization due to the Zeeman effect is predominantly circular, with 
an antisymmetric circular polarization pattern that is proportional to the
line-of-sight magnetic field; the linear polarization only scales to second order with the
transverse magnetic field.⁴


1.2. Polarimetric Efficiency, Sensitivity & Accuracy


In the optical regime, polarimetry is essentially differential photometry. The source 
polarization is described by the Stokes parameters $[I, Q, U, V]^T$, which are readily
formulated as the differences in photon flux after filtering the beam for specific
polarization directions (Sec. 2.1). Polarization measurements therefore always
involve two or more intensity measurements that are subtracted (Sec. 2.2). If 
the incident light is unpolarized and the instrument does not add polarization 
of its own, these intensity measurements yield the same result in the absence of 
noise, and their difference is indeed zero. Much of the effort in designing a (spec- 
tro)polarimetric instrument deals with mitigating the effects that would yield a 
nonzero result even if the incident light is unpolarized. Astronomical sources are 
typically polarized at levels of ~1% or (much) below, so it is challenging to detect 
these small signals. Before diving into the details of polarization and polarime- 
try in the following section, we define the polarization performance parameters 
that should be part of a complete set of requirements for designing and building a 
spectropolarimeter. 

The polarimetric efficiency® describes how efficient a measurement of one or 
several of the polarized Stokes parameters is, taking into account the polarization 
effects of all the optics including the detector and a normalization® to compare 
different measurement approaches and instruments. A spectropolarimeter with opti- 
mal polarimetric efficiency at a particular wavelength for a particular set of Stokes 
parameters minimizes the error propagation from photon and other random noise 
sources to the derived Stokes parameters. 

The polarimetric sensitivity describes the smallest fractional polarization signal 
that can be detected above the random noise and noise-like systematic artifacts. 
When the polarimetric performance is limited only by photon noise, the polarimetric 
sensitivity is proportional to $1/\sqrt{N_e}$, where $N_e$ is the number of photo-electrons
collected within a specific wavelength bin during the exposure time. 
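
As a rough numerical illustration of this scaling, the sketch below takes the proportionality constant to be unity. This is an assumption: the actual constant depends on the modulation scheme and its polarimetric efficiency. The function names are ours and purely illustrative.

```python
import math

def photon_limited_sensitivity(n_electrons):
    """Photon-noise-limited fractional polarization sensitivity.

    Assumes a proportionality constant of one, i.e. sigma = 1/sqrt(N_e).
    """
    return 1.0 / math.sqrt(n_electrons)

def electrons_needed(sensitivity):
    """Photo-electrons per wavelength bin needed to reach a given sensitivity."""
    return 1.0 / sensitivity ** 2

for target in (1e-3, 1e-4, 1e-5):
    print(f"sensitivity {target:.0e}: about {electrons_needed(target):.1e} photo-electrons")
```

Each factor of ten in sensitivity costs a factor of one hundred in collected photo-electrons, which is why high-sensitivity spectropolarimetry is photon-starved even on large telescopes.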

The polarimetric accuracy describes how well a polarization signal measured 
with infinite sensitivity corresponds to the true source polarization. Since the true 
source polarization can almost never be known a priori, the polarimetric accuracy 
is usually translated into a calibration accuracy. The accuracy requirements are 
generally described as a 4 x 4 matrix that relates the measured polarization to the 
incoming polarization.* Often, these 16 numbers can be summarized into a require- 
ment on the calibration of the instrumental polarization or induced polarization 
that creates uncertainty in the zero point of the polarization, and the polariza- 
tion transmission or scaling factor that describes the fraction of polarized light 
that is actually detected. In addition, optical elements can also induce polarization 
cross-talk between linear and circular polarization or in the form of polarization
rotation. There are often specific requirements on calibrating these effects,
particularly when the observable circular polarization is much smaller than the
linear (or vice versa), or the polarization angle on the sky needs to be accurately
determined. 


2. Principles of Polarization Measurements 


2.1. Stokes Parameters and Mueller Matrices 


The Stokes vector is the most useful way to describe the polarization of light in 
astronomical spectropolarimetry because it can also describe partially polarized 
and unpolarized light and is linearly related to measurable intensities. It is defined 
as 


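\[
\mathbf{I} = \begin{pmatrix} I \\ Q \\ U \\ V \end{pmatrix}
= \begin{pmatrix}
\text{intensity} \\
\text{linear } 0^\circ - \text{linear } 90^\circ \\
\text{linear } 45^\circ - \text{linear } 135^\circ \\
\text{circular left} - \text{right}
\end{pmatrix}, \tag{1}
\]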


where $I$, $Q$, $U$ and $V$ are the Stokes parameters, which are linear combinations of
intensities measured through linear and circular polarizers. A Stokes vector is not
an arbitrary four-dimensional vector, since $I^2 \geq Q^2 + U^2 + V^2$. Unpolarized light
is described by $Q = U = V = 0$. The degree of polarization is defined as
$P = \sqrt{Q^2 + U^2 + V^2}/I$. Stokes vectors are often normalized so that $I = 1$; fractional
polarization $Q/I$, $U/I$ and $V/I$ is often less sensitive to instrumental artifacts. Stokes
vectors of incoherent beams traveling in about the same direction can be added.

Mueller matrices describe the (linear) transformation between Stokes vectors
associated with optical elements and surfaces, i.e. $\mathbf{I}' = \mathsf{M}\mathbf{I}$. Mueller matrices have
the following form:


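\[
\mathsf{M} = \begin{pmatrix}
M_{11} & M_{12} & M_{13} & M_{14} \\
M_{21} & M_{22} & M_{23} & M_{24} \\
M_{31} & M_{32} & M_{33} & M_{34} \\
M_{41} & M_{42} & M_{43} & M_{44}
\end{pmatrix}. \tag{2}
\]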


The 16 elements of a Mueller matrix are not all independent of each other because 
the resulting vector needs to be a Stokes vector. A normalized Mueller matrix is 
obtained by scaling the matrix such that the upper left element is equal to one. 
Extensive lists of Mueller matrices can be found in textbooks.”:? 

When a beam of light passes through $N$ optical elements, each described by
a Mueller matrix $\mathsf{M}_i$, the combined Mueller matrix $\mathsf{M}'$ of the whole assembly is
given by $\mathsf{M}' = \mathsf{M}_N \mathsf{M}_{N-1} \cdots \mathsf{M}_2 \mathsf{M}_1$. The reversed order of the Mueller matrices is
important since Mueller matrices do not commute in general.
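
The bookkeeping of Stokes vectors and chained Mueller matrices can be sketched in a few lines. The example below uses plain nested lists and two ideal linear polarizers at 0° and 45° (their matrices follow from Eq. (4) below). The helper names are ours and not part of any library.

```python
import math

def degree_of_polarization(stokes):
    i, q, u, v = stokes
    return math.sqrt(q * q + u * u + v * v) / i

def is_physical(stokes, tol=1e-12):
    """A valid Stokes vector satisfies I^2 >= Q^2 + U^2 + V^2."""
    i, q, u, v = stokes
    return i * i + tol >= q * q + u * u + v * v

def mat_vec(m, s):
    return [sum(m[r][c] * s[c] for c in range(4)) for r in range(4)]

def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def chain(*elements):
    """Combined Mueller matrix; elements are listed in the order the light meets them."""
    total = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    for m in elements:
        total = mat_mul(m, total)   # later elements multiply from the left
    return total

# Ideal linear polarizers at 0 and 45 degrees.
POL_0 = [[0.5, 0.5, 0, 0], [0.5, 0.5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
POL_45 = [[0.5, 0, 0.5, 0], [0, 0, 0, 0], [0.5, 0, 0.5, 0], [0, 0, 0, 0]]

unpolarized = [1.0, 0.0, 0.0, 0.0]
out_a = mat_vec(chain(POL_0, POL_45), unpolarized)   # 0 deg first, then 45 deg
out_b = mat_vec(chain(POL_45, POL_0), unpolarized)   # 45 deg first, then 0 deg
print(out_a, out_b)
```

Both arrangements transmit a quarter of the unpolarized light, but the emerging beam is polarized along the axis of whichever polarizer comes last. The two products therefore differ, illustrating that the order of the matrices matters.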


Spectropolarimetry 243 


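Rotations of elements described by Mueller matrices are given by
$\mathsf{M}' = \mathsf{R}(-\alpha)\,\mathsf{M}\,\mathsf{R}(\alpha)$, where $\alpha$ is the rotation angle, and the rotation matrix $\mathsf{R}$ is given by

\[
\mathsf{R}(\alpha) = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos 2\alpha & \sin 2\alpha & 0 \\
0 & -\sin 2\alpha & \cos 2\alpha & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}. \tag{3}
\]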


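The Mueller matrix of a complete linear polarizer at an angle $\theta$ with respect to
the +Q-direction is given by

\[
\mathsf{M}_{\rm pol}(\theta) = \frac{1}{2}\begin{pmatrix}
1 & \cos 2\theta & \sin 2\theta & 0 \\
\cos 2\theta & \cos^2 2\theta & \sin 2\theta \cos 2\theta & 0 \\
\sin 2\theta & \sin 2\theta \cos 2\theta & \sin^2 2\theta & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}. \tag{4}
\]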


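The Mueller matrix of a linear retarder with its fast axis at an angle $\theta$ with
respect to the +Q-direction is given by

\[
\mathsf{M}_{\rm ret}(\theta, \delta) = \begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & \cos^2 2\theta + \cos\delta \sin^2 2\theta & \cos 2\theta \sin 2\theta - \cos 2\theta \cos\delta \sin 2\theta & \sin 2\theta \sin\delta \\
0 & \cos 2\theta \sin 2\theta - \cos 2\theta \cos\delta \sin 2\theta & \cos\delta \cos^2 2\theta + \sin^2 2\theta & -\cos 2\theta \sin\delta \\
0 & -\sin 2\theta \sin\delta & \cos 2\theta \sin\delta & \cos\delta
\end{pmatrix}, \tag{5}
\]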
where $\delta$ is the retardance, i.e. the phase shift between the two orthogonal linear
polarization components expressed in radians. Most retarders are based on birefringent
materials that have different indices of refraction for different angles of
the incoming linear polarization. The retardance, which strongly depends on
wavelength, is then given by

\[
\delta = \frac{2\pi d}{\lambda}\left(n_e(\lambda) - n_o(\lambda)\right), \tag{6}
\]

where $d$ is the geometrical thickness, $\lambda$ is the wavelength, and $n_e$ and $n_o$ are the
indices of refraction for the extraordinary and the ordinary rays, respectively.
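
The retarder matrix of Eq. (5) and the retardance of Eq. (6) can be checked numerically. The sketch below verifies that Eq. (5) equals the rotated version of a retarder with its fast axis along +Q, and applies a half-wave and a quarter-wave plate to fully +Q-polarized light. The plate parameters (a birefringence of 0.009, dispersion ignored) are hypothetical values chosen for illustration only.

```python
import math

def rotation(alpha):
    """Rotation matrix R(alpha) of Eq. (3)."""
    c, s = math.cos(2 * alpha), math.sin(2 * alpha)
    return [[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1]]

def retarder(theta, delta):
    """Linear retarder of Eq. (5): fast axis at theta, retardance delta (radians)."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    cd, sd = math.cos(delta), math.sin(delta)
    return [
        [1, 0, 0, 0],
        [0, c2 * c2 + cd * s2 * s2, c2 * s2 * (1 - cd), s2 * sd],
        [0, c2 * s2 * (1 - cd), cd * c2 * c2 + s2 * s2, -c2 * sd],
        [0, -s2 * sd, c2 * sd, cd],
    ]

def retardance(thickness, wavelength, n_e, n_o):
    """Eq. (6); thickness and wavelength in the same length unit."""
    return 2 * math.pi * thickness / wavelength * (n_e - n_o)

def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def mat_vec(m, s):
    return [sum(m[r][c] * s[c] for c in range(4)) for r in range(4)]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Eq. (5) should equal R(-theta) M(0) R(theta).
theta, delta = math.radians(30.0), 1.1
rotated = mat_mul(rotation(-theta), mat_mul(retarder(0.0, delta), rotation(theta)))
consistency = max_abs_diff(rotated, retarder(theta, delta))

# Half-wave plate at 22.5 deg turns +Q into +U; quarter-wave at 45 deg turns it circular.
half_wave_out = mat_vec(retarder(math.radians(22.5), math.pi), [1, 1, 0, 0])
quarter_wave_out = mat_vec(retarder(math.radians(45.0), math.pi / 2), [1, 1, 0, 0])

# Hypothetical plate: n_e - n_o = 0.009, thickness set for a quarter wave at 600 nm.
d_quarter = 600.0 / (4 * 0.009)                        # nm
delta_600 = retardance(d_quarter, 600.0, 1.559, 1.550)
delta_900 = retardance(d_quarter, 900.0, 1.559, 1.550)
print(consistency, half_wave_out, quarter_wave_out, delta_600, delta_900)
```

Even with dispersion ignored, the same plate that is a quarter-wave retarder at 600 nm has a retardance of only about π/3 at 900 nm, which is the chromatism referred to in the text.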


2.2. Polarization Modulation 


Since detectors in the optical range do not directly measure electric fields or cor- 
relations between dipole antennas at different orientations and locations, optical 
spectropolarimetry always relies on polarization modulation to convert the observ- 
able polarization signal into a variable intensity signal that can be detected. Hence, 
the polarization information needs to fit on the detector, in addition to the spectral 
information and the potential spatial and temporal information that is recorded. 
Therefore, the design of a spectropolarimetric instrument always requires a trade-off 
in terms of modulation domains and the associated systematic effects. The available 
modulation domains are in the temporal, spatial and spectral dimensions, or linear
combinations thereof.¹⁰

The most straightforward implementation for (spectro)polarimetry with temporal
modulation consists of a rotating wave plate with retardance $\delta$ in front of a
polarizer. The resulting intensity as a function of rotation angle $\alpha$, and therefore
time, is as follows:

\[
I'(\alpha) = \frac{1}{2}\left( I + \left[ \frac{Q}{2}\left( (1 + \cos\delta) + (1 - \cos\delta)\cos 4\alpha \right)
+ \frac{U}{2}(1 - \cos\delta)\sin 4\alpha - V \sin\delta \sin 2\alpha \right] \right). \tag{7}
\]


A half-wave retarder only modulates the linear polarization at four times the 
rotation rate of the wave plate. A quarter-wave retarder optimally modulates circu- 
lar polarization at two times the rate, but the linear polarization modulation is still 
present at half the amplitude. Generally, rotation mechanisms and the detector read- 
out are slow with respect to time-variable systematic effects like atmospheric seeing, 
changing sky transparency and instrument flexures. Such effects will invariably lead 
to spurious polarization effects. 
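
The statements about the half-wave and quarter-wave cases can be verified by evaluating Eq. (7) over one full rotation and extracting the amplitudes at two and four times the rotation rate. The sketch below implements Eq. (7) as printed; the sign of the V term depends on the sign conventions adopted for V and δ. The `beam` argument (+1 or -1) is our own addition to represent the two outputs of a polarizing beam-splitter.

```python
import math

def modulated_intensity(alpha, stokes, delta, beam=1):
    """Eq. (7) for a wave plate at angle alpha (radians) with retardance delta."""
    i, q, u, v = stokes
    cd, sd = math.cos(delta), math.sin(delta)
    polarized = (0.5 * q * ((1 + cd) + (1 - cd) * math.cos(4 * alpha))
                 + 0.5 * u * (1 - cd) * math.sin(4 * alpha)
                 - v * sd * math.sin(2 * alpha))
    return 0.5 * (i + beam * polarized)

def harmonic_amplitude(stokes, delta, harmonic, n_steps=64):
    """Amplitude of the modulation at `harmonic` times the rotation rate."""
    a = b = 0.0
    for k in range(n_steps):
        alpha = 2 * math.pi * k / n_steps
        signal = modulated_intensity(alpha, stokes, delta)
        a += signal * math.cos(harmonic * alpha)
        b += signal * math.sin(harmonic * alpha)
    return 2 * math.hypot(a, b) / n_steps

stokes = [1.0, 0.1, 0.0, 0.1]   # 10% linear (Q) and 10% circular polarization
half_wave = {h: harmonic_amplitude(stokes, math.pi, h) for h in (2, 4)}
quarter_wave = {h: harmonic_amplitude(stokes, math.pi / 2, h) for h in (2, 4)}
print("half-wave:", half_wave)
print("quarter-wave:", quarter_wave)
```

For the half-wave plate only the fourth harmonic (linear polarization) is present. For the quarter-wave plate the circular polarization appears at the second harmonic and the linear polarization remains at the fourth harmonic with half the amplitude.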

Pure temporal modulation therefore only enables high-sensitivity polarimetric 
performance if the modulation and the corresponding intensity recording are suffi- 
ciently fast. Such rapid temporal modulation is offered by liquid crystal elements. 
The simplest and fastest implementation is with a Ferro-electric Liquid Crystal 
(FLC), which has a fixed retardance (typically half-wave at a center wavelength) and 
a fast axis that switches orientation to 0/45° upon application of a positive/negative 
voltage. The fast modulator is positioned in front of a linear polarizer that acts 
as the polarization analyzer, as it transmits only one polarization state (indeed, 
polarimetry always involves actually polarizing the light inside the instrument). 
Figure 1(a) illustrates how an FLC modulates Stokes ±Q on successive spectral
intensity recordings. FLCs can modulate with rates of up to ~1 kHz. Therefore,
fast detectors or dedicated demodulating detectors like ZIMPOL¹¹ are required to
keep up.

More versatile temporal modulation schemes are offered by Liquid Crystal Variable
Retarders (LCVRs), which have a fixed fast axis and a voltage-controlled retardance.
Figure 1(b) gives an example of a six-step full-Stokes modulation scheme
(cf. Eq. (1)). As there are only four Stokes parameters, one can also program a 
four-step temporal modulation scheme with two LCVRs that can encode the Stokes 
vector with the same optimal polarimetric efficiency as the six-step Stokes defini- 
tion scheme. Note, however, that FLCs and LCVRs are, by definition, chromatic, 
and that optimal modulation efficiency is only achieved at a single wavelength. 
By adding additional (fixed or variable) retarders, one can achieve close-to-optimal
temporal modulation efficiency over a wide range of wavelengths, known as
polychromatic modulation.¹²
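
The claim that a four-step scheme can reach the same efficiency as the six-step scheme can be checked with a short calculation. The text does not spell out the efficiency formula, so the sketch below assumes the commonly used definition $\epsilon_i = (n \sum_j D_{ij}^2)^{-1/2}$, where $\mathsf{D}$ is the pseudo-inverse of the $n \times 4$ modulation matrix. The modulation states are idealized: ±Q, ±U, ±V for six steps, and a tetrahedral arrangement for four steps.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [float(r == c) for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def efficiencies(modulation):
    """Efficiencies for I, Q, U, V under the assumed definition (see lead-in)."""
    n = len(modulation)
    normal = [[sum(row[i] * row[j] for row in modulation) for j in range(4)]
              for i in range(4)]
    inverse = invert(normal)
    # Demodulation matrix D = (O^T O)^-1 O^T, one column per modulation state.
    demod = [[sum(inverse[i][k] * row[k] for k in range(4)) for row in modulation]
             for i in range(4)]
    return [1.0 / math.sqrt(n * sum(d * d for d in demod_row)) for demod_row in demod]

six_step = [[1, 1, 0, 0], [1, -1, 0, 0], [1, 0, 1, 0],
            [1, 0, -1, 0], [1, 0, 0, 1], [1, 0, 0, -1]]
a = 1.0 / math.sqrt(3.0)
four_step = [[1, a, a, a], [1, a, -a, -a], [1, -a, a, -a], [1, -a, -a, a]]

eff_six = efficiencies(six_step)
eff_four = efficiencies(four_step)
print(eff_six, eff_four)
```

Under these assumptions both schemes give an efficiency of one for I and $1/\sqrt{3}$ for each of Q, U and V.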
Fig. 1. Polarization modulation approaches. Panels recoverable from the figure: (a) ferroelectric liquid crystal (Stokes Q, single-beam); (b) liquid crystal variable retarders (full-Stokes, single-beam); (c) beam-exchange (circular or linear polarization); (d) polychromatic modulation (full-Stokes, single-beam).

Using a regular polarizer as the analyzer implies that half of the light is lost
(through absorption or reflection by that polarizer), which is a cardinal sin in 
astronomy. This polarizer can be replaced with a polarizing beam-splitter, e.g. a 
Wollaston prism in a pupil plane, a Savart plate in front of a focal plane, or a cube 
beam-splitter that produces two independent beams. To simultaneously record both 
beams, at least twice the number of detector pixels (or even two separate detectors) 
are required. This rudimentary version of spatial modulation ensures that recordings
of two perpendicular polarization states (e.g. ±Q) occur simultaneously, and that
temporal systematic errors are thus eliminated. However, the systematic differences 
between the two beams (e.g. differential aberrations, imperfect pixel gain correction) 
now lead to spurious polarization effects. Moreover, the observable Stokes param- 
eters need to be converted into the linear polarization state that is split/analyzed 
by the beam-splitter. 

A combination of spatial and temporal modulation can mitigate their individual
disadvantages. The combination of a rotating wave plate with a polarizing
beam-splitter gives rise to the often used dual-beam or beam-exchange
implementation.¹,¹³,¹⁴ For such a system, the beams are essentially swapped, and the four
intensity recordings in both beams for two orientations of the wave plate can be
demodulated to one of the polarized Stokes parameters Q, U or V, and the intensity
I (see Fig. 1(c)). As the four intensity recordings only need to constrain two Stokes
parameters, the additional degrees of freedom provide sufficient redundancy to
distinguish real polarization signals from spurious signals caused by both the spatial
and the temporal systematic effects. Through the double-difference or the
double-ratio techniques, the measured Q/I (or U/I or V/I) is free from systematic errors
for small degrees of polarization.¹,¹³,¹⁴ Therefore, the temporal modulation, when
combined with a dual-beam approach, can be slow. All Stokes parameters can be
measured according to Eq. (7) by rotating the retarder in increments of 22.5° for
linear polarization and ±45° for circular polarization. Note that for the second beam,
the first plus sign in Eq. (7) turns into a minus sign. By rotating the retarder over
a full rotation in equal steps, one can also directly assess the presence of systematic
effects through the so-called null spectra¹³,¹⁴ or through Fourier analysis.¹⁵

To ensure high polarimetric efficiency over a large wavelength range, the
rotating retarder is often achromatic. Achromatic retardance can be achieved by
combining two different crystals to cancel the individual birefringence dispersions,
by stacking identical wave plates at different angles (Pancharatnam
configuration), or by using Fresnel rhombs. Another solution is found by not optimizing
for achromaticity, but by maximizing the full-Stokes modulation efficiency over a
large spectral bandwidth. Such polychromatic modulators are highly chromatic (see
Fig. 1(d)), and therefore only work for spectropolarimetry. They can span
wavelength ranges as large as 300-2500 nm,¹⁶ and also fully benefit from a dual-beam
implementation.⁴
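
Returning to the beam-exchange technique described above, the double-ratio and double-difference combinations can be illustrated with a minimal simulation. The sketch below assumes an idealized dual-beam polarimeter measuring Q/I, with a different (hypothetical) gain in each beam and a sky transparency that changes between the two exposures. The function names and numerical values are illustrative only.

```python
import math

def dual_beam_exposure(q, gain_left, gain_right, transparency):
    """Intensities in the two beams for fractional polarization q (I = 1)."""
    left = gain_left * transparency * 0.5 * (1 + q)
    right = gain_right * transparency * 0.5 * (1 - q)
    return left, right

def double_ratio(l1, r1, l2, r2):
    ratio = math.sqrt((l1 / r1) / (l2 / r2))
    return (ratio - 1) / (ratio + 1)

def double_difference(l1, r1, l2, r2):
    return 0.5 * ((l1 - r1) / (l1 + r1) - (l2 - r2) / (l2 + r2))

q_true = 0.01
# Exposure 1: nominal wave-plate position.
l1, r1 = dual_beam_exposure(+q_true, 1.0, 0.8, 1.0)
# Exposure 2: wave plate rotated so that the beams are exchanged (sign of q flips),
# while the sky transparency has dropped by 30%.
l2, r2 = dual_beam_exposure(-q_true, 1.0, 0.8, 0.7)

single = (l1 - r1) / (l1 + r1)              # one exposure only: dominated by gain
q_ratio = double_ratio(l1, r1, l2, r2)
q_diff = double_difference(l1, r1, l2, r2)
print(f"single: {single:.4f}  double ratio: {q_ratio:.6f}  double difference: {q_diff:.6f}")
```

In this idealized model the double ratio recovers the input polarization exactly, because the beam gains and the exposure transparencies cancel. The double difference is correct up to a term that is second order in the gain mismatch, while the single-exposure estimate is dominated by the gain difference between the beams.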

A completely different class of spectropolarimeters is based on spectral mod- 
ulation, where the polarization information is encoded in the spectral domain 
together with the actual spectral intensity and polarization features. The first
implementations were full-Stokes,¹⁷,¹⁸ and were based on two static, thick, birefringent
crystals that produce three different spectral carriers for $Q(\lambda)$, $U(\lambda)$ and $V(\lambda)$.
To accommodate these carriers, the spectral resolution of the instrument typically
needs to be increased by a factor of ten, and assumptions need to be made on the
spectral content of the incident light to prevent aliasing with the polarimetric
carriers. Stokes U and V share the same carrier frequencies, and merely have a different
phase. However, circular polarization is typically orders of magnitude weaker than
linear, or it may even be neglected completely, rendering this modulation method
quite inefficient. Figure 1(e) outlines an optimal implementation with an achromatic
quarter-wave retarder and a single thick retarder that modulates only linear
polarization.¹⁹ The spectral modulation is quasi-sinusoidal, for which the relative
amplitude scales with the degree of linear polarization, and the phase is proportional
to the angle of linear polarization. Moreover, this implementation is combined with
a dual-beam approach, such that the sum of the two simultaneous spectra is the
original intensity spectrum at the full spectral resolution, and the normalized
difference is the polarization modulation envelope, which can be demodulated through
side-band Fourier analysis or a joint fit of the polarization signal and instrument
model parameters. As this method provides all information in a single snapshot, it
is very robust against systematics, and therefore very accurate.
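
The dual-beam spectral modulation described above can be sketched with a simple forward model and a linear fit. The model used here is an assumption, since the text gives no formula: the two beams are taken as $I_\pm(\lambda) = \frac{1}{2} I(\lambda)\,[1 \pm P \cos(2\pi\Delta/\lambda + 2\phi)]$, with $P$ the degree and $\phi$ the angle of linear polarization, and $\Delta$ the (hypothetical, wavelength-independent) optical path difference of the thick retarder. $P$ and $\phi$ are taken as constant within the fitting window.

```python
import math

def dual_beam_spectra(wavelengths, intensities, p_lin, angle, opd):
    """Forward model (assumed form): two complementary modulated spectra."""
    plus, minus = [], []
    for lam, i0 in zip(wavelengths, intensities):
        mod = p_lin * math.cos(2 * math.pi * opd / lam + 2 * angle)
        plus.append(0.5 * i0 * (1 + mod))
        minus.append(0.5 * i0 * (1 - mod))
    return plus, minus

def demodulate(wavelengths, plus, minus, opd):
    """Least-squares fit of a*cos(x) + b*sin(x) to the normalized difference."""
    scc = scs = sss = smc = sms = 0.0
    for lam, ip, im in zip(wavelengths, plus, minus):
        x = 2 * math.pi * opd / lam
        m = (ip - im) / (ip + im)       # equals P*cos(x + 2*phi) in this model
        c, s = math.cos(x), math.sin(x)
        scc += c * c; scs += c * s; sss += s * s
        smc += m * c; sms += m * s
    det = scc * sss - scs * scs
    a = (smc * sss - sms * scs) / det   # a = P*cos(2*phi)
    b = (sms * scc - smc * scs) / det   # b = -P*sin(2*phi)
    return math.hypot(a, b), 0.5 * math.atan2(-b, a)

wavelengths = [600.0 + 0.05 * k for k in range(401)]          # nm
intensities = [1000.0 + 2.0 * (lam - 600.0) for lam in wavelengths]
opd = 20000.0                                                  # nm, hypothetical
plus, minus = dual_beam_spectra(wavelengths, intensities, 0.05, 0.3, opd)
p_fit, angle_fit = demodulate(wavelengths, plus, minus, opd)
total = [ip + im for ip, im in zip(plus, minus)]               # unmodulated spectrum
print(p_fit, angle_fit)
```

The sum of the two beams returns the unmodulated intensity spectrum, while the fit to their normalized difference returns the degree and angle of linear polarization for the chosen window. A real instrument would also have to fit the retardance and its dispersion, which are treated as known here.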

A variation of spectral modulation is found in true spatial modulation, which 
induces polarization modulation carriers along the slit, i.e. perpendicular to 
the spectral direction. This accommodates polarimetry at the full resolution of 
the spectrograph, while the additional pixels along the spatial direction furnish the 
collection of many photons in a single shot. A first full-Stokes or linear polarimeter
with true spatial modulation was implemented by using wedged crystals²⁰ (see
Fig. 1(f)). However, this implementation also has Stokes U and V at the same
carrier frequencies, which severely limits the detection of very weak circular
polarization (e.g. due to the homochirality of biological molecules) in the presence of
strong linear polarization (due to regular reflection/scattering of starlight). A better
approach²¹,²² implements a patterned, solid-state liquid-crystal retarder where
the fast axis rotates along the slit direction (see Fig. 1(g)). The resulting carriers 
are given by Eq. (7), which clearly separates the frequencies for linear and circular 
polarization. The challenge for such implementations is how to guarantee homo- 
geneous, or at least slowly spatially-varying, slit illumination. Also, in this case, a 
dual-beam implementation minimizes such systematics. 

In the cartoon of Fig. 1, the modulator and analyzer are positioned together, but 
in practice they can be located in different parts of the instrument. It is preferable 
to install the modulator as far upstream in the optical train as possible to minimize 
the amount of instrumental polarization effects of the optics in front of it (see 
Sec. 3.1). The optics between the modulator and the analyzer then need to ensure 
that the polarization states that the analyzer filters are transmitted by the system 
with high efficiency.²³ These polarization states are preferably eigenvectors, e.g. the
linear polarization states that correspond to the S/P directions of reflections and
refractions, which do not obtain differential phase retardance due to the Fresnel 
equations, and are therefore not turned into circular polarization that the analyzer 
is blind to. 


3. Instrumental Challenges 


Astronomical spectropolarimetry is often limited by systematic instrumental errors 
rather than by statistical errors such as photon and read-out noise. As such, a 
clear understanding of the instrumental challenges is crucial. The most important 
instrumental errors commonly encountered in high-precision spectropolarimetry are 
as follows: 


• atmospheric seeing and guiding errors, which can be minimized by design;

• instrumental polarization, which can be reduced by design and removed with
calibration;

• polarized fringes, which can be reduced by design and during data reduction.


3.1. Instrumental Polarization 


In many cases the influence of telescopes and instruments on the state of polarization 
of the source can be described by Mueller matrices, which have the general form 


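\[
\begin{pmatrix}
I \rightarrow I & Q \rightarrow I & U \rightarrow I & V \rightarrow I \\
I \rightarrow Q & Q \rightarrow Q & U \rightarrow Q & V \rightarrow Q \\
I \rightarrow U & Q \rightarrow U & U \rightarrow U & V \rightarrow U \\
I \rightarrow V & Q \rightarrow V & U \rightarrow V & V \rightarrow V
\end{pmatrix}
\]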


The different terms are usually grouped into three categories with X = Q,U,V: 


• $I \rightarrow X$: instrumentally induced polarization,
• $X_i \rightarrow X_{j \neq i}$ and $X \rightarrow I$: instrumentally introduced cross-talk, and
• $X \rightarrow X$: instrumentally introduced depolarization.
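
As a minimal sketch (not part of the original chapter), the grouping of Mueller-matrix terms can be written out in a few lines of Python. The matrix `M_EXAMPLE` and its values are hypothetical and chosen only to make each category visible; the function names are likewise illustrative.

```python
# Apply a 4x4 Mueller matrix to a Stokes vector (I, Q, U, V) and group its
# elements into the three categories listed in the text (X = Q, U, V).

def apply_mueller(m, s):
    """Return the output Stokes vector M * S for a 4x4 matrix and a 4-vector."""
    return [sum(m[row][col] * s[col] for col in range(4)) for row in range(4)]

def classify_terms(m):
    """Group the matrix elements by their physical meaning."""
    names = "IQUV"
    return {
        # first column below the top element: I -> X
        "induced_polarization": {f"I->{names[r]}": m[r][0] for r in (1, 2, 3)},
        # first row (X -> I) and off-diagonal X_i -> X_j terms
        "cross_talk": {
            f"{names[c]}->{names[r]}": m[r][c]
            for r in range(4) for c in (1, 2, 3) if r != c
        },
        # diagonal X -> X
        "depolarization": {f"{names[k]}->{names[k]}": m[k][k] for k in (1, 2, 3)},
    }

# Hypothetical instrument: 1% I->Q, 5% Q->V cross-talk, 2% loss in U.
M_EXAMPLE = [
    [1.00, 0.00, 0.00, 0.00],
    [0.01, 1.00, 0.00, 0.00],
    [0.00, 0.00, 0.98, 0.00],
    [0.00, 0.05, 0.00, 1.00],
]

unpolarized = [1.0, 0.0, 0.0, 0.0]
measured = apply_mueller(M_EXAMPLE, unpolarized)
terms = classify_terms(M_EXAMPLE)
```

With these assumed values, an unpolarized source acquires a spurious Stokes Q signal equal to the I → Q element, which is why this term is called instrumentally induced polarization.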


Telescope polarization can be minimized by placing an instrument in the 
Cassegrain focus, where the rotational symmetry ensures a vanishing net polar- 
ization on the optical axis. However, off-axis positions will still exhibit polarization, 
and mirror coating variations and seeing destroy the rotational symmetry even in 
the center of the field of view. Hence, no telescope is completely free of instrumental 
polarization. 

Oblique reflections off and transmission through optical surfaces such as mirrors 
introduce polarization and cross-talk between the Stokes parameters. Transmission 
through dielectric materials such as glass introduces retardance, but no polarization. 
Oblique reflections are often due to Nasmyth and Coudé mirrors or the complicated 
mirror arrangements in solar telescopes. The accurate modeling of these reflections 
requires thin-film codes, since even aluminum coatings consist of bulk aluminum
covered by a thin layer of aluminum oxide.?+ The same applies to oblique transmis- 
sion through glass surfaces with multi-layer coatings. The difficulty of modeling the 
instrumental effects of telescopes implies that a direct measurement of the instru- 
mental polarization and cross-talk is required in most cases. 

Spectrographs often contain obliquely reflecting elements, which will introduce 
instrumental polarization and cross-talk. In addition, the entrance slit and the grat- 
ing of a spectrograph will also act as partial polarizers. 

When the width of the spectrograph slit is less than about 20 times the wave- 
length, the slit acts as a partial polarizer. Various models have been developed to 
calculate the slit polarization, but none of them agree well enough with experiments 
to make them useful.?° Experience has shown that making the slit as thin as possi- 
ble reduces its polarization.?° Making a slit from dielectric materials does not help 
because absorbing dielectrics also have a complex index of refraction. 

Gratings can have a large influence on polarization. In general, low-order grat- 
ings exhibit large variations of the instrumental polarization with wavelength. At 
some wavelengths, they often act as complete polarizers, which is known as Wood's
anomaly.?” Maximum transmission is normally obtained with the polarization par- 
allel to the grating lines. High-order échelle gratings typically have only a minor 
influence on the polarization. Grating manufacturers can provide measurements of 
the grating efficiency for horizontal and vertical linear polarization. 


3.2. Polarization Calibration 


Polarization calibration is typically divided into two parts: telescope calibration and 
spectropolarimeter calibration. The latter can be implemented relatively easily by 
inserting calibration optics just before the instrument. A linear polarizer followed 
by a quarter-wave plate, both of which can be rotated, can create linear polarization 
under any angle as well as circular polarization.?® Four independent rotation com- 
binations can provide sufficient information to determine the instrumental Mueller 
matrix. 
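
The following Python sketch illustrates how a rotatable polarizer followed by a rotatable quarter-wave plate can generate four linearly independent Stokes vectors. It is an illustration under stated assumptions, not the procedure of any particular instrument: the components are ideal, the sign convention for Stokes V is one of several in use, and the four angle pairs in `SETTINGS` are just one possible choice.

```python
import math

def polarizer(theta):
    """Mueller matrix of an ideal linear polarizer, axis at angle theta (radians)."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return [[0.5, 0.5 * c, 0.5 * s, 0.0],
            [0.5 * c, 0.5 * c * c, 0.5 * c * s, 0.0],
            [0.5 * s, 0.5 * c * s, 0.5 * s * s, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

def quarter_wave_plate(phi):
    """Mueller matrix of an ideal quarter-wave plate, fast axis at phi (radians).
    The sign of the V terms is a convention and is assumed here."""
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c * c, c * s, -s],
            [0.0, c * s, s * s, c],
            [0.0, s, -c, 0.0]]

def apply_mueller(m, stokes):
    """Multiply a 4x4 Mueller matrix with a Stokes 4-vector."""
    return [sum(m[r][k] * stokes[k] for k in range(4)) for r in range(4)]

def calibration_state(theta_pol_deg, phi_qwp_deg):
    """Normalized Stokes vector produced from unpolarized light."""
    s = apply_mueller(polarizer(math.radians(theta_pol_deg)), [1.0, 0.0, 0.0, 0.0])
    s = apply_mueller(quarter_wave_plate(math.radians(phi_qwp_deg)), s)
    return [x / s[0] for x in s]

# (polarizer angle, quarter-wave plate angle) in degrees: +Q, +U, +V, -V
SETTINGS = [(0, 0), (45, 45), (0, 45), (0, -45)]
states = [calibration_state(a, b) for a, b in SETTINGS]
```

Feeding these four known states into the instrument and recording the four measured Stokes vectors gives sixteen numbers, enough to solve for the sixteen elements of the Mueller matrix.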

Calibrating the telescope is more difficult as one cannot put calibration optics in 
front of the telescope. Hence, one has to rely on sources of known polarization, either 
in degree of polarization, angle of linear polarization, or both. Standard stars?® with 
known polarization and stars that are known to be unpolarized can be used to fit 
the free parameters of a model. 


3.3. Higher-Order Instrumental Effects 


Multiple reflections between the surfaces of birefringent materials such as fixed 
and liquid-crystal retarders lead to polarized spectral fringes. This occurs because 
the optical path length in the birefringent element depends on the polarization.” 
Polarized fringes have been observed in all types of retarders. To calculate these 
fringes, the Berreman calculus can be used.®° Tilting, wedging, and coating the 
retarders reduces the fringe amplitude considerably. However, at the $1 \times 10^{-5}$ level,
polarized fringes are almost always present. Often the fringes can be removed during 
the data reduction by an appropriate filtering in the Fourier domain or with an 
auto-regressive model.?! 

High polarimetric sensitivity at or below the $1 \times 10^{-4}$ level can also be hampered
by dark-current or bias subtraction errors and detector nonlinearities coupled with 
instrumental polarization.° 


4. Spectropolarimeters 


In this section, we provide a broad overview of designs of modern astronomical spec- 
tropolarimeters. For a recent and complete overview of instruments, their parame- 
ters and performances, see Ref. 3. 


4.1. Absolute Continuum Spectropolarimetry 


Several “workhorse” instruments have a successful polarimetric mode, most promi- 
nently FORS* (1 and 2) at the VLT. Polarimetry can be implemented both for the 
imaging as well as the spectroscopic mode, by inserting a Wollaston prism in the 
pupil plane of the reimager and a (mosaicked) quartz-MgF2 Pancharatnam “super- 
achromatic” rotating quarter/half-wave retarder in front of it. Although FORS is 
mounted at the Cassegrain focus, there are significant instrumental polarization 
effects. In imaging mode, there is a clear variation of induced polarization across 
the field,!° which should be zero in the center of the field for slit spectropolarimetry. 
Nevertheless, the polarization zero point of FORS spectropolarimetry is known to 
vary by ~0.1%.°8 Also, significant amounts of linear–circular cross-talk have been
observed in FORS. Moreover, instrument flexures during tracking of the target are 
known to create spurious line polarization signals.°+

While FORS spectropolarimetry with a spectral resolving power of 260-2600 
is mostly optimized for continuum spectropolarimetry of e.g. solar-system objects 
and interstellar dust, it can be used for spectral-line polarimetry of massive stars 
and emission-line nebulae. 


4.2. Relative Spectral-Line Polarimetry and Multi-Line Techniques 


The most targeted spectropolarimetric instruments in astronomy are high-resolution 
spectropolarimeters that are optimized to measure stellar magnetic fields, such 
as the CFHT instruments ESPaDOnS*° and SPIRou,”® the polarimetric mode of 
HARPS,?” and LBT/PEPSI.** These instruments consist of a fiber-fed, stabilized, 
high-resolution spectrograph ($R \approx 100{,}000$), with a polarimetric module at the
(mostly instrumental-polarization-free) Cassegrain focus before the fiber injection. 
All these instruments consist of a beam-exchange system, with a polarizing beam- 
splitter feeding two separate fibers, producing two interlaced échelle spectra. The 
modulators are rotating quarter-wave and half-wave Fresnel rhombs or polymer 
Pancharatnam plates, which both induce minimal spectral fringing. While the
polarimetric approach allows for high-sensitivity observations, the photon noise level 
is usually too high to detect the polarization signatures due to the Zeeman effect in 
individual spectral lines. Multiline techniques like Least-Squares Deconvolution?” *° 
are therefore used to cleverly add up hundreds of spectral lines to detect and map 
magnetic fields on stars. Because of the variable transmission of the fibers during 
tracking of a target, the absolute continuum polarization is often relinquished, as 
such systematics are not easily recovered by the beam exchange. 


4.3. Solar Spectropolarimeters 


Spectropolarimetry has a long history in solar physics as it uniquely provides mea- 
surements of the magnetic field vector in the solar atmosphere through the Zeeman 
effect. Iglesias and Feller® provide an excellent overview of modern solar spectropo- 
larimeters. 

The SOLIS Vector-Spectromagnetograph (VSM)?° provides daily vector- 
magnetic field measurements of the sun. It is based on a temporal modulator using 
FLCs and fixed retarders just behind the entrance slit, spectrograph optics where 
crossed mirrors compensate the individual instrumental polarization and cross-talks, 
and modified Savart-plate polarizing beam-splitters just in front of the high-speed 
cameras. While the VSM only covers a small spectral range at any given time, its slit 
has a length of 0.5 degrees on the sky and can achieve a polarization sensitivity of a 
few times $10^{-4}$ in less than one second. The high speed enables the instrument
to scan the whole sun in about 20 min. Its polarization calibration is so accurate
that all spatial pixels can be added to obtain a polarized spectrum of the sun with 
unprecedented sensitivity and accuracy.*! 

A similar polarimetry approach is used in the Spectro-Polarimeter on the 
Japanese Hinode solar satellite.42 The temporal modulator is a continuously rotating
bi-crystalline retarder that is located in the rotationally symmetric part of the
telescope. Due to the rotational symmetry, there is no telescope polarization. The 
polarizing beamsplitter is located just before the CCD camera. This instrument is 
likely to be the best spectro-polarimeter in space. 

While most solar spectropolarimeters are based on scanning long-slit spectro- 
graphs, progress in producing polarization-maintaining fibers offers the opportunity 
to place a polarizing beam-splitter after a fiber bundle,** thus simplifying integral 
field units for solar polarimetry. While the fibers do not maintain all polarization 
states, they do preserve two orthogonal linear states, which is sufficient as long as 
these two states match the corresponding states of the polarizing beam-splitter. 


4.4. Innovative Spectropolarimeters 


In recent years, new technology has enabled radically new instrument designs for 
spectropolarimetry. In particular, the polarization grating* is offering new and 
extremely high-efficiency solutions, as it both acts as an almost perfect grating as 
well as a polarizing beam-splitter. The polarization grating is essentially a pat-
terned half-wave liquid-crystal retarder that can be deposited on any transmissive 
or reflective optic. Such an optic is part of a class of geometric phase holograms*® 
that operate in an achromatic fashion on circular polarization states. When the 
retarder is perfectly half-wave, the polarization grating diffracts all light with one 
circular polarization state into order +1, and the other handedness goes to order —1. 
The zero order and all higher orders are absent. In practice, there is a small zero- 
order leakage term, but multiple self-aligning layers of liquid crystals can conspire
to yield close-to-half wave performance over more than an octave in wavelength. 
Liquid crystal recipes exist from near-UV wavelengths all the way up to 40 µm.*°
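
The behavior described above can be illustrated with the standard idealized (thin-grating, paraxial) expressions for a polarization grating, which are not given in the chapter and are quoted here as an assumption: the zero-order leakage is cos²(Γ/2) and the two first orders share sin²(Γ/2) in proportion to the circular polarization of the input, where Γ is the retardance. Which handedness goes to order +1 is a sign convention, also assumed below.

```python
import math

def pg_efficiencies(retardance_rad, v_over_i):
    """Idealized diffraction efficiencies of a polarization grating.

    retardance_rad : retardance of the liquid-crystal layer (pi = half-wave)
    v_over_i       : fractional circular polarization V/I of the input (-1..1)
    Returns a dict mapping diffraction order (0, +1, -1) to efficiency.
    """
    leak = math.cos(retardance_rad / 2) ** 2        # zero-order leakage
    diffracted = math.sin(retardance_rad / 2) ** 2  # light sent to orders +/-1
    return {
        0: leak,
        +1: 0.5 * (1 + v_over_i) * diffracted,  # assumed handedness convention
        -1: 0.5 * (1 - v_over_i) * diffracted,
    }

# Perfect half-wave retarder, fully circularly polarized input.
ideal = pg_efficiencies(math.pi, 1.0)

# Retardance 10% away from half-wave, unpolarized input.
detuned = pg_efficiencies(0.9 * math.pi, 0.0)
```

In this model an unpolarized beam is split equally between the two first orders, and a 10% retardance error leaks only a few percent of the light into the zero order, consistent with the text's statement that the leakage term is small.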
The recent polarimetric upgrade of the WIRC instrument*’ at the 200-inch 
telescope at Palomar makes clever use of polarization gratings. It has a split-pupil 
design in which half the light is diffracted in the +45° direction, and the other 
half in the —45° direction, producing four spectral recordings for each point source. 
By adding quarter-wave plates in the commensurate directions, the pairs of spectra
are split for Stokes ±Q and ±U, comprising a highly efficient snapshot linear
spectropolarimeter.

The liquid-crystal recipe for polarization gratings can also be formulated such 
that the grating is only diffracting light within a certain wavelength range. The 
recent CRIRES+ upgrade at the VLT includes a polarization grating that splits and 
disperses the near-infrared light but transmits the visible light without dispersion 
and splitting, which is subsequently used for wavefront sensing.*® 

Figure 2 shows a result from a spectropolarimetric integral-field unit** that also 
makes use of a polarization grating. In this instrument configuration, the field is 


Fig. 2. Integral-field spectropolarimetric observation of Venus. Adapted from Ref. 44. 
first cut up by a microlens array that produces a grid of small micropupils. In the
consecutive pupil plane, a quarter-wave plate and a polarization grating effectuate 
the linear polarization splitting and spectral dispersion. A polychromatic modulator 
in front of the microlens array ensures high-efficiency four-state full-linear temporal 
polarization modulation. A combination with a spectral modulator would accom- 
plish five-dimensional [x, y, λ, (I, Q, U, V), t]-sensing in a single snapshot exposure.
One of the other main advantages of such an integral-field implementation is that 
the field of view is first sampled by the microlens array before the polarization 
beam-splitting, and that there are therefore no differential aberrations between the 
two beams that could otherwise cause spurious polarization signals. 


5. Conclusions 


The polarization of light as a function of wavelength from astronomical objects con- 
tains great diagnostic power. Much progress has been made over the last decades 
in building instruments to measure astronomical polarization, in particular due to 
technical developments including fast imaging detectors, electro-optical devices as 
well as novel optical components based on micro-patterned surfaces. While classical 
polarimetry approaches were focused on achromatic components to measure polar- 
ized light over a large wavelength range, modern approaches optimize the detec- 
tion efficiency over a broad wavelength range without requiring highly achromatic 
components. Similar advances have been made in transforming polarization infor- 
mation into intensity variations; classical approaches focused on polarized beam- 
splitting and rotating retarders while the latest approaches combine spatial and 
temporal modulation and add spectral modulation. Future progress will come from 
polarization-sensitive detectors that are now being offered by several detector man- 
ufacturers, engineered materials based on subwavelength structures in dielectric and 
metallic materials and from the clever combination of existing technologies to build 
polarimeters for highly specific applications. 
Integral field spectroscopy involves obtaining spectral information across a con-
tiguous two-dimensional field of view. It is therefore a very powerful tool for the
study of intrinsically extended objects (e.g. galaxies, nebulae, etc.), systems where
object locations are not precisely known, or where three-dimensional information
enables improved sensitivity (e.g. direct imaging of exoplanets). Here we cover
the basics of integral field techniques, with a focus on the three primary integral-
field-unit types in use today: lenslet arrays, image slicers, and fiber bundles.


1. Introduction 


Integral field spectroscopy involves obtaining spectral information across a spatially 
resolved contiguous two-dimensional field of view. In more general terminology, it 
can be considered as a specialized subset of multispectral imaging. In astronomy, 
integral field spectroscopy is particularly relevant for the study of extended objects 
with spectral features. Galaxies and nebulae are obvious cases, where an integral 
field spectrograph (IFS) allows us to probe spatial variations of physical parameters 
such as temperature, composition, or velocity. 

The areal coverage of integral field spectroscopy also has an advantage over 
classical slit spectroscopy when the precise positions of sources are not known. It 
allows, to some extent, “point-and-shoot” spectroscopy without the overhead of 
accurately acquiring the target(s) onto a slit. This can have advantages in observ- 
ing efficiency, and is attractive for rapid-response follow-up observations of transient 
events (e.g. gamma ray bursts) where the exact location of the source may not be 
known. Because integral field spectrographs properly and fully sample the tele- 
scope spatial point spread function (PSF) they are also immune to the slit effects 
common in classical spectrographs. Virtually all integral field units in astronomy
are larger than the typical seeing disk, meaning that all of an object’s flux makes it 
into the spectrograph. Atmospheric dispersion, pointing errors, etc., do not affect 
the throughput of the system, allowing the flux at all wavelengths to be properly 
recovered. 

In recent years, low (spectral) resolution IFS have also become important in 
the search for and study of extra-solar planets. Even stars, when looking for close 
companions, count as extended objects due to the diffraction pattern of the telescope 
and atmospheric turbulence. Here, an IFS can provide a simultaneous map of the 
complex “speckle pattern” (which scales with wavelength) around the primary star, 
allowing it to be modeled and removed in post-processing, greatly increasing the 
star–planet contrast that can be achieved.1,2 The spectrum of the planet, wherever
in its orbit it may be, comes for free of course.

The nature of integral field spectroscopy means that the raw observations, com- 
prising two spatial and one spectral dimension, are often “encoded” onto the detec- 
tor in a nontrivial way (see e.g. Figs. 3 and 6). Software is needed to perform not only 
the standard data reduction processes, but to “reconstruct” the raw two-dimensional 
detector data into a three-dimensional format for analysis. Integral field data are 
commonly stored as a “datacube” (see Sec. 6.3), with the X and Y dimensions 
representing spatial coordinates (typically right ascension and declination), and the 
Z dimension representing wavelength. Figure 1 shows a schematic representation 
of the datacube from an IFU observation of a galaxy. Taking all the points at 
one wavelength, we get a two-dimensional narrowband image (limited only by the 
spectral resolution of the spectrograph). Alternatively, taking all wavelengths at a 
single X or Y coordinate, we get the equivalent of a long slit spectrum. In reality, 
the simultaneous 3D nature of a data cube allows much more powerful analysis 
techniques to be applied. 
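
The slicing operations described above can be sketched with a toy datacube in plain Python. The axis order `cube[wavelength][y][x]`, the function names, and the synthetic pixel values are assumptions made for illustration; real pipelines typically use array libraries and FITS files.

```python
# Toy datacube stored as nested lists: cube[k][j][i], with k the wavelength
# index, j the Y (e.g. declination) index, and i the X (e.g. RA) index.

def make_cube(n_wave, n_y, n_x):
    """Synthetic cube whose values encode their own indices (100*k + 10*j + i)."""
    return [[[100 * k + 10 * j + i for i in range(n_x)]
             for j in range(n_y)]
            for k in range(n_wave)]

def narrowband_image(cube, k):
    """All spatial points at one wavelength: a 2D image indexed [y][x]."""
    return cube[k]

def long_slit_spectrum(cube, i):
    """All wavelengths at a fixed X coordinate: a 2D array indexed [wavelength][y]."""
    return [[row[i] for row in plane] for plane in cube]

def spaxel_spectrum(cube, j, i):
    """The 1D spectrum of a single spatial element."""
    return [plane[j][i] for plane in cube]

def white_light_image(cube):
    """Collapse the cube along wavelength to give a broadband image."""
    n_y, n_x = len(cube[0]), len(cube[0][0])
    return [[sum(plane[j][i] for plane in cube) for i in range(n_x)]
            for j in range(n_y)]

cube = make_cube(n_wave=3, n_y=2, n_x=4)
image = narrowband_image(cube, 1)
slit = long_slit_spectrum(cube, 0)
spectrum = spaxel_spectrum(cube, 1, 2)
```

Because all of these views come from the same simultaneous dataset, they share the same observing conditions, which is the homogeneity advantage the text attributes to true integral field techniques.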

The popularity and impact of integral field spectroscopy has grown over the past 
few decades. From being a fringe technique introduced fairly late in the development 
of 4–8 m class instrumentation, integral field spectroscopy is central to the first light
instrumentation of all three “extremely large telescopes” (GMTIFS® on the 25 m 
GMT, IRIS* on the 30 m TMT, and HARMONI? on the 39 m ELT). 

For more overviews of integral field spectroscopy, see Refs. 6-8. 


2. Integral Field Techniques 


Many techniques can be used to generate an integral field dataset. These range 
from adaptations of “normal” imagers and spectrographs up to custom-designed 
instruments focusing solely on a single integral field technique. One strong advantage 
of true integral field techniques is that all the data are gathered simultaneously, 
providing a much more homogeneous dataset. In the sections below, we summarize 
a few techniques that can be used to generate a three-dimensional dataset. 
Fig. 1. Schematic of an integral field data cube. A two-dimensional slice can be viewed as either
a narrowband image or a long slit spectrum. Image credit: Stephen Todd (ROE) /Douglas Pierce- 
Price (JAC, STFC). (See electronic edition for a color version of this figure.) 


2.1. Reformatting Optics 


The most common integral field technique in use in astronomy, and the main sub- 
ject of this chapter, involves optics which reformat a contiguous two-dimensional 
field into a format suitable to feed a classical spectrograph. These all allow a 
two-dimensional detector to capture three dimensions of information by provid- 
ing a specific and known encoding of the spatial and spectral dimensions on 
the detector. The three types of reformatting optics we discuss in the following 
sections are: 


e Lenslet arrays, which focus many contiguous subsections of a two-dimensional 
field onto a grid of separated images/pupils that can be treated like a two- 
dimensional input to an imaging spectrograph; 

e Fiber bundles, which use the inherent flexibility of optical fibers to rearrange 
a two-dimensional field into a one-dimensional pseudo-slit; and 

e Image slicers, which use a stack of long thin mirrors to “slice” a two-dimensional 
field into many slitlets and rearrange them into a one-dimensional pseudo-slit. 
Fig. 2. Schematic of the three main types of reformatting optics used in astronomical integral
field units. Image credit: Mark Westmoquette/Jeremy Allington-Smith. (See electronic edition for
a color version of this figure.)


Figure 2 shows a conceptual schematic of how each of the three types reformats 
the on-sky field to the spectrograph input, and what the spectrograph output (on 
a two-dimensional detector) looks like. Software is used to reconstruct the detector 
data I(x, y) into a three-dimensional datacube as (RA, Dec, λ).


2.2. Other Integral Field Techniques 


In addition to these reformatting optics, other techniques can be used to produce a 
three-dimensional datacube, though in general they do not acquire the data simul- 
taneously. These are described in brief below, but they do not form a major part of 
ground-based integral field spectroscopy. 


2.2.1. Fabry–Pérot Imagers (Tunable Filters)


A Fabry–Pérot interferometer consists of two parallel glass plates held at a small
separation (tens of micrometers). The plates create an interference cavity through 
which only certain wavelengths are transmitted, effectively forming a narrowband 
filter. By modifying the separation of the plates, the transmission wavelength can 
be tuned. Hence taking a series of images with different tuned wavelengths provides 
a three-dimensional dataset. Tunable filters* have particular strengths in providing 
a very high number of spatial pixels (limited only by the detector size; 2k and 4k 


*Tunable filters are covered in more detail in Volume 3, Chapter 16. 
detectors being common). Depending on the design, very high spectral resolutions
are possible. Their obvious weakness is the sequential nature of the data acquisition, 
which limits them in practice to rather short spectral scans. The transmission of 
the plates is also a function of angle, meaning that the central wavelength of each 
image varies across the field of view. 
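
A short Python sketch of the textbook ideal-etalon relations shows both the tuning and the field dependence. The formulas (the Airy transmission function and the resonance condition mλ = 2nd cos θ) are standard optics rather than taken from this chapter, and the gap, order, and reflectivity values are hypothetical.

```python
import math

def etalon_transmission(wavelength_um, gap_um, reflectivity, angle_rad=0.0, n=1.0):
    """Airy transmission of an ideal, lossless etalon."""
    # Round-trip phase difference between successive reflections.
    delta = 4 * math.pi * n * gap_um * math.cos(angle_rad) / wavelength_um
    finesse_coeff = 4 * reflectivity / (1 - reflectivity) ** 2
    return 1.0 / (1.0 + finesse_coeff * math.sin(delta / 2) ** 2)

def peak_wavelength_um(gap_um, order, angle_rad=0.0, n=1.0):
    """Wavelength transmitted in interference order m: m * lambda = 2 n d cos(theta)."""
    return 2 * n * gap_um * math.cos(angle_rad) / order

# Hypothetical etalon: 20 micron gap, order 61, 90% plate reflectivity.
GAP_UM, ORDER, REFLECTIVITY = 20.0, 61, 0.9

on_axis = peak_wavelength_um(GAP_UM, ORDER)            # centre of the field
off_axis = peak_wavelength_um(GAP_UM, ORDER, 0.01)     # 0.01 rad off axis
tuned = peak_wavelength_um(GAP_UM + 0.05, ORDER)       # gap widened by 50 nm

peak_t = etalon_transmission(on_axis, GAP_UM, REFLECTIVITY)
```

In this idealized model, widening the gap shifts the passband to longer wavelengths, while rays passing through the etalon at an angle see a passband shifted to the blue by a factor cos θ, which is the field-dependent central wavelength mentioned above.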


2.2.2. “Push Broom” Spectroscopy 


“Push Broom” techniques utilize a moving source/scanning telescope to build up 
two-dimensional information from a one-dimensional detector. By extension, scan- 
ning a classical slit spectrograph across a source can provide a three-dimensional 
dataset. This technique is not very efficient for typical low flux astronomical appli- 
cations, as each spatial point is observed in turn. Given that slits are, by definition, 
much longer than they are wide, building up anything approaching a square field of 
view implies a large investment of telescope time. This in turn means there is a sig- 
nificant risk of a change in observing conditions affecting the (spatial) homogeneity 
of the datacube. 

Scanning techniques are more attractive when short observations can be taken 
(i.e. a bright target), the target of interest is much larger than the instru- 
ment/telescope’s intrinsic field of view, and observing conditions are stable. They 
are therefore particularly attractive for space-based observations of very large fields, 
such as Earth observation or in-orbit scans of other planets. 


2.2.3. Imaging Fourier Transform Spectrograph 


Fourier Transform Spectrographs have proven a very flexible instrument concept for 
obtaining accurate spectra in both astronomy and elsewhere. The extension to field 
coverage — an imaging FTS — has been proposed for many years, and was a popular 
early instrument concept? for JWST (née NGST). The theoretical efficiency benefits 
of an iFTS are significant, at least in a faint background, low read-noise limit. They 
offer the potential for wide wavelength coverage, high spectral resolution, wide field 
of view, and simultaneous deep imaging. They rely in effect on scanning a dual-port 
Michelson interferometer, which is technically challenging (but increasingly feasible) 
and can struggle in the low contrast regime. Several iFTS instruments have been
deployed over the past few decades, and the technique is better discussed in several
review articles.10-12


2.2.4. Energy-Sensitive Detectors 


Energy-sensitive detectors (e.g. MKIDS devices†) can be considered as an integral
field technique. Each pixel effectively records a spectrum by recording the energy 
(wavelength) of each photon hitting it. This technique has been used for decades 


†See Volume 2, Chapter 2.
in X-ray astronomy‡ but is an emerging technology at optical wavelengths, and
array sizes are currently limited to a few thousand spatial pixels and low spectral 
resolutions (R ~ 10–100). They will no doubt become a very important technology
in integral field spectroscopy over the next decade, particularly for applications such 
as the direct spectroscopy of exoplanets. 


3. Lenslet Array IFUs 


Lenslet IFUs are particularly suited to the case where $N_\mathrm{spatial}$ is large and $N_\mathrm{spectral}$
is relatively small (i.e. many short spectra). They are therefore well suited to
applications needing very low resolution spectra (e.g. exoplanet direct detection), or
which focus on very specific wavelength ranges (e.g. a specific emission line). The
packing of the spectra on the detector array is not as efficient as with slicers or fiber IFUs,
but the inherent format of the lenslet array is typically well matched to covering a 
detector. Lenslet IFUs also have the advantage that physically close spaxels in the 
field follow very similar paths through the spectrograph optics (see Fig. 3). This 
tends to lead to smoothly varying changes in optical performance across the field of 


Fig. 3. A zoom of the central portion of the SPHERE IFS science detector observing a bright star. 
Many short individual spectra are seen (dispersed vertically), but the underlying image shape (in 
this case an apodized PSF with a core and surrounding halo) is maintained on the science detector. 
From Ref. 13, used with permission. 


‡See Volume 4, Part 2.
view, which is not necessarily the case with reformatting IFUs such as image slicers.
They offer a compact and efficient instrument design, with few additional
elements in the optical path. 

Lenslet IFUs work by imaging the focal plane with an array of microlenses 
(typically a few hundred microns in diameter). The lenslet array forms a grid of 
“micropupils” (µpupils) at the input of the spectrograph, which are in turn dispersed
to form an array of short spectra on the science detector (Fig. 3). A characteristic 
feature of lenslet IFU data is that the cardinal axes of the lenslet array are rotated 
slightly with respect to the dispersion direction, allowing maximum packing effi- 
ciency by interleaving spectra on the detector (see the top panel in Fig. 2). Still, 
the length of the spectrum is necessarily curtailed using bandpass filters to avoid 
overlap with adjacent field points. To cover a wide range of wavelengths and spectral resolutions,
general purpose lenslet IFS often employ a large number of bandpass filters (e.g. 
IRIS*). 

Due to their small size and high density, the array of lenses is usually formed 
in a single optical element, hence the term “lenslet array”. The size (D) and focal 
length (f) of the lenslets are chosen to produce µpupils which are well separated
from each other, and to provide the required sampling of the field. Typically the 
physical size of the lenslets (and the detector) requires the lenslets to be fed with 
magnifying pre-optics, which produces a much slower f/ratio (FR = f/D) than the 
telescope itself. The simple relational equations below describe the key parameters 
of a lenslet IFU design. 

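The lenslets image the telescope pupil, and hence the size of the µpupils at the
spectrograph input is given by


\[
D_{\mu\mathrm{pupil}} = \left(\frac{f_\mathrm{lenslet}}{f_\mathrm{input}}\right) D_\mathrm{telescope}
= 67\,\mu\mathrm{m} \left(\frac{f_\mathrm{lenslet}}{1\,\mathrm{mm}}\right) \left(\frac{FR_\mathrm{input}}{15}\right)^{-1}. \tag{1}
\]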


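To efficiently pack the µpupils on the detector image, the separation of lenslet
centers (in effect the lenslet diameter) must be $N \times D_{\mu\mathrm{pupil}}$, where $N$ is typically
4–5.§ Hence, with Eq. (1):


\[
\left(\frac{D_\mathrm{lenslet}}{\mathrm{mm}}\right) = 0.27 \left(\frac{N}{4}\right) \left(\frac{f_\mathrm{lenslet}}{1\,\mathrm{mm}}\right) \left(\frac{FR_\mathrm{input}}{15}\right)^{-1}. \tag{2}
\]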


The size of the lenslets, along with the input f/ratio used to feed the array, defines 
the sampling of the field of view: 


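\[
\left(\frac{\mathrm{sampling}}{\mathrm{arcsec/lenslet}}\right) = 13.8 \left(\frac{D_\mathrm{lenslet}}{\mathrm{mm}}\right) \left(\frac{FR_\mathrm{input}}{15}\right)^{-1} \left(\frac{D_\mathrm{telescope}}{1\,\mathrm{m}}\right)^{-1}. \tag{3}
\]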


§The assumption here is that the spectrograph produces a good image of the µpupil, and that the
geometric size of the µpupil dominates the size of the image on the detector. In reality the image
quality of the camera must also be taken into consideration, and can change these equations.
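
The three scaling relations, Eqs. (1)-(3), follow from simple geometry: the µpupil diameter is the lenslet focal length divided by the input f/ratio, the lenslet pitch is N µpupil diameters, and the sky sampling is the lenslet pitch divided by the input focal length. The Python sketch below evaluates them directly; the function names and default values are illustrative assumptions, not part of any published design tool.

```python
ARCSEC_PER_RADIAN = 206264.806

def micropupil_diameter_um(f_lenslet_mm, fr_input):
    """Eq. (1): micropupil size = f_lenslet / FR_input, returned in microns."""
    return 1000.0 * f_lenslet_mm / fr_input

def lenslet_pitch_mm(n_separation, f_lenslet_mm, fr_input):
    """Eq. (2): lenslet pitch = N x micropupil diameter, returned in mm."""
    return n_separation * micropupil_diameter_um(f_lenslet_mm, fr_input) / 1000.0

def sky_sampling_arcsec(d_lenslet_mm, fr_input, d_telescope_m):
    """Eq. (3): angle on the sky covered by one lenslet, in arcsec."""
    f_input_mm = fr_input * d_telescope_m * 1000.0  # focal length feeding the array
    return ARCSEC_PER_RADIAN * d_lenslet_mm / f_input_mm

# Reference values used to normalize the equations in the text.
d_pupil = micropupil_diameter_um(f_lenslet_mm=1.0, fr_input=15.0)
d_lenslet = lenslet_pitch_mm(n_separation=4, f_lenslet_mm=1.0, fr_input=15.0)
sampling = sky_sampling_arcsec(d_lenslet_mm=1.0, fr_input=15.0, d_telescope_m=1.0)
```

Evaluating the functions at the reference values reproduces the numerical coefficients of the three equations (about 67 µm, 0.27 mm, and 13.8 arcsec per lenslet), which is a useful check on the normalizations.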
As with all instrument design, however, the fundamental ruling equation is
étendue conservation (conservation of AΩ). The spectrograph camera must map
the entire telescope pupil down to a few pixels, and this in turn limits the field each
spatial element can contain. To satisfy the Shannon sampling criterion,14 the µpupil
(Eq. (1)) must be sampled by ≥2 detector pixels.¶ Considering this, the size of
the µpupils the lenslets should generate is related to the camera f/ratio ($FR_\mathrm{camera}$) and
detector pixel size by


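\[
\left(\frac{D_{\mu\mathrm{pupil}}}{\mu\mathrm{m}}\right) \geq 60 \left(\frac{M_\mathrm{spec}}{0.5}\right)^{-1} \left(\frac{\mathrm{pixel\ size}}{15\,\mu\mathrm{m}}\right), \tag{4}
\]
where $M_\mathrm{spec}$ is the magnification ratio of the spectrograph. Using the fact that
$M_\mathrm{spec} = FR_\mathrm{camera}/FR_\mathrm{coll}$, and that $FR_\mathrm{coll} = FR_\mathrm{lenslet}$, we then obtain


\[
\left(\frac{D_{\mu\mathrm{pupil}}}{\mu\mathrm{m}}\right) \geq 67 \left(\frac{FR_\mathrm{lenslet}}{4.5}\right) \left(\frac{FR_\mathrm{camera}}{2.0}\right)^{-1} \left(\frac{\mathrm{pixel\ size}}{15\,\mu\mathrm{m}}\right). \tag{5}
\]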
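Equating Eq. (1) and Eq. (5), and including Eq. (3), resolves the maximum spaxel
size/lenslet diameter as


\[
\left(\frac{\mathrm{sampling}}{\mathrm{arcsec/lenslet}}\right) = 3.1 \left(\frac{\mathrm{pixel\ size}}{15\,\mu\mathrm{m}}\right) \left(\frac{FR_\mathrm{camera}}{2}\right)^{-1} \left(\frac{D_\mathrm{telescope}}{1\,\mathrm{m}}\right)^{-1}. \tag{6}
\]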

This is so fundamental that the parameters of the lenslet itself drop out! The
goal of lenslet IFS design then becomes balancing $D_\mathrm{lenslet}$, $FR_\mathrm{lenslet}$, and $FR_\mathrm{input}$ to
achieve the desired sampling (both on the sky and on the detector) given the
(effectively) fixed constraints of pixel size, $D_\mathrm{telescope}$, and practical limits on $FR_\mathrm{camera}$
and lenslet manufacturing. Other aspects of the spectrograph design must also
be considered, such as spectra length near the edge of the detector, or disperser 
dependent zeroth-order shifts of the spectra. The interested party is referred to 
Ref. 15, who provide a very nice overview of the basic design considerations of a 
lenslet IFS. 
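
As a rough numerical check of the scaling relations in Eqs. (3), (5), and (6), the short Python sketch below evaluates them for the nominal values quoted in the text. It assumes purely geometric optics, a collimator matched to the lenslet f/ratio, and a micro-pupil of diameter f_lenslet/FR_input; the function names are illustrative only and do not come from any instrument software.

```python
ARCSEC_PER_RADIAN = 206264.8


def sky_sampling_arcsec(d_lenslet_mm, fr_input, d_telescope_m):
    """On-sky size of one lenslet, as in Eq. (3)."""
    focal_length_mm = 1000.0 * d_telescope_m * fr_input
    return ARCSEC_PER_RADIAN * d_lenslet_mm / focal_length_mm


def min_micropupil_um(fr_lenslet, fr_camera, pixel_um, n_pixels=2.0):
    """Smallest micro-pupil whose image spans n_pixels, as in Eq. (5)."""
    # Assumption: the collimator f/ratio equals the lenslet f/ratio.
    m_spec = fr_camera / fr_lenslet
    return n_pixels * pixel_um / m_spec


def limiting_sampling_arcsec(fr_camera, pixel_um, d_telescope_m, n_pixels=2.0):
    """Spaxel size at which the micro-pupil image spans n_pixels, as in Eq. (6)."""
    pixel_m = pixel_um * 1e-6
    return ARCSEC_PER_RADIAN * n_pixels * pixel_m / (fr_camera * d_telescope_m)


if __name__ == "__main__":
    print(round(sky_sampling_arcsec(1.0, 15.0, 1.0), 1))       # 13.8
    print(round(min_micropupil_um(4.5, 2.0, 15.0), 1))         # 67.5
    print(round(limiting_sampling_arcsec(2.0, 15.0, 1.0), 1))  # 3.1
```

Note that the last function takes no lenslet parameter at all, which is the point made above: once the pixel size, camera f/ratio, and telescope diameter are fixed, the limiting spaxel size follows.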


3.1. Pupil Imaging Spectroscopy 


One frequently ignored benefit of lenslet arrays is that they create an array of 
pupils as the input to the spectrograph. As the position of the pupil is, by defini- 
tion, the same for all field points, this effectively decouples the spatial structure of 
the source from the spatial structure of the spectrograph input; the latter is effec- 
tively turned into wavelength information by the spectrograph. Lenslets therefore 
offer the possibility for very stable wavelength mapping, which is independent of any 
sub-lenslet spatial structure within the source. One implication of this, however, is
the need for very accurate matching of observation and calibration pupils, as changes
in pupil position or shape directly affect the wavelength calibration.


eOne subtlety is that the imaged size is the pupil geometric size x PSF of the spectrograph, which
can significantly affect these equations if the PSF > geometric size.


3.2. Lenslets in the Diffraction Limit 


The original lenslet IFS (the TIGER instrument!) was designed for seeing-limited 
observations, where the lenslets are substantially larger than the diffraction limit of 
the telescope. In this case, pure geometric optics works well to describe the lenslet 
performance (as per the equations above). As lenslets move to smaller spatial scales, 
however, diffractive effects become increasingly important; the pupils stop being 
clean pupils and become Airy disks. In this case, a simple lenslet array can perform 
poorly, introducing substantial scattered light between spaxels. To address this, 
Ref. 16 developed the “Bigre” concept! for the SPHERE instrument on VLT. This 
is similar to the original “TIGER” lenslet concept, but uses a double-sided lenslet 
array to create a de-magnified image of the lenslet rather than a mini-pupil, which 
minimizes the coupling between lenslets. The importance of this effect depends on
the downstream spectrograph; for example, the GPI instrument!” has success-
fully implemented a TIGER-type lenslet IFU in the diffraction limit.


3.3. Lenslet IFU in Instruments 


In this section, we review/preview some instruments based around lenslet IFUs. 
This is not an exhaustive list, but is instead designed to give the reader some flavor 
of how lenslets are used. 


3.3.1. TIGER/OASIS/SAURON/OSIRIS 


The TIGER instrument!’ developed for the CFHT in the mid-1990s was the first 
lenslet-based integral field spectrograph, providing CCD-based spectra (0.35-1.0
μm) over a 10″ × 10″ field of view. The TIGER prototype spawned two follow-ups:
OASIS, specifically targeting AO observations, and SAURON,'® aimed at maximiz- 
ing field. The latter achieved a field of up to 41 x 33 arcseconds and made significant 
contributions to our understanding of the dynamics of early-type galaxies. 

In the near-infrared, the OSIRIS instrument!’ is also based around the TIGER 
principle, and provides the 10 m Keck-II with diffraction-limited integral field spec- 
troscopy. The pre-optics in OSIRIS include a cold stop, and the optics of the instru- 
ment are cooled to 55K to minimize thermal background in the K-band. OSIRIS 
includes an interesting option to mask a section of the field, enabling the astronomer
to trade spaxels for longer spectra should the science case prefer it.


fLike the lenslet IFU concept itself, both “TIGER” and “Bigre” are of French origin; one having
the rather tortuous backronym of “Traitement Intégral des Galaxies par l’Étude de leurs Raies”
and the other an exclamation of shock at an unexpected effect!


3.3.2. Planet Hunting: GPI/SPHERE


In the mid-2000s, the search for extra-solar planets intensified, and AO-fed lenslet 
spectrographs offered the ideal instrument for direct detection. As mentioned ear- 
lier, the observations require many short spectra covering the field of interest 
(size/sampling ~ 4″/0.02″ = 200 × 200 spaxels), but only sampling the low-
resolution (molecule dominated, R ~ 20) spectra of the exoplanets. Two instru-
ments, with most effort focused on very high-performance adaptive optics systems,
employ lenslets as their primary “science instruments”: GPI'’ on Gemini, and 
SPHERE-IFS!° on VLT. Both have proven the power of lenslet IFS to provide 
very high contrast observations near bright stars, though exoplanet detections have 
not been as forthcoming as hoped! 


3.3.3. Future Instruments 


The IRIS instrument* being constructed for TMT uses a lenslet IFU to provide inte- 
gral field spectroscopy in its two finest sampling scales (4 mas and 9 mas/spaxel). 
This provides a relatively large field of view of 0.5 x 0.5 or 1.0 x 1.1 arcseconds 
on limited detector real estate (relative to the telescope size at least!). Similar to 
OSIRIS on Keck, the instrument provides a capability to trade field of view for spec- 
tral coverage by covering up part of the IFU. An elegant optical design allows the 
IRIS spectrograph to work efficiently for both the lenslet and slicer (see Sec. 5.4.5) 
modes of the instrument, and the design also provides for the large number of filters 
and grating settings required to support a general purpose lenslet IFU. 


4. Fiber Bundle IFUs 


The basic concept behind fiber IFUs is simple to understand: a close-packed bundle 
of optical fibers is placed in a focal plane, each fiber therefore receiving light from 
a small part of the image. The other ends of the fibers are arranged to form a linear slit
at the entrance of a spectrograph, and hence a spectrum of each point in the input 
focal plane can be obtained via a classical “slit” spectrograph. 

One huge advantage of fiber IFUs is their literal flexibility. This allows them 
to be relatively easily integrated (or even retrofitted) into general purpose long-slit 
or multi-object spectrographs. The input and output focal planes of the IFU can 
be located very close to each other, making them optically “transparent” to the 
spectrograph. This has been utilized on several general purpose spectrographs such 
as GMOS,?° VIMOS,?! and IMACS.”2 

The physics of optical fibers is covered elsewhere in this Handbook®, and all the
same issues apply to fiber IFUs. In particular, matching numerical apertures and
managing fiber geometry (bending) to minimize the effects of focal ratio degradation
(FRD) can prove a challenge. Equally, the alignment of the fibers to the focal plane
is crucial to maintain throughput, and the fibers must be perpendicular to the focal
plane to ~1 degree. Experience-based overviews?’ of building fiber integral field
units outline some of the pitfalls which lie therein.


See Chapter 6 of Volume 2.


hThe “numerical aperture” (NA) of a fiber is approximated as $\mathrm{NA} = n\sin(\theta_\text{max}) = \sqrt{n_\text{core}^2 - n_\text{clad}^2}$,
where n is the index of refraction around the fiber and $\theta_\text{max}$ is the maximum angle, relative to the

Fiber IFUs have fallen out of favor somewhat in dedicated single-field IFU 
instruments, as lenslet arrays and image slicers typically offer better through- 
put, packing efficiency, and easier scaling to very high spaxel counts. Fiber IFUs 
are, however, the ideal technology for instruments exploiting a multiplex gain, 
either for very wide fields (e.g. VIRUS), or resolved multi-object spectroscopy (e.g. 
MaNGA/SAMI). They also allow for a more considered input field geometry than
lenslets or slicers (which strongly favor single monolithic fields); many fiber IFU 
instruments include “sky fibers” from areas surrounding the main science field of 
the IFU, allowing a simultaneous sampling of the sky background around the object. 
This is particularly useful in the case of extended objects, which may fill the entire 
science field. 


4.1. Fiber IFU Coupling 


Coupling light into, and out of, a fiber bundle is probably the most challenging 
part of a fiber IFU. Given the physical limitations on fiber sizes (typically 100-
1000 microns, for a multimode fiber), the possible spatial sampling and f/ratio
combinations are somewhat restricted. For example, matching a typical 0.22 NA fiber
to the full beam of a 4 m telescope corresponds to a plate scale of 22″/mm, requiring
a 25-50 μm fiber to sample typical site seeing. More typically though, instruments
utilize fibers of ~100 μm core diameter fed at F/3-F/6. This under-filling of the fiber
almost inevitably leads to some FRD loss, which must be accommodated in the
spectrograph design to retain maximum efficiency.'
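
The numbers in this example can be reproduced with a few lines of Python. The sketch below is only a geometric estimate: it assumes the fiber is in air, converts the numerical aperture to an f/ratio with the exact tangent relation (which gives about F/2.2 for 0.22 NA), and treats the seeing disk as a simple angular diameter. The function names are illustrative.

```python
import math

ARCSEC_PER_RADIAN = 206264.8


def f_ratio_from_na(na):
    """f/ratio of the light cone accepted by a fiber of numerical aperture na (in air)."""
    return 1.0 / (2.0 * math.tan(math.asin(na)))


def plate_scale_arcsec_per_mm(d_telescope_m, f_ratio):
    """Focal-plane plate scale for a telescope of given diameter and f/ratio."""
    focal_length_mm = 1000.0 * d_telescope_m * f_ratio
    return ARCSEC_PER_RADIAN / focal_length_mm


def fiber_core_for_seeing_um(seeing_arcsec, d_telescope_m, f_ratio):
    """Fiber core diameter (microns) that spans the given angle on the sky."""
    return 1000.0 * seeing_arcsec / plate_scale_arcsec_per_mm(d_telescope_m, f_ratio)


if __name__ == "__main__":
    fr = f_ratio_from_na(0.22)
    print(round(fr, 2))                                       # about 2.2
    print(round(plate_scale_arcsec_per_mm(4.0, fr), 1))       # about 23 arcsec/mm
    print(round(fiber_core_for_seeing_um(1.0, 4.0, fr)))      # about 43 microns
```

The plate scale comes out slightly above the 22″/mm quoted in the text because the tangent relation gives a marginally faster f/ratio than the small-angle approximation 1/(2 NA).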


4.1.1. Bare Fiber 


The simplest fiber IFUs use an array of optical fibers placed in the (reimaged) focal
plane of the telescope. Fibers are mounted, usually by hand, in a block, and sealed 
with an adhesive. Once set, the entire block is polished to produce a contiguous 
focal plane filled with individual fibers. This is relatively easy, and can maximize 
the throughput of the individual fibers. Well polished and anti-reflection-coated 
fibers can have an efficiency of >95%. It does have disadvantages though in terms 
of the limited packing fraction which can be achieved. 

axis of the fiber, at which light can be coupled into the fiber. The exact acceptance angle depends
on details of the fiber and distribution of modes which can be supported.
iSee also Chapter 15 of this Volume.


In theory, circles can be packed into a hexagonal array with an efficiency of
nearly 91%. In a fiber array however, one must also account for the (inactive)
cladding layer around the (light carrying) core. The cladding layer diameter is typ-
ically ~10% larger than the core, meaning the active area of the fiber is only ~81%
of its total area. A close-packed array of fibers can therefore only achieve a packing 
fraction of about 73%. So, while individual fibers are very efficient, taking this 
packing efficiency into consideration means fiber IFUs have a relatively low total 
throughput compared to image slicers and lenslet arrays. They do, however, have 
advantages for instruments focused on large area/surface brightness science, as the 
fibers can be mapped to large areas on the sky and the overall “grasp” of the 
instrument can be high even if the packing efficiency is lower (e.g. VIRUS). 
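
The packing-fraction arithmetic above is easy to check. The sketch below assumes ideal hexagonal close packing of identical circular fibers and a single cladding-to-core diameter ratio; buffer layers, glue gaps, and broken fibers are ignored, so real bundles will do somewhat worse. For a ratio of exactly 1.10 it gives an active fraction of about 0.83 and a fill factor of about 0.75, close to the rounded ~81% and ~73% quoted in the text.

```python
import math

# Area fraction covered by circles in an ideal hexagonal close-packed array.
HEX_PACKING = math.pi / (2.0 * math.sqrt(3.0))


def active_fraction(cladding_to_core_ratio):
    """Fraction of a single fiber's cross-section occupied by the core."""
    return 1.0 / cladding_to_core_ratio ** 2


def fill_factor(cladding_to_core_ratio):
    """Fraction of the focal plane covered by light-carrying cores."""
    return HEX_PACKING * active_fraction(cladding_to_core_ratio)


if __name__ == "__main__":
    print(round(HEX_PACKING, 3))
    print(round(active_fraction(1.10), 3))
    print(round(fill_factor(1.10), 3))
```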

Care is needed in packing fibers into an array to avoid discontinuities and gaps 
in the field, and to maintain good alignment of the fibers with the focal plane. 
An alternative solution is to mount the fibers in a pre-fabricated structure (e.g. a 
matrix of rigid tubes, or a metal block with a regular grid of holes drilled in it). 
This ensures that the fiber geometry is well controlled, but the support structure 
means the packing fraction is even lower. Operationally, the packing fraction can 
be recovered by combining several exposures with the telescope offset by the fiber 
spacing. Combining the exposures later gives a contiguous datacube, but of course 
with the risk of some loss of homogeneity across spatial elements. 


4.1.2. Hexabundles 


An interesting new technology for fiber IFUs developed over the past few years 
is “hexabundles”.?* These are in effect an integrated multi-fiber bundle, and are 
described in more detail in Volume 2 of this Handbook. 


4.1.3. Lenslet Array Coupling 


The most common method for coupling light into an optical fiber IFU is to use a 
(micro)lenslet array. The lenslet array is placed in the focal plane, and each lenslet 
produces an image of the pupil onto a fiber behind. This removes the need for the 
fibers to be close packed, and often fibers are mounted in a machined “hole grid” 
matched to the lenslet array geometry. In effect, this is an extension of the lenslet 
IFU concept, with each μpupil mapped into a fiber rather than imaged directly by
the spectrograph. Figure 4 shows a clear illustrative example, though most fiber 
IFUs use monolithic microlenslet arrays. As the lenslet arrays are (typically) mono- 
lithic blocks, the geometry is well controlled during manufacture and leads to a 
very regular sampling on the sky. Using square or hexagonal lenslet arrays can give 
packing fractions approaching 100%. 

jSee Chapter 6 of Volume 2.


Fig. 4. The CYCLOPS IFU lenslet-fiber coupling optics; (left) an array of 15 lenslets mounted on
a substrate, co-aligned with (right) 15 fibers in the focal plane of the lenslets. From Ref. 25, used
with permission.


More importantly, the lenslets also allow the f/ratio of the telescope to be
matched to the optimal numerical aperture of the fiber, minimizing the effects of
FRD and tuning the sampling to the science cases. From a pure fiber IFU point
of view, the numerical aperture of the lenslet must be less than that of the fiber (typically
0.22 NA, equivalent to ~F/2.3), and the focal length must be short enough to make
the reimaged telescope pupil smaller than the fiber diameter. The equations covering
this are essentially the same as those described in the lenslet array section (Sec. 3).
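
The two conditions in this paragraph can be written as a simple feasibility check. The Python sketch below assumes geometric optics, an on-axis beam whose f/ratio at the fiber is set by the lenslet (focal length divided by lenslet diameter), and a pupil image of diameter f_lenslet/FR_input; the function name, argument names, and example values are hypothetical.

```python
import math


def na_to_f_ratio(na):
    """f/ratio corresponding to a numerical aperture, for a fiber in air."""
    return 1.0 / (2.0 * math.tan(math.asin(na)))


def check_lenslet_fiber_match(d_lenslet_um, f_lenslet_um, fr_input,
                              core_um, fiber_na=0.22):
    """Check that a lenslet feeds a fiber within its NA and within its core."""
    lenslet_f_ratio = f_lenslet_um / d_lenslet_um
    pupil_image_um = f_lenslet_um / fr_input
    return {
        "lenslet_f_ratio": lenslet_f_ratio,
        "pupil_image_um": pupil_image_um,
        # The lenslet beam must be slower than the fastest beam the fiber accepts.
        "beam_within_na": lenslet_f_ratio >= na_to_f_ratio(fiber_na),
        # The reimaged telescope pupil must fit inside the fiber core.
        "pupil_within_core": pupil_image_um <= core_um,
    }


if __name__ == "__main__":
    # Hypothetical case: 250 micron lenslets at f/4, fed at f/15, 100 micron cores.
    print(check_lenslet_fiber_match(250.0, 1000.0, 15.0, 100.0))
```

In practice a design would leave margin on both conditions, since alignment errors and the lenslet's own image quality enlarge the pupil image.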

These equations give the basic trades in designing a fiber IFU. For a typical 
instrument sampling the seeing limit on a 4-8 m telescope, we require lenslets 
with diameters 0.1-1 mm to match fibers in the 50-200 micron range. The main 
trade is between microarrays (monolithic manufactured arrays with, typically, $D_\text{lens}$ <
500 microns) and macroarrays (assembled from individual lenses with, typically,
$D_\text{lens}$ > 1-2 mm). The latter solution has some benefits for FRD (as the cartoon in
Fig. 5 shows), and lower scattered light (due to better lenslet quality), but requires
larger fore-optics and more effort in assembly (to individually align fibers to lenses).
It becomes unfeasible for large arrays.

It must be recognized however that the fiber size and input f/ratio are inti- 
mately linked to the design of the spectrograph the IFU is feeding. In particular, it 
is likely to be limited by the fastest practicable f/ratio of the spectrograph camera
and the spectrograph magnification required to efficiently map the fiber cores onto 
detector pixels. Indeed these considerations often lead on large telescopes to an 
array of individually small fibers being used for spectroscopy of even point sources 
(see Sec. 4.3.4). In addition, most designs do not drive the fibers to the maximum 
numerical aperture, and usually make some allowance for FRD in the spectrograph 
design (e.g. the SPIRAL instrument?® fed the fibers at f/5 and designed the spec- 
trograph for f/4.8). 


Fig. 5. Cartoon image showing the effect on apparent f/ratio entering a fiber depending on the
relative size of fiber and lenslet. From Ref. 26, used with permission. (See electronic edition for a
color version of this figure.)


4.2. Spectrograph Input Geometry


The output ends of the fibers form the input slit of the spectrograph. Fibers are
stacked in a linear array, bonded, and polished to form a single smooth optical
surface. The fibers cannot be closely packed in the slit, as each fiber must be cleanly
extracted in software to minimize cross-talk between spatial elements in the IFU.
Typically fibers are spaced 2-3 diameters apart. This spacing requirement gives a
relatively inefficient packing on the detector (one spatial element requires at least
4-6 detector pixels).
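
To see what this spacing means for the number of spatial elements, the sketch below counts how many fibers fit along one detector axis. It assumes each fiber core is imaged onto 2 pixels (the value implied by the 4-6 pixel figure above) and ignores gaps between fiber blocks and unusable detector edges; the 4096-pixel detector is a hypothetical example.

```python
def pixels_per_fiber(fiber_image_pixels, spacing_in_diameters):
    """Detector pixels consumed per fiber, including the gap to its neighbor."""
    return fiber_image_pixels * spacing_in_diameters


def fibers_per_detector(detector_pixels, fiber_image_pixels=2.0,
                        spacing_in_diameters=2.0):
    """Number of fibers that fit along the spatial axis of the detector."""
    pitch = pixels_per_fiber(fiber_image_pixels, spacing_in_diameters)
    return int(detector_pixels // pitch)


if __name__ == "__main__":
    for spacing in (2.0, 3.0):
        print(spacing, fibers_per_detector(4096, 2.0, spacing))
```

With these assumptions a 4096-pixel axis holds roughly 700 to 1000 fibers, which is why large fiber IFUs are usually split across several slits or spectrographs.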

In more advanced instruments, the fiber alignment in the slit is controlled to 
match the optical design of the spectrograph; for example, arranging all the fibers 
on a curve to match the focal plane/pupil of a (reverse) Schmidt collimator. 


4.3. Instruments Using Fiber IFUs 


Fiber-based IFUs have been deployed as a specific mode in many general purpose 
instruments over the past two decades (though their history can be traced back 
to the late 1970s?”), as well as forming the core of several specific integral field 
spectrographs. Fiber-based IFUs have also emerged as a key enabling technology for
planned instruments on the 30 m+ generation of “extremely large telescopes”. The
list of instruments discussed here is by no means exhaustive, but hopefully provides 
the reader with an overview of the development of this area of instrumentation. 


4.3.1. Fiber IFUs in Multipurpose Instruments


As mentioned before, the flexible nature of fibers makes them well suited to
incorporation into the wide field imaging spectrographs developed for the 8 m
class of telescopes. The GMOS instruments on the Gemini telescopes incorporate 
1000-element hexagonal lenslet-coupled fiber IFUs.?° These are held in a deployable 
carriage that mimics one of the focal-plane masks used for multi-object spectroscopy 
on GMOS. When inserted into the beam, the IFU picks off two fields (one for the 
object, one for sky background) and reformats them to two linear slits in the same 
focal plane. In this sense, the IFU mode is transparent to the rest of the instrument. 
The IMACS instrument?” on Magellan employs a very similar design of IFU, though 
in this case the output is arranged into a long curved slit to match the field curva- 
ture of the spectrograph. Similarly, the VIMOS instrument?! on VLT incorporated a 
6400-element IFU feeding fibers into multi-object spectrographs, though in this case 
the coupling lenslet array was made of two sets of crossed cylindrical microlenses. 
Also on VLT, the FLAMES multi-fiber instrument?® provides both a large single 
central IFU (22 x 14 fibers) and multiple deployable IFUs (15 times 6 x 4 fibers) in 
addition to the more common single fibers. 

The IFU modes on these “workhorse” observatory instruments have done much 
to bring integral field spectroscopy to the mainstream astronomy community. 


4.3.2. Dedicated Single Object Fiber IFUs


Many experimental instruments contributed to the development of “common user” 
fiber IFUs, including the DensePak instruments?” °° at KPNO, SILFID*! on CFHT 
and HEXAFLEX*? on the WHT. These were generally “bare fiber” IFUs, and cou- 
pled to existing spectrographs as concept demonstrators. The SPIRAL, COSHI,?° 
and CIRPASS instruments were all built around macrolenslet arrays (typically 4 mm 
diameter), with up to 499 elements in the IFU. These instruments toured a host of 
4-8 m telescopes in the late 1990s/early 2000s.

The PMAS instrument? on the 3.5 m Calar Alto telescope provides a well- 
sampled field of 16 x 16 spaxels via a lenslet-coupled IFU and an under-sampled 
wide-field bare-fiber IFU (PPak**). The goal of the latter is the same as SparsePak?° 
on the 4m WIYN telescope — to provide maximal light grasp for large extended 
objects. 

The VIRUS*® instrument, recently deployed on the 9.2m Hobby-Eberly Tele- 
scope (HET), has taken this concept to an extreme. The instrument, specifically 
designed for the widest field spectroscopic survey science, comprises 156 cloned 
spectrographs, each fed by a ~250 element fiber IFU head. The fibers allow the 
spectrographs to be mounted efficiently on the telescope structure out of the light 
path — something that would be impossible with other IFU techniques. Each IFU 
head is separated from its neighbors by ~80% of its size, meaning four pointings 
are required to reconstruct a single contiguous field of 20 x 20 arcminutes. The bare 
fibers within each IFU head are not themselves closely packed, and hence require 
3-4 small dithers to completely sample the field. In total, fully surveying the 20 x 20
arcminute field requires ~16 exposures. This may seem like a compromise, but in
fact is a fantastic example of designing an instrument to achieve the science goals — 
in this case the widest possible field of view for blind spectroscopic surveys. 


4.3.3. Fiber-Based (Deployable) Multi-Object IFUs 


Fiber-based IFUs have been used very successfully in multi-object spectrographs.
Being ostensibly the same as a single fiber, they work very well in fiber “pick- 
and-place” systems. Instruments such as GIRAFFE,?® SAMI,°” WEAVE,?® and 
MaNGA”*® all utilize deployable fiber IFUs.* 


4.3.4. Fiber IFUs on Large Telescopes 


The plate scales delivered by large telescopes (typically <1 arcsecond/mm for 30 
m-class telescopes) mean it is very hard to get seeing-limited images into a single 
fiber. Even using the fastest f/ratio requires a ~300-500 micron fiber to match
the seeing disk. At the output end, it would be optically impossible to reimage such 
a large fiber down onto a few detector pixels. The result would be a significant 
oversampling of the spectrum, giving lower sensitivity and requiring a much larger 
instrument and detector to achieve the desired spectral resolution. Instead, by using 
a small (tens of fibers) IFU for the point source, the fiber size, spectrograph optics,
and detector pixel size can be much better matched. Using fiber IFUs to “slice” star 
images is arguably their most important role in the future. They will allow efficient 
high resolution spectroscopy of point sources on large telescopes — fulfilling exactly 
the same function which drove development of the first image slicers!*° The inher- 
ent flexibility of small fiber IFU bundles makes them ideal for incorporating into 
a multi-object spectrograph. The MOSAIC and HIRES concepts currently under 
study for E-ELT both use fiber-IFU units, as does the proposed GMT-MANIFEST 
instrument.*+ 
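
A short numerical sketch makes the mismatch concrete. It assumes a 30 m telescope feeding fibers at about F/2.3 (the fast limit for a 0.22 NA fiber quoted earlier), 1 arcsecond seeing, 15 μm pixels, and a hypothetical F/1.5 spectrograph camera; FRD and aberrations are ignored. The 0.2 arcsecond sub-fiber is likewise only an illustrative choice for a small IFU that slices the seeing disk.

```python
ARCSEC_PER_RADIAN = 206264.8


def fiber_core_um(angle_arcsec, d_telescope_m, f_ratio):
    """Fiber core diameter (microns) spanning a given angle in the focal plane."""
    focal_length_m = d_telescope_m * f_ratio
    return angle_arcsec / ARCSEC_PER_RADIAN * focal_length_m * 1e6


def pixels_across_fiber(core_um, fr_fiber_output, fr_camera, pixel_um):
    """Width of the reimaged fiber core on the detector, in pixels."""
    magnification = fr_camera / fr_fiber_output
    return core_um * magnification / pixel_um


if __name__ == "__main__":
    single = fiber_core_um(1.0, 30.0, 2.3)   # one fiber covering the seeing disk
    sliced = fiber_core_um(0.2, 30.0, 2.3)   # one element of a small fiber IFU
    print(round(single), round(pixels_across_fiber(single, 2.3, 1.5, 15.0), 1))
    print(round(sliced), round(pixels_across_fiber(sliced, 2.3, 1.5, 15.0), 1))
```

Under these assumptions the single large fiber spreads over more than ten pixels even with a very fast camera, whereas each small fiber lands on about three pixels, much closer to the two-pixel sampling the spectrograph needs.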


5. Image Slicer IFUs 


Image slicers use a stack of long thin mirrors in a focal plane to transform a two- 
dimensional field into a linear “pseudo-slit” to feed a spectrograph. Since their first
use as an integral field unit in the mid-1990s, they have grown in popularity and are 
central to the first generation of spectroscopic instruments on 30 m-class telescopes. 
They tend to be well suited to obtaining long spectra over a relatively small but 
well sampled field of view. 

kSee also Chapter 15 in this Volume.


The concept of an image slicer dates back to the 1930s,4° though the original
use was to reduce spectrograph slit losses for point sources without compromising
spectral resolution, rather than to provide any sort of spatial resolution on extended
sources. The concept of designing an image slicer that would preserve the spatial 
information was first introduced in the MPE near-infrared imaging spectrograph 
“3D”.42 The motivation for the development of a new IFU concept was linked to
the low performance of fibers at near-infrared wavelengths and their unsuitability 
for use in cryogenic environments. This “3D” concept has been widely adopted and 
adapted for a host of efficient spectrographs at both optical and infrared wavelength 
ranges, e.g. SINFONI,*? UIST,*4 GNIRS,*° MUSE,*® HARMONI,° and others. 

Image-slicer-based IFUs offer a number of advantages over other types of IFUs. 
In particular, the reformatting of the slitlets into the pseudo-slit also allows image 
slicers to make efficient use of detector real estate (in comparison to lenslet or fiber 
IFUs), with up to 95% of pixels actively used (see Fig. 6). The fact that sampling 
in one direction (along the slice) does not happen until the detector means that 
spaxel-to-spaxel variations can be very well controlled. The caveat to this is that in 
the other direction (across the slice), adjacent spaxels can (depending on the design) 
take rather different paths through the spectrograph optics. As they produce a long 
“pseudo-slit”, they are naturally combined with a classical long slit spectrograph 
design to produce long homogeneous spectra. There is a limitation to this technique: 
the reformatted slit length drives the size of the spectrograph optics and the image 
quality must be preserved along the slit. Slicer IFUs are particularly sensitive to 
variations of image quality along the slit, due to the way the on-sky field maps into 
the detector focal plane. 


Fig. 6. A raw detector image from a slicer-based IFS (the “SWIFT” instrument in this case), 
illustrating how the three-dimensional data are rendered on the two-dimensional detector. In this 
case, wavelength runs right-to-left, and the spatial direction up-down over 20 slices, each 90 pixels 
long. Bright (H-alpha) emission is seen across the whole field on the right-hand side of the image, 
whereas the continuum of the star is seen towards the top of all slices, and most strongly in the 
slices at the top. The gap in the middle is the overscan region of the 2-amp detector readout. 
Credit: Fraser Clarke, University of Oxford. 


5.1. Optical Designs


Image slicers work by — quite literally — slicing the image. An image of the focal 
plane is projected, usually at a quite slow f/ratio, onto a stack of long thin mirrors 
(typically with aspect ratios up to 50-100:1, and often known as “slicer mirrors”).
Each of these slicer mirrors is tilted at a small angle to its neighbors, such that 
different “heights” in the input field are reflected at different angles. Each of these 
beams is collected by a second mirror (often referred to as the “pupil mirror”), 
angled such that all the beams share a common exit pupil. 

The spread of tilts on the slicer stack, which inclines them to the focal plane, 
forces the input beam to be relatively slow to avoid significant defocus. Usually 
some fore-optics is needed to produce this input beam, and many slicer instruments 
use these fore-optics to act as a “scale changer” to provide different spatial sampling 
on the image slicer. 

Figure 7 shows the 44-element slicer stack from the “SWIFT” spectrograph*” 
deployed on the Palomar 200-inch telescope. This slicer had a 44 x 88-element field 
of view, made of 44 long thin slices optically contacted into a solid block. Each slice 
has a different angle to spread out the field. 

Fig. 7. An image of the 44-element slicer stack of the “SWIFT” image slicer spectrograph
deployed on the Palomar 200-inch telescope. Each “slice” is 0.5 mm thick and 22 mm long. Credit:
Fraser Clarke, University of Oxford.


The desire to ensure that adjacent slices are close to each other on the detector
(to maximize packing efficiency), together with the need to capture light from adja-
cent beams away from the focal plane (i.e. when the beam is defocused), leads to a
characteristic “brick-wall” pattern for the output slits. Figure 8 illustrates how this
effect arises, and the “odd-even” effect of the slice tilts is also visible in Fig. 7. This
maximizes spatial packing, at the expense of a reduction in common-wavelength
coverage of the IFU. Where wavelength coverage is particularly important (for example on
a small detector), it is possible to remove this spectral stagger, but at the expense
of offset pupils between slices (which in turn increases the required spectrograph
grating/camera apertures).


Fig. 8. Cartoon illustrating the origin of the “brickwall” slit stagger often seen in image slicer
IFUs. The illustration shows “lens” as per that specific instrument design, but the same point is
valid for any pupil/folding mirrors outside the focal plane. Credit: Matthias Tecza, University of
Oxford.

In the simplest case, the image slicer is a 1:1 system that “unfolds” the 2D field
into a 1D pseudo-slit. However, for large fields (i.e. many slices) this results in either
a physically long slit (making spectrograph manufacture hard), or a physically very 
small image slicer (making IFU manufacture hard). Instead, most modern slicers 
introduce some form of demagnification between their input (slicer) and output 
(slit) focal planes. There are several methods of achieving this demagnification, but 
the most common is to give the slicing mirrors some optical power.*® As they are in 
the focal plane, they do not image the field, but produce an image of the telescope 
pupil ~$f_\text{slicer}$ away from the slicer. The power and position of the “pupil” mirrors is
then set to produce (a) a demagnified image of the slicer mirror, and (b) a telecentric 
exit pupil common to all slices. These fall out easily from the thin lens equation: 


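$$\text{Separation} = f_\text{slicer} + f_\text{pupil} = s, \qquad (7)$$

$$\text{Demagnification} = \frac{s}{s'} = \frac{s - f_\text{pupil}}{f_\text{pupil}} = \frac{f_\text{slicer}}{f_\text{pupil}}. \qquad (8)$$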


Equation (7) ensures a telecentric output, and Eq. (8) gives the desired demagnifi- 
cation. Slicer design then becomes (a rather nontrivial) issue of setting a geometry 
to give a close packed output slit while clearing all beams. 
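
The relations in Eqs. (7) and (8) can be checked directly against the thin lens equation. The sketch below assumes ideal thin optics with the slicer mirror as the object at distance s from the pupil mirror and the output slit at image distance s', so that 1/s + 1/s' = 1/f_pupil. The focal lengths are hypothetical and are not taken from any real instrument; only the demagnification of 7.5 echoes a value mentioned in this section.

```python
def pupil_mirror_design(f_slicer_mm, demagnification):
    """Pupil-mirror focal length and spacings for a telecentric, demagnifying slicer."""
    f_pupil = f_slicer_mm / demagnification          # Eq. (8) rearranged
    separation = f_slicer_mm + f_pupil               # Eq. (7): telecentric output
    # Thin lens equation: image distance of the slicer mirror behind the pupil mirror.
    slit_distance = 1.0 / (1.0 / f_pupil - 1.0 / separation)
    return {
        "f_pupil_mm": f_pupil,
        "separation_mm": separation,
        "slit_distance_mm": slit_distance,
        "achieved_demagnification": separation / slit_distance,
    }


if __name__ == "__main__":
    # Hypothetical slicer-mirror focal length of 150 mm, demagnification 7.5.
    design = pupil_mirror_design(150.0, 7.5)
    for key, value in design.items():
        print(key, round(value, 3))
```

The achieved demagnification returned by the thin lens calculation matches the requested value, confirming that the telecentric spacing and the focal-length ratio are mutually consistent.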


Fig. 9. The SWIFT image slicer assembly, showing the central slicer stack, the two sets of flat
“pupil” mirrors, and the two arrays of pupil lenses. This design uses lenses to provide demagni- 
fication, keeping the reflective surfaces flat. All components (except lenses) were manufactured 
in Zerodur and optically contacted together. A Euro, UK Pound, and US Quarter give a scale. 
Credit: Fraser Clarke, University of Oxford. 


Figure 9 shows the whole “SWIFT” IFU demonstrating all of these principles: 
the image slicer stack (Fig. 7) in the middle, two arrays of angled folding mirrors 
forming a crescent, and the brick-wall-patterned pupil lenses at the exit. This IFU 
provided a demagnification of ~7.5x. 


5.2. Anamorphic Magnification and Sampling 


A difference between image slicers and other IFU techniques is where the field is 
spatially sampled. In an image slicer, the field is sampled in different directions at 
different locations; i.e. sampled across the slices (“X”) in the slicer focal plane, and 
along the slices (“Y”) in the detector focal plane. The latter point complicates mat- 
ters, as the detector is also used to sample in the spectral direction. To adequately 
sample the spectral line spread function, the monochromatic image of the slit (in 
“X”) must be imaged to at least two detector pixels. This in turn means that in 
the spatial (“Y”) direction, a detector pixel corresponds to only half the width of 
the slit. The spatial sampling is therefore different (~2:1) across and along the slice 
direction. There are several potential solutions to this: 


(1) anamorphically magnify the beam 2:1 across the slices; 
(2) generate 2:1 rectangular detector pixels along the slices (i.e. by binning); 
(3) accept rectangular sampling on the sky (i.e. rectangular spaxels). 

If it is optically possible, option (1) is preferable, as it maximizes the IFU field of
view for a given detector area. The downside is that it requires the spectrograph 
optics to work twice as hard (i.e. be twice as fast) in the spatial direction as the 
spectral one. The anamorphic magnification itself can be introduced before the 
image slicer, after the slicer, or indeed in the image slicer itself by generating toroidal 
slices. 
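
The trade between the three options can be tabulated with a small sketch. It assumes that the slice width is imaged onto exactly two detector pixels in the spectral direction and that a fixed number of detector pixels is available along each slice; the slice width and pixel count in the example are hypothetical. Spaxel sizes are given as (across-slice, along-slice) in arcseconds.

```python
def sampling_options(slice_width_arcsec, pixels_along_slice, pixels_per_slice_width=2):
    """Spaxel shape and along-slice field for the three sampling options."""
    w = slice_width_arcsec
    native_pixel = w / pixels_per_slice_width  # sky size of one unmagnified pixel
    return {
        # (1) Anamorphic magnification: each pixel along the slice covers w.
        "anamorphic": {
            "spaxel_arcsec": (w, w),
            "spaxels_per_slice": pixels_along_slice,
            "slice_length_arcsec": pixels_along_slice * w,
        },
        # (2) Binning along the slice: square spaxels, fewer of them.
        "binned": {
            "spaxel_arcsec": (w, w),
            "spaxels_per_slice": pixels_along_slice // pixels_per_slice_width,
            "slice_length_arcsec": pixels_along_slice * native_pixel,
        },
        # (3) Rectangular spaxels: keep the native pixel along the slice.
        "rectangular": {
            "spaxel_arcsec": (w, native_pixel),
            "spaxels_per_slice": pixels_along_slice,
            "slice_length_arcsec": pixels_along_slice * native_pixel,
        },
    }


if __name__ == "__main__":
    for name, result in sampling_options(0.1, 88).items():
        print(name, result)
```

For the same detector area, the anamorphic option covers twice the field along the slice, which is the advantage described above.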


5.3. Manufacturing Techniques 


Image slicers are complex optical elements, and their design and manufacture are 
often a significant fraction of the overall instrument cost. The feasibility of an image 
slicer design is usually set by manufacturing considerations, and there are two main 
manufacturing options available: polished or machined. 


5.3.1. Polished/Glass Slicers 


Glass-based image slicers are used at shorter wavelengths (i.e. visible light), where 
surface roughness becomes a dominant factor in determining the quality of an optic. 
These slicers are made by accurately polishing a large set of individual optics to 
precise (but different) angles (effectively prisms), and then assembling them together 
to produce the slicer. This is obviously a labor-intensive and skilled process, and 
tends to be relatively expensive. Low thermal expansion material (e.g. Zerodur) is 
typically used for polished IFUs, as it both polishes well, and allows good optical 
contacting. 

In recent years, the MUSE and KCWI image slicers have been polished in 
glass (Zerodur), and the HARMONI and IRIS instruments for ELT and TMT, 
respectively, are based around polished glass image slicers. 


5.3.2. Turned/Metal Slicers 


At longer wavelengths, where the surface roughness of micromachining is more than 
adequate for optical surfaces, precision machining can be used to manufacture image 
slicers in aluminum. In this case, the slicer is generally machined out of a solid 
block, rather than being assembled post-facto. The use of metal optics for cryogenic 
instruments is often preferred, as it permits the construction of the complete instru- 
ment from a single material (aluminum), removing issues related to differential 
contraction of the optical elements and support structures on cooling. Once the 
alignment of the optical components has been achieved at room temperature, it 
is preserved on cooling to cryogenic temperatures. In the case of complex optical 
systems such as IFUs, the advantages are significant. Depending on the techniques 
used, machining also offers more possibilities for introducing complex optical shapes 
into the slicer mirrors, which can open up some novel optical design possibilities. 
Gold coating of the machined aluminum increases the infrared reflectivity of the
mirrors, and hence the transmission of the slicer. 
Machined slicers have been used for many years in instruments such as 
UIST,*4 NIFS, GNIRS*° and KMOS.”* The latter, in particular, demonstrated the 
advantages of this technique for batch production, with 24 identical image slicers 
being produced. Both the MIRI and NIRSPEC instruments®”*! aboard JWST 
contain machined image slicers, as will the METIS instrument®? on ELT. 


5.4. Image Slicers in Instruments 


In this section, we review/preview some instruments based around image slicers. 
This is not an exhaustive list, but is instead designed to give the reader some flavor 
of how image slicers are used. 


5.4.1. MPE-3D/SPIFFI-SINFONI 


The first infrared image slicing integral field spectrograph, MPE-3D,*? was based 
on a Zerodur slicer manufactured from individual flat slices coupled to flat pupil 
mirrors. The slices were 8 mm long x 0.4 mm wide, individually made and optically 
contacted. The 16 x 16 spaxel slicer was at ambient temperature, and the reformatted
slit then became the input to a cryogenic spectrograph. 

SPIFFI*? maintained the basic MPE-3D concept, using flat mirrors for both 
input slicing and output pupil mirrors; this time providing a 32 x 32 spaxel field of 
view. The SPIFFI image slicer is fully cryogenic, and provides excellent transmission 
and scattered light performance even down to 0.8 micrometer wavelengths. SPIFFI 
has been the heart of the SINFONI instrument*? on the ESO VLT since 2004, and 
will continue as part of the upgraded ERIS instrument®* long into the 2020s. 


5.4.2. MUSE 


The MUSE integral field spectrograph*® on the ESO VLT is the current record 
holder in terms of the number of spaxels. This highly modular instrument, made 
up of 24 individual spectrographs, covers a field of view of 1 arcminute² on the 
sky sampled by 0.2 arcsec pixels, resulting in a total of 90,000 spaxels. The MUSE 
fore-optics contains a field splitter that produces 24 slices of 60 arcsec x 2.5 arcsec. 
A field separator directs each slice to one of 24 individual integral field units where 
it is reformatted into 48 slitlets of 15 arcsecs x 0.2 arcsecs. The MUSE field splitter 
and image slicer are constructed from Zerodur, polished using traditional methods 
to a surface roughness of 2.2 nm and assembled using molecular bonding to avoid 
glue.®° 

Both the slicer and pupil mirrors in MUSE are powered to provide a demagnification
of the slices by 12.9×. This allows for a much more compact spectrograph 
design, which is particularly important given the huge field coverage of MUSE. 


5.4.3. KCWI 


The KCWI instrument*® on Keck also uses a classically polished image slicer, though 
this time with convex slices to produce a virtual entrance slit for the spectro- 
graph. Unlike most image slicer instruments, KCWI provides 
three interchangeable slicer stacks to allow the astronomer to tune the sampling to 
their science. The stacks have physically different slice widths to change the spatial 
sampling in one direction. The advantage, of course, is that it avoids the need for 
any scale changing “fore-optics” before the slicer, hence maximizing instrument 
transmission. 


5.4.4. KMOS 


KMOS*" on the ESO VLT was the first multi-object slicer-based spectrograph. 
It comprises three cryogenic spectrographs, each one taking light from eight image 
slicers (with their output slits butted end-to-end). Each image slicer is in turn fed by 
a deployable pick-off arm, which can access any point in a 7-arcminute field of view. 
KMOS can therefore obtain 14 x 14 element datacubes, sampled at 0.2″/spaxel, 
of up to 24 objects in a single exposure. 

The KMOS image slicers*? are machined from blocks of aluminum and achieve 
a surface roughness of 5-10 nm, ensuring good performance at the infrared
wavelengths used by KMOS. 


5.4.5. Future Instruments 


Image slicers are central to several instruments currently in development (or await- 
ing launch!) for the next generation of major observatories. The sections below are 
again not exhaustive. 


JWST NIRSPEC & MIRI: The NIRSPEC and MIRI instruments®”*! on 
JWST both include image slicer IFUs; indeed, MIRI includes four to accommodate 
its very wide wavelength range.°® Working in the infrared, and being cooled to very 
low temperatures, the image slicers are machined in aluminum. NIRSPEC contains 
a 30 x 30 element image slicer covering a 3″ x 3″ field of view. The MIRI image 
slicers (see Fig. 6 of Ref. 58), operating at longer wavelengths, contain fewer but 
larger spatial elements, appropriately scaled to sample the diffraction-limited PSF 
over different wavelength ranges. The MIRI slicer draws on much heritage from the 
UIST slicer developed for the UKIRT telescope in the early 2000s. 


HARMONI: The HARMONI instrument? currently being designed for the 
European ELT is based around a ~32,000 element (152 x 204 spaxel) polished glass 
image slicer. Similar to MUSE, HARMONI uses a two-stage slicing process, with 
an eight-way field splitter sending the field to eight 38-slicer image slicer stacks. 
The 1 mm x 51 mm mirrors in these stacks are off-axis sections of a parabola, 
and together with their pupil mirrors provide a 7.6× demagnification of the slice. 
The resulting output slits are paired end-to-end and feed four spectrographs, each 
with a 0.5m-long input slit and producing a 4k-long spectrum of each spaxel. The 
image slicers in HARMONI contain more mirrors than the whole of the 39m ELT 
primary! 
IRIS: The IRIS instrument* on TMT uses a slicer stack to provide the larger 
scales of its integral field spectroscopy mode. The IRIS slicer design produces a 
pair of 4k-long slits offset in wavelength, which maximizes field (8k spaxels) at the 
expense of wavelength coverage (2k) on a single 4k infrared detector. To maximize 
field in the smaller scales, IRIS switches to a lenslet design, as described in Sec. 3.3.3. 


6. Operating and Calibrating IFU Spectrographs 


IFU spectrographs are in essence a combination of imagers and spectrographs. As 
such they inherit many of the operational and calibration requirements/features of 
classical instruments. The combination, however, introduces some specific issues, 
which are addressed here. 


6.1. Calibration 


The calibration of IFU data follows the same basic steps as normal spectroscopy 
data. Some additional features of IFUs do, however, require additional calibration 
effort. In particular, specific geometric calibrations must be obtained to enable the 
integral field to be reconstructed from the on-detector data. This usually takes the 
form of a known geometric pattern projected onto the IFU, which enables software 
image reconstruction parameters to be adjusted to recover the known shape from the 
observed data. For fiber bundle IFUs, a calibration often made during manufacture 
is to back-illuminate individual fibers in the exit slit, and measure exactly where in 
two-dimensional space they appear in the entrance bundle. 

The push to pack IFU data as efficiently as possible onto the detector can lead 
to problems of cross-talk between adjacent spaxels. For fiber and lenslet IFUs, the 
“influence function” of one spaxel is often measured by illuminating a single spaxel 
and measuring how much light spills into the neighboring spaxels. This information
can then be used to deconvolve, to some extent, the signal from adjacent spaxels. 
Cross-talk is less of an issue for slicer-based systems, as the field is not sampled in 
one direction (along the slices) until the detector. Nonetheless, cross-talk between 
adjacent slices can be problematic if an insufficient gap is left, and the relatively
complex optics of a slicer can make it prone to ghost images if not carefully designed. 
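
As an illustration of how a measured influence function can be used, the following
minimal sketch models the recorded fluxes as a cross-talk matrix acting on the true
spaxel fluxes, and inverts it. It assumes that spaxels are ordered as they appear
along the pseudo-slit, that every spaxel shares the same influence function, and that
the spill fractions (0.94 and 0.03 here) are purely illustrative; the function names
are hypothetical and do not refer to any instrument pipeline.

```python
import numpy as np

def crosstalk_matrix(n_spaxels, influence):
    """Build C such that observed = C @ true.

    influence[0] is the fraction of light that stays in its own spaxel;
    influence[k] is the fraction spilling into the k-th neighbor on each side.
    (Assumed identical for all spaxels.)
    """
    C = influence[0] * np.eye(n_spaxels)
    for k, frac in enumerate(influence[1:], start=1):
        C += frac * (np.eye(n_spaxels, k=k) + np.eye(n_spaxels, k=-k))
    return C

def correct_crosstalk(observed, influence):
    """Recover the true fluxes by solving the linear system C @ true = observed."""
    C = crosstalk_matrix(len(observed), influence)
    return np.linalg.solve(C, observed)

# Illustrative values: one bright and one faint spaxel among empty neighbors.
true_flux = np.array([0.0, 100.0, 0.0, 20.0, 0.0])
influence = [0.94, 0.03]  # assumed influence function
observed = crosstalk_matrix(len(true_flux), influence) @ true_flux
recovered = correct_crosstalk(observed, influence)
```

In practice the influence function varies across the detector and the data are noisy,
so a direct inversion of this kind is only a starting point.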

While conceptually similar to “normal” spectrographs, the detailed calibration 
of IFUs can hold some pitfalls. In particular, the spatial imaging nature of IFUs 
makes them rather sensitive to calibration quality; they suffer from all the calibra- 
tion complexities of spectrographs, and all the calibration sensitivities of imagers! 
Effects that would be unnoticed in multi-object spectrographs (e.g. slight fiber-to- 
fiber transmission variations) become all too apparent when the spaxels cover a 
contiguous field. 


6.2. Operation 


IFUs can provide some simplifications for operations, offering to some extent “point 
and shoot” spectroscopy without the need to align the target onto a specific slit. 
The relative benefits of this of course depend on (1) the field of view of the IFU, 
(2) the size of the target, and (3) the inherent pointing accuracy of the telescope. An 
additional complexity, however, comes from the “encoded” nature of the raw data 
frames; it is not necessarily easy to understand the pointing on the target from raw 
science data. In most cases, some form of fast reconstruction software is employed 
to allow the astronomer to generate an “on-sky” image to help acquisition. 


6.3. Data Formats and Representation 


The most common form of representing data from an integral field instrument is as 
a three-dimensional datacube. In a datacube, the X- and Y-axes typically represent 
spatial coordinates (often right ascension and declination), while the Z-axis repre- 
sents wavelength. Commonly in astronomy, the datacube is stored as a FITS file, 
which allows it to be viewed with a range of standard tools (e.g. SAOImage/ds9 
or qfitsview). A major disadvantage of this technique, however, is the need to 
interpolate data onto a homogeneous grid, which inevitably leads to some reduction 
in the data quality. 

A more attractive approach is to store the data as a “cloud”, with each pixel 
on the detector assigned its own three-dimensional coordinate. This preserves the 
data in its purest form, but requires as yet nonstandard software for analysis and 
viewing. 
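
The relationship between the two representations can be sketched in a few lines. The
example below bins a "cloud" of samples (x, y, wavelength, flux) onto a regular grid
by averaging all samples that fall in each voxel. This nearest-voxel binning is the
simplest possible resampling and is an assumption made for illustration; real
pipelines generally use more careful interpolation or drizzling. The function name,
axis order and grid limits are hypothetical.

```python
import numpy as np

def cloud_to_cube(x, y, wave, flux, shape, ranges):
    """Average cloud samples into a regular (nx, ny, nwave) grid.

    shape  : number of voxels along (x, y, wavelength)
    ranges : ((x_min, x_max), (y_min, y_max), (w_min, w_max))
    Voxels that receive no samples are set to NaN.
    """
    sample = np.column_stack([x, y, wave])
    total, _ = np.histogramdd(sample, bins=shape, range=ranges, weights=flux)
    count, _ = np.histogramdd(sample, bins=shape, range=ranges)
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count  # 0/0 gives NaN for empty voxels

# Three detector pixels, each with its own three-dimensional coordinate.
x = np.array([0.1, 0.2, 1.5])
y = np.array([0.1, 0.1, 0.5])
wave = np.array([500.2, 500.4, 501.5])
flux = np.array([1.0, 3.0, 5.0])

cube = cloud_to_cube(x, y, wave, flux, shape=(2, 1, 2),
                     ranges=((0.0, 2.0), (0.0, 1.0), (500.0, 502.0)))
```

The first two samples share a voxel and are averaged, which shows how information
about their individual positions is lost once the data are placed on the grid.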
Multi-object spectrographs are the instrument of choice for astronomical surveys. 
In this chapter, we highlight the development of these wide-field spectrographs, 
covering the most important techniques used to enable the selection of multiple 
targets and the principles of the design and operation of such instruments. 


1. Introduction 


Multi-object spectrographs (MOS) greatly increase the observing efficiency of 
ground-based and space telescopes by simultaneously targeting tens or hundreds 
of astronomical objects. The earliest example of multi-object spectroscopy is the 
use of objective prisms coupled with photographic plates to produce a very low 
spectral resolution image of all objects in the field. Examples of catalogs and papers 
produced from these spectra include quasar surveys with the 1.2 m UK Schmidt 
catalogs,! the Hamburg/ESO survey with the ESO Schmidt telescope? and Har- 
vard College Observatory plate library.? Short spectra from bright, well-separated 
objects could be extracted.* The use of photographic plates at major observatories 
has now largely died out due to the availability of large format electronic detec- 
tors. The technique of slitless prism spectroscopy lives on in the Swift/UVOT® and 
HST/STIS® instruments, and is one of the observing modes of the James Webb 
Space Telescope NIRCAM instrument.” 

The basic concept of the current generation of multi-object spectrographs con- 
sists of a selection mechanism, located at or near the focal plane of the telescope, 
reconfigurable to select the astronomical targets of interest in the spectrograph 
field of view. The beams from each selected object then form the input aperture of 
Fig. 1. The basic principles of multi-object spectroscopy, based on the use of a shutter mask 
to create slits. Panels: scene on the sky; scene on the shutter mask; selection of objects;
spectra on the detector. Image Credit: Astrium GmbH
(https://commons.wikimedia.org/wiki/File:Basic_principle_of_Multi-Object_Spectroscopy.png)
[License: CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)]. 


a spectrograph. In Fig. 1, the MOS concept used in the NIRSPEC spectrograph is 
shown: a shutter mask is configured to produce three slits, the light from which is 
dispersed in the spectrograph. 

The modern era of MOS spectroscopy started in the 1970s with the realization 
that optical fibers could be used to couple the light from multiple sources into 
an astronomical spectrograph. Today a variety of different techniques for object 
selection have been developed to meet the evolving scientific goals of the astronomy 
community. In this chapter, general design considerations for multi-object spectro- 
graph systems are presented. Various techniques for the selection of objects that are 
suitable to different scientific goals and instrument concepts are then summarized 
and a few notable implementations are described. 


2. Design Considerations 


There are a few points that must be considered during the design of a multi-object 
spectrograph, irrespective of the selection mechanism chosen. Given the basic assertion
that maximizing the survey speed (approximately: multiplex factor × field 
size × telescope diameter) will be one of the principal drivers of the MOS design, 
these instruments are naturally wide-field instruments. Thus, the spectrograph will 
be physically large, leading to challenges in the opto-mechanical design. Table 1 
gives the approximate physical size of the focal planes of some representative spec- 
trographs on different telescopes, to provide a general idea of the scale of these 
instruments. The control and data analysis software required to position selection 
mechanisms and then to extract astronomical information from hundreds of sources 
is also not without difficulty. In addition, there are field-dependent atmospheric 
effects to be taken into account. A short summary of these considerations is given 
here. In the sections describing different systems, examples of the ways in which 
these issues have been addressed are summarized. 


Table 1. Spectrograph size scales.

Spectrograph  Telescope (diameter)  Focal plane  f-ratio  Plate scale   Patrol field     Patrol field
                                    used                  (arcsec/mm)   (angular scale)  (mm at the focal plane)
6dF           UK Schmidt (1.2 m)    Prime focus  f/2.5    67.12         6° dia.          320 dia.
2dF           AAT (3.9 m)           Prime focus  f/3.3    16.2          2° dia.          444 dia.
WEAVE         WHT (4.2 m)           Prime focus  f/2.5    17.8          2° dia.          404 dia.
MOONS         VLT (8.2 m)           Nasmyth      f/15     1.72          25′ dia.         873.0 dia.
DEIMOS        Keck II (10.0 m)      Nasmyth      f/15     1.38          16.7′ x 5′       726.4 x 217.5

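
The entries of Table 1 are linked by simple geometry, which the following sketch
makes explicit: the plate scale follows from the telescope diameter and f-ratio, and
the physical patrol field is the angular field divided by the plate scale. The
function names are hypothetical, and the calculation ignores the corrector optics
and distortion, so it reproduces the tabulated values only approximately.

```python
ARCSEC_PER_RADIAN = 206265.0

def plate_scale_arcsec_per_mm(diameter_m, f_ratio):
    """Plate scale of an ideal telescope with the given aperture and f-ratio."""
    focal_length_mm = diameter_m * 1000.0 * f_ratio
    return ARCSEC_PER_RADIAN / focal_length_mm

def field_size_mm(field_arcsec, plate_scale):
    """Physical size at the focal plane of a field of the given angular size."""
    return field_arcsec / plate_scale

# 2dF on the AAT: 3.9 m aperture at f/3.3, two-degree diameter field.
scale_2df = plate_scale_arcsec_per_mm(3.9, 3.3)   # close to the tabulated 16.2
size_2df = field_size_mm(2.0 * 3600.0, 16.2)      # close to the tabulated 444 mm

# DEIMOS on Keck II: 16.7 arcminute long axis at 1.38 arcsec/mm.
size_deimos = field_size_mm(16.7 * 60.0, 1.38)    # close to the tabulated 726.4 mm
```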
2.1. Properties of the Focal Plane 


The characteristics of the focal plane of the telescope are an important input to the 
design of any MOS spectrograph. The ideal telescope for an MOS would deliver a 
focal plane that is flat over the wide field of the spectrograph and for which the reim- 
aged pupil is telecentric.* The telecentric condition is hard to achieve, but a design 
that is pupil-centric (the focal plane radius of curvature is the same as the exit pupil 
position) provides a good compromise for coupling the input beams to, for example, 
optical fibers. Early on in the development of fiber spectrographs, the importance 
of achieving a telecentric field in order to minimize the losses from focal ratio degra- 
dation (FRD, discussed in Sec. 3.1) was identified.2 Now telescopes are routinely 
adapted using specific field-correcting optics to meet the needs of their wide-field 
spectrographs. Figure 2 shows the optical layout for the wide-field corrector (WFC) 
that will be inserted in the VISTA telescope. The 4MOST WFC, combined with the 
VISTA primary and secondary mirrors, will provide a 2.5° pupil-centric field with 
the image quality correction needed to couple the light from the astronomical sources 
into the 85 µm diameter fibers used in the 4MOST spectrograph.? The 4MOST WFC 
also includes an Atmospheric Dispersion Corrector (ADC, see Sec. 2.2) designed to 
operate over the 390-950 nm wavelength range that is the goal for the spectrograph. 
The lenses marked L1-L3 in Fig. 2 are made from N-BK7 glass: L1, at almost 890 
mm diameter and 85 mm thick, pushes lens manufacturing to its current limits. 
This is a common challenge for the latest generation of wide-field spectrographs 
that seek to exploit ever wider fields on ever larger telescopes. 

The 4MOST WFC will be installed at the Cassegrain focus of the VISTA tele- 
scope. In contrast, the WFC for the DESI spectrograph (Fig. 3) will be installed at 
the prime focus of the 4-m KPNO Mayall telescope.!° This 6-lens WFC will also 



Fig. 2. Optical design of the wide-field corrector for 4MOST on VISTA. 


*Telecentric means that the pupil of an optical system is at infinity. 
Fig. 3. The KPNO Mayall telescope layout incorporating the DESI spectrograph systems.!+ 
Labeled components: focal plane assembly with 5000 fiber positioners; calibration lamp
system; new upper ring, spider and cage; new 6-lens wide-field corrector on hexapod;
ten thermally-controlled 3-channel spectrographs (360-980 nm).


incorporate atmospheric dispersion compensation and will deliver a 3° field-of-view 
to a 5000 fiber positioner system that will also be located on the top-end ring of 
the telescope. 


2.2. Atmospheric Differential Refraction 


Understanding the impact of atmospheric differential refraction is always important 
for wide-field instruments and those observing over a wide wavelength range, but 
particularly so for multi-object spectroscopy. Failure to account for these effects in 
the design and operational model for the instrument can result in loss of sensitivity 
and photometric errors. The first detailed analysis of the impact of these effects 
on astronomical spectroscopy was presented by Ref. 12, and they have been
subsequently analyzed for specific spectrographs.!* 1° The differential refraction
(in arcseconds) between wavelength λ and 5000 Å is 


$$R(\lambda) - R(5000) \approx 206265\,[n(\lambda) - n(5000)] \tan z, \qquad (1)$$


where n(λ) is the refractive index of air for the specific temperature, pressure and 
water vapor pressure of the atmosphere and z is the zenith angle of the
observations.!* The differential atmospheric effects have a two-fold impact on the image 
at the telescope focal plane. The first effect is that the image is dispersed in wave- 
length, leading to a blurring of the image point spread function (PSF) and, for 
spectrographs, a chromatic loss of flux, the impact of which depends on the relative 
size of the blurred PSF and the input aperture. This chromatic effect is referred 
to as atmospheric dispersion. At airmass = 2 (z = 60 degrees) the atmospheric 
differential refraction between 4000 Å and 5000 Å is >1 arcsecond (comparable 
to the atmospheric seeing at a good site). The effect is reduced at infrared
wavelengths: between 9000 Å and 10,000 Å it is ~0.1 arcseconds for a high altitude 
telescope site. This is negligible relative to the atmospheric seeing, but significant 
for instruments that use adaptive optics on large telescopes (e.g. the diffraction 
limit of a 10 m telescope at 10,000 Å is 0.025 arcseconds). In a single slit
spectrograph, this effect may be compensated by aligning the spectrograph slit along 
the direction of the atmospheric dispersion (the parallactic angle). In this way, 
all the flux from a point-source enters the slit, albeit spread over several detector 
pixels. (It is worth noting that if the slit angle on sky is set to the parallactic 
angle then one of the advantages of long slit spectroscopy — the ability to align 
the slit along an extended object — is compromised.) For an MOS spectrograph, 
it is not possible to align all the slitlets in the field to the parallactic angle due to 
the second effect, field differential refraction, described below. Since atmospheric 
dispersion is essentially constant across the field at a given airmass, it can in
principle be corrected by applying an equal and opposite dispersion using prisms
inserted in the optical path before the aperture.ᵇ The impact of the additional
optical surfaces (in terms of 
loss of sensitivity) must be balanced against the operational advantages and the 
reduction in the slit losses when deciding whether or not to include an ADC in the 
spectrograph. 
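
The magnitudes quoted above can be reproduced from Eq. (1) with a short calculation.
The sketch below is not taken from this chapter: it assumes a standard Edlén-type
dispersion formula for dry air at sea level, scaled for pressure and temperature, and
neglects the small water vapor term. The default conditions (600 mm Hg, 7 °C), chosen
to represent a site at roughly 2 km altitude, are likewise assumptions, as are the
function names.

```python
import math

def refractivity_ppm(wavelength_um, pressure_mmhg=600.0, temperature_c=7.0):
    """(n - 1) x 10^6 for dry air (assumed Edlen-type formula, no water vapor)."""
    s2 = (1.0 / wavelength_um) ** 2
    # Sea-level value (760 mm Hg, 15 C).
    n_sea = 64.328 + 29498.1 / (146.0 - s2) + 255.4 / (41.0 - s2)
    # Scale to the assumed site pressure and temperature.
    p, t = pressure_mmhg, temperature_c
    scale = (p * (1.0 + (1.049 - 0.0157 * t) * 1e-6 * p)
             / (720.883 * (1.0 + 0.003661 * t)))
    return n_sea * scale

def differential_refraction_arcsec(wavelength_um, zenith_deg, ref_um=0.5, **atm):
    """Eq. (1): refraction at wavelength_um relative to ref_um (default 5000 A)."""
    dn = (refractivity_ppm(wavelength_um, **atm)
          - refractivity_ppm(ref_um, **atm)) * 1e-6
    return 206265.0 * dn * math.tan(math.radians(zenith_deg))

# Airmass 2 corresponds to a zenith angle of 60 degrees.
blue = differential_refraction_arcsec(0.4, 60.0)    # 4000 A relative to 5000 A
red = (differential_refraction_arcsec(0.9, 60.0)
       - differential_refraction_arcsec(1.0, 60.0))  # 9000 A relative to 10,000 A
```

Under these assumptions `blue` comes out slightly above 1 arcsecond and `red` close
to 0.1 arcseconds, consistent with the values quoted in the text.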

The second effect that can have an impact on wide-field instruments is the 
field differential refraction: the small change in airmass across the wide patrol field 
results in the center of point-source images being offset on the focal plane relative 
to their nominal position. Furthermore, these positions are continually changing 
as the airmass of the field changes. For observations lasting a significant length of 
time (hours) this change in apparent position of the objects may move the target 
out of the selection aperture. This effect is achromatic but, unlike the atmospheric 
dispersion, cannot be corrected simultaneously for every point in the field. Field 
differential atmospheric refraction is normally compensated by careful planning of 
the operation of the instrument. For a given aperture size relative to the expected 
image size, the acceptable loss of flux can be specified based on the observing pro- 
gram. Taking the requested duration of observations as the input, the airmass at 
which the observations can be carried out with this acceptable photometric error 
can be calculated. A detailed description of the specific example of VIMOS on the 
VLT was presented recently.!° Depending on the selection mechanism employed in 
the spectrograph, it may also be possible to adjust the pick-off to the new apparent 
position of the objects during the observations. 
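
To give a feel for the size of the field differential refraction, the sketch below
uses the simple plane-parallel approximation in which the total refraction is
proportional to tan z. The refraction constant of 50 arcseconds is an assumed,
representative value rather than one taken from this chapter, and the function names
are hypothetical.

```python
import math

def refraction_arcsec(zenith_deg, r0_arcsec=50.0):
    """Total refraction in the plane-parallel approximation, R = R0 tan z."""
    return r0_arcsec * math.tan(math.radians(zenith_deg))

def field_differential_refraction(zenith_center_deg, offset_deg, r0_arcsec=50.0):
    """Refraction of a target offset_deg further from the zenith than the field
    center, relative to the refraction at the field center."""
    return (refraction_arcsec(zenith_center_deg + offset_deg, r0_arcsec)
            - refraction_arcsec(zenith_center_deg, r0_arcsec))

# A target 1 degree from the field center, tracked from z = 30 to z = 50 degrees.
start = field_differential_refraction(30.0, 1.0)
end = field_differential_refraction(50.0, 1.0)
drift = end - start  # change in apparent separation during the observation
```

With these assumed numbers the relative position of the target changes by roughly
one arcsecond over the observation, which is comparable to a typical fiber or slit
width and illustrates why the effect must be planned for.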


ᵇThe design of optical atmospheric dispersion compensators (ADCs) is described in Chapter 5 of 
Volume 2. 
2.3. Target Acquisition 


Placing the object to be observed on the aperture of the instrument is a neces- 
sary step for an astronomical instrument, which must be treated carefully for MOS 
spectrographs due to the need to acquire many objects over a wide field of view. For an 
MOS, this may be considered in two parts: the position of the objects on the focal 
plane where the selection will take place must be well understood and the selection 
mechanism must be accurately placed on the object positions. One example that 
illustrates the typical accuracy required is from the requirements of the FMOS 
spectrograph on the Subaru telescope. FMOS"” uses fibers that are 1.2 arcseconds 
in diameter to select the objects. The physical size of these fibers is 100 µm at the 
Subaru prime focus and the requirement is that each fiber is to be positioned within 
20 µm of the target position!® to keep the aperture losses at <10%. In Sec. 2.2, the 
impact of the atmosphere on the object position was described. Additional contribu- 
tions to the placement of the object on the focal plane come from the properties of 
the telescope focal plane, principally optical distortion. An excellent mapping from 
(RA, Dec) on the sky to mm in the focal plane is required. The static offsets can 
be calibrated, for example by observing a plate of pinholes at precisely known posi- 
tions. Preimaging of the fields to be observed with the same instrument is another 
calibration technique commonly used in slit-mask MOS instruments, although it 
is more expensive in terms of telescope time. The preimage is then used in the 
manufacturing of the slit mask. This has the additional advantage of also removing 
any uncertainties in the precise position of the target object, but of course does 
not remove the field-dependent atmospheric refraction effects unless the image is 
obtained within the same range of airmass as the spectroscopic observations. Once 
the ideal position of the pick-off mechanism has been calculated, it may also be 
necessary to confirm that the pick-offs have reached their position using a feedback 
loop, and, if necessary, to iterate towards the position. A metrology camera can be 
used to image the focal plane, either simultaneously or by scanning, or measurement 
of the position with a device such as an encoder can be used. 
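
The first step of the sky-to-focal-plane mapping can be sketched as a gnomonic
(tangent-plane) projection followed by a plate scale. This is a minimal illustration
under stated assumptions: it ignores optical distortion and atmospheric refraction,
which a real instrument must calibrate, and the orientation and sign of the focal
plane axes are arbitrary. The function name is hypothetical.

```python
import math

ARCSEC_PER_RADIAN = 206265.0

def sky_to_focal_plane_mm(ra_deg, dec_deg, ra0_deg, dec0_deg, plate_scale):
    """Project (RA, Dec) onto the plane tangent at the field center (ra0, dec0).

    plate_scale is in arcsec/mm; the result is (x, y) in mm for an ideal,
    distortion-free focal plane.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(ra0_deg), math.radians(dec0_deg)
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    xi = math.cos(dec) * math.sin(ra - ra0) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    mm_per_radian = ARCSEC_PER_RADIAN / plate_scale
    return xi * mm_per_radian, eta * mm_per_radian

# A target 1 degree north of the field center at a plate scale of 16.2 arcsec/mm.
x_mm, y_mm = sky_to_focal_plane_mm(150.0, -29.0, 150.0, -30.0, 16.2)

# The FMOS figures quoted above: 1.2 arcsec fibers are 100 um across, so the
# 20 um positioning requirement corresponds to an angle on the sky of:
fmos_scale = 1.2 / 0.100              # arcsec per mm
fmos_tolerance_arcsec = 0.020 * fmos_scale
```

The FMOS positioning requirement therefore corresponds to about a quarter of an
arcsecond on the sky, which sets the accuracy needed from the full mapping.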


2.4. Sky Subtraction 


In astronomy, at infrared wavelengths in particular, the flux from the target
objects can be significantly less than the background radiation from the sky 
and telescope. Even for spectroscopic observations, calibration by subtracting this 
background signal from the signal from the science target (called sky subtraction) 
is required. Sky subtraction methods for slit spectrographs include either beam- 
switching between an object and sky position (normally “nodding” the object along 
the slit to avoid the factor of two loss in observing efficiency) or measuring and 
subtracting the sky from areas of the slit not illuminated by the object. For fiber 
spectrographs, sky subtraction has typically been carried out by removing the signal 
measured in a fiber that does not contain an object. However, mismatches in the 
transmission down each fiber and differences in scattered light background have 
been one of the limiting factors in the magnitude of objects observable. Alterna- 
tive methods for improving sky subtraction have been explored recently, including 
beam-switching between fibers!® and the use of more complex algorithms using 2D 
information (e.g. for LAMOST ?°). Using these techniques, coupled with modern 
fiber materials and good instrument design, fiber spectrographs can deliver sky 
subtracted spectra with <1% residual sky emission. 
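
The basic fiber sky-subtraction scheme described above, including a correction for
fiber-to-fiber transmission differences, can be sketched as follows. The approach
(a median sky spectrum from dedicated sky fibers, rescaled by each fiber's relative
throughput) is a simplified assumption for illustration and omits the scattered
light and two-dimensional modeling used by real pipelines. All names and values are
hypothetical.

```python
import numpy as np

def subtract_sky(spectra, is_sky, throughput):
    """Subtract a master sky spectrum from every fiber.

    spectra    : array (n_fibers, n_pixels) of extracted spectra
    is_sky     : boolean sequence marking fibers placed on blank sky
    throughput : relative transmission of each fiber (e.g. from flat fields)
    """
    spectra = np.asarray(spectra, dtype=float)
    throughput = np.asarray(throughput, dtype=float)[:, None]
    sky_mask = np.asarray(is_sky, dtype=bool)
    # Put all fibers on a common throughput scale before combining the sky fibers.
    normalized = spectra / throughput
    master_sky = np.median(normalized[sky_mask], axis=0)
    # Rescale the master sky to each fiber's own throughput and subtract.
    return spectra - throughput * master_sky

# Illustrative data: one object fiber and three sky fibers, three pixels each.
sky = np.array([10.0, 50.0, 10.0])          # includes a bright sky line
obj = np.array([5.0, 5.0, 5.0])
throughput = np.array([0.8, 0.9, 1.1, 1.0])
spectra = throughput[:, None] * sky
spectra[0] += throughput[0] * obj
cleaned = subtract_sky(spectra, [False, True, True, True], throughput)
```

If the throughput values are wrong by even a few percent, residuals of the bright sky
line remain in the object spectrum, which is the limitation discussed in the text.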


3. Spectrograph Concepts 


3.1. Spectrographs Using Optical Fibers 


Optical fibers developed for the telecommunications industry were quickly adopted 
by the astronomy community when their potential for transmitting the light from 
large numbers of astronomical objects over large distances was realized.?!:?? The 
first systems were developed for astronomy at visible wavelengths; now, increasingly, 
fibers are being used for applications for which infrared observations are required. 
Fibers offer many options for spectrograph development and to address different 
science cases. Very large numbers of fibers can be deployed, fibers can be clustered
together to function as integral-field unitsᶜ for 3D spectroscopy (see Sec. 3.1.1 
discussion on SAMI), and long fiber lengths can be used while maintaining good 
transmission. The latter property allows the spectrographs to be located far from the 
pick-off mechanism, simplifying the instrument design by placing the spectrographs 
in mechanically stable locations. 

Alongside this adaptability, the use of fibers also introduces some constraints 
in the design of the system. Soon after the introduction of fibers for astronomy, 
it was realized that one of the most important factors affecting performance is a 
phenomenon referred to as Focal Ratio Degradation (FRD): the f-ratio of the input 
beam is reduced (i.e. the beam becomes faster) at the output of the fiber due to a
range of imperfections in the 
fiber (e.g. bending, stresses, refractive index changes along the fiber).?* 7° Light 
entering a fiber is scattered and emerges in an annulus of finite width at the fiber 
output. The difference in the output beam is referred to as focal ratio degradation, 
approximately: 


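$$\frac{\Delta\theta}{\theta} \approx \frac{d_{\rm fiber}}{R_{\rm bend}}, \qquad (2)$$


where Δθ is the change in the beam angle θ, R_bend is the bending radius of the fiber
and d_fiber is the fiber core diameter (Fig. 4). 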

This failure to conserve étendue (or AΩ, where A is the area of an aperture 
and Ω is the solid angle of the input beam) results in either a loss of light, a loss 


ᶜSee also Chapter 14 of this Volume. 
Fig. 4. A schematic showing the concept of focal ratio degradation, after Ref. 24. 


of spectral resolution, or a reduction in the sensitivity of the instrument. Since the 
FRD is lower for fast f-ratios, care must be taken to inject the fiber with a fast 
f-ratio (normally in the range from f/2 to f/6). 7° 
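
The light loss that results from FRD can be estimated with a simple geometric
argument. The sketch below assumes that the fiber output cone is uniformly filled and
that the solid angle of a beam scales as 1/F² for focal ratio F (a small-angle
approximation); under those assumptions, a collimator that is slower than the fiber
output beam accepts only the ratio of the two solid angles. The function name and the
example focal ratios are hypothetical.

```python
def captured_fraction(f_out, f_collimator):
    """Fraction of the fiber output beam accepted by the spectrograph collimator.

    f_out        : focal ratio of the beam leaving the fiber (after FRD)
    f_collimator : focal ratio accepted by the collimator
    Assumes a uniformly filled output cone and solid angle proportional to 1/F^2.
    """
    if f_out >= f_collimator:
        return 1.0  # the output beam fits entirely within the collimator
    return (f_out / f_collimator) ** 2

# A beam injected at f/5 that emerges at f/4.5 into a collimator sized for f/5.
fraction = captured_fraction(4.5, 5.0)
```

In this illustrative case about a fifth of the light misses the collimator.
Oversizing the collimator to accept the faster beam recovers the light, but at the
cost of larger optics or reduced spectral resolution, which is the trade-off
described above.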

An important subset of MOS spectrograph instrument concepts, which has been 
enabled by the use of fibers, is that in which the instrument and telescope are a fully 
integrated system. Such a system typically consists of a spectrograph mounted at 
the telescope prime focus (e.g. 2dF at the AAO) or a fiber pick-off system mounted 
at the prime focus with long fibers linked to a spectrograph located elsewhere in 
the dome (e.g. the SUBARU Prime Focus Spectrographs,?” WEAVE?® and the 
LAMOST spectrographs?? described elsewhere in this chapter). Schmidt telescopes 
that have previously been used for photographic plate surveys have proven ideal 
for conversion to carrying out large spectroscopic surveys, thanks to their large 
(typically degrees) fields of view and fast focal ratios. The focal ratios of large 
telescopes at the Cassegrain or Nasmyth focus are in the slower-than-recommended 
injection range for FRD (see Table 1 for examples); the prime focus is often within 
this range, making this one of the reasons that prime focus spectrographs using 
fiber pick-offs are very effective. 


3.1.1. Focal-Plane Plates 


The earliest MOS spectrographs using optical fibers used physical plates (“plug- 
plates”) mounted at the telescope focal plane into which fibers were manually 
inserted. Examples of these systems were MEDUSA”? (37 fibers) at Steward Obser- 
vatory, FOCAP?! (>100 fibers) at the AAO, and FLAIR®? (92 fibers) at the UK 
Schmidt telescope.*? These systems demonstrated that multi-object spectroscopy 
with fibers was feasible, but that the manual placement system was a limiting factor 
due to the effort required, the long time (4-5 hours) needed to configure a new plate
for a different astronomical field, and the damage to the fibers. The robotic placement of 
fibers has become the norm, although plug-plates still have a role in development 
of new prototype systems. 
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The elements of the robotic fiber placement mechanism are robotic arm(s) that 
patrol the field in radius, r, and azimuthal angle, θ, with the fibers held in gripping
mechanisms; a tensioning mechanism for the fibers; and a control system that
calculates the optimal allocation of fibers to sources and the placement of fibers in 
the focal plane without collisions or tangling. The fibers are positioned in series, 
leading to a comparatively long time (1-2 hours) to configure a plate that would 
typically hold a few hundred fibers. An instrument system using this technology 
will therefore often include two or more focal-plane plates. One plate is configured 
while the second plate is used for observations, then moved into place as needed, 
reducing the downtime between observations to typically a few minutes (see Table 2 
for examples). 

The 3.9 m Australian Astronomical (at the time, Anglo-Australian) Observa- 
tory telescope was home to one of the first large spectroscopic survey instruments, 
2dF. Named for the two-degree diameter field of view, 2dF** used a pick-and-place 
optical fiber pick-off system to feed two spectrographs located at the prime focus of 
the AAO (Fig. 5). 2dF handled up to 400 fibers, with a configuration time per fiber 
of 6s. The fibers were attached magnetically to a steel plate. The novel feature of 
two back-to-back focal-plane plates allowed one field to be configured during obser- 
vations with the second, leading to extremely efficient survey operations** requiring 
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Fig. 5. This schematic view of the 2dF top-end shows the main features of the system that 
are located at the telescope prime focus including the spectrographs on the outer ring, the fiber 
positioner, tumbler to exchange focal plates and the corrector lenses. Image credit: Greg Smith, 
David Malin, and André Porteners, ©AAO, used with permission (http://magnum.anu.edu.au/ 
~TDFgg/Public/Pics/2dFtopend.jpg). 




only three minutes to exchange plates.°° The facility included a correcting lens 
system to correct for atmospheric dispersion and to provide a flat and telecentric 
field.34 2dF executed one of the first galaxy surveys, obtaining spectra of 245,591 
galaxies and redshifts for 221,414 of these, providing a direct measurement of the 
large-scale structure of the Universe.*® 
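The benefit of the two-plate arrangement can be illustrated with a short calculation. The sketch below assumes purely serial positioning at the quoted 6 s per fiber and a hypothetical one-hour exposure; the function names are illustrative and are not part of any 2dF software. It ignores the time needed to park fibers from the previous field, so the configuration time is a lower bound.

```python
def configuration_time_s(n_fibers: int, seconds_per_fiber: float) -> float:
    """Time to place all fibers one after another (serial positioning)."""
    return n_fibers * seconds_per_fiber


def dead_time_per_field_s(config_s: float, exposure_s: float,
                          exchange_s: float, two_plates: bool) -> float:
    """Telescope time lost per field while no science exposure is running."""
    if two_plates:
        # The next plate is configured during the current exposure, so only
        # the plate exchange (plus any configuration overrun) costs time.
        return exchange_s + max(0.0, config_s - exposure_s)
    # With a single plate the telescope waits for the full reconfiguration.
    return config_s


config = configuration_time_s(400, 6.0)        # 400 fibers at 6 s each
exposure = 3600.0                              # assumed exposure, not from the text
exchange = 180.0                               # three-minute plate exchange
print(f"configuration time: {config / 60:.0f} min")
print(f"dead time, one plate:  {dead_time_per_field_s(config, exposure, exchange, False) / 60:.0f} min")
print(f"dead time, two plates: {dead_time_per_field_s(config, exposure, exchange, True) / 60:.0f} min")
```

Under these assumptions the second plate reduces the dead time per field from the full configuration time to the exchange time alone, provided the exposure is at least as long as the configuration.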

The design concepts that were used for 2dF were exploited in the survey spectro- 
graph, 6dF,°° that succeeded the FLAIR spectrograph. Located at the prime focus 
of the 1.2 m UK Schmidt telescope, 150 fibers were positioned by an (r, θ) robot.
A survey of near-infrared selected galaxies was carried out, resulting in 138,000 
new galaxy redshifts.?” Subsequently, the instrument was used for a detailed stellar 
radial velocity survey, RAVE, and collected 570,000 spectra over 10 years.?° Another 
example of such a spectrograph is the WEAVE instrument being developed for the 
William Herschel Telescope. Two robot positioners are used to position 960 fibers. 
Figure 6 shows the complexity of the placement of the fibers. To compensate for
the differences in required fiber length depending on the field position of the target 
object, the additional fiber length is wound around fiber retractors (Fig. 7), which 
also keep the fiber under mild and constant tension when they are deployed. 


Fig. 6. WEAVE focal-plane schematic (zenith distance = 31.3 degrees, hour angle = 0.0 hours).


4From https://ingconfluence.ing.iac.es:8444/confluence//display/WEAV.
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Fig. 7. WEAVE fiber retractors. 


Focal-plane plates have also been used to realize integral field spectroscopy 
with fibers. The use of fiber bundles to provide observations of an object over a 
contiguous area (of a few to ~10 arcseconds) removes observational biases inherent 
in single fiber observations of extended sources.?? The use of fiber bundles as IFUs 
can readily be combined with the concept of multi-object spectroscopy to allow 
spatially resolved observations of tens of objects simultaneously (e.g. FLAMES,?° 
SAMI*!). The SAMI spectrograph uses 13 “hexabundles” with high filling factor 
(close-packing of the fibers) to carry out a survey of galaxies. Each hexabundle is 
made up of 61 fibers for a total field of view of 15 arcsecond diameter (1.6 arcseconds 
per fiber core) with 75% filling factor.° The fiber bundles are positioned on a manual 
plug-plate with just 30 minute downtime between 2 hour exposures achieved with 
this prototype spectrograph. The distribution of target galaxies over the very wide 
field of view (6 degrees) allowed for positions for up to four sets of holes to be 
drilled in a single plate for four different sets of galaxies. Twice as many hexabundles 
(26) were allocated to empty sky positions to provide excellent sky subtraction to 
facilitate the observations of the very faint galaxies. The positions in the plug-plates
are set in (RA, Dec) and converted to (x, y) using atmospheric refraction and
the optical distortion. The SAMI software includes an additional feedback from the 
operator showing the projected position of the object at the airmass of the proposed 
start of the observation compared to the position of the plug-plate. 
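The first step of the (RA, Dec) to (x, y) conversion can be sketched as a tangent-plane (gnomonic) projection about the field center. This is a minimal illustration only: it omits the atmospheric refraction and optical distortion terms mentioned above, the function name is hypothetical, and the plate scale is a free parameter rather than a value taken from the SAMI documentation.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def radec_to_plate(ra_deg, dec_deg, ra0_deg, dec0_deg, scale_mm_per_arcsec):
    """Gnomonic projection of a target onto a flat plate centered on (ra0, dec0).

    Returns (x, y) in mm, with x along increasing RA and y along increasing Dec.
    Refraction and optical distortion are deliberately not modeled here.
    """
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(ra0_deg), math.radians(dec0_deg)
    # Cosine of the angular distance between the target and the field center.
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    # Standard coordinates (radians on the tangent plane).
    xi = math.cos(dec) * math.sin(ra - ra0) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    return (xi * ARCSEC_PER_RADIAN * scale_mm_per_arcsec,
            eta * ARCSEC_PER_RADIAN * scale_mm_per_arcsec)


# A target 0.1 degree north of the field center, with an assumed plate scale.
x, y = radec_to_plate(150.0, 0.1, 150.0, 0.0, scale_mm_per_arcsec=0.067)
print(f"x = {x:.3f} mm, y = {y:.3f} mm")
```

In a real system the refraction and distortion corrections would be applied to these tangent-plane coordinates before the holes are drilled.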

An emerging technology for the fiber placement mechanism is the use of micro- 
autonomous robots.4?4 The most advanced of these concepts are the “starbugs” 4” 
robots that “walk” across the focal-plane plate of the spectrograph using a piezo- 
electric mechanism (Fig. 8). Each starbug robot carries a fiber which selects the 


“See also Chapter 6 of Volume 2. 
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Fig. 8. A starbug with a fiber payload. 


object. Fibers in focal-plane plates such as those described above are normally 
removed completely from the field of view before reconfiguring the instrument for a 
new astronomical field. In contrast, the starbugs walk from their current deployed 
position to the new position, requiring complex control software that optimizes the 
path for each robot between one position and another, avoiding collisions.** The aim 
is to reduce the reconfiguration time to around 5 minutes while retaining accuracy 
at the level of micrometers; laboratory tests have demonstrated that the individual
starbugs can reach their target position within a tolerance of 5 μm in 30 s.*°
The starbugs concept was first tested on-sky at AAO with the TAIPAN spectro- 
graph with 150 starbugs; the full complement will eventually include 300 starbugs.*° 
TAIPAN has a metrology camera viewing through the plate to provide positional 
feedback for the location of the starbugs and six larger field-of-view guide starbugs 
to assist with object acquisition. TAIPAN is a prototype for the MANIFEST fiber 
positioner being designed for the 25.4 m Giant Magellan Telescope.*® 


3.1.2. Fibers Patrolling a Small Field 


Another family of fiber positioning systems populates the full field of view with 
fibers, each of which is positioned within a smaller patrol field by some actu- 
ator (small motors, or piezoelectric actuators). The first example of this is the 
ECHIDNA*’ positioner developed for FMOS. ECHIDNA is located at the prime 
focus of the Subaru telescope. The fibers are held in stiff “spines” and are located 
within their patrol field using piezoelectric actuators. This class of positioner offers 
less flexibility, in terms of allocation of objects to fibers, than the pick-and-place 
robot, but has the advantage of speed and comparative simplicity of the control 
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Fig. 9. A schematic of the ECHIDNA fiber positioner showing the long “spines” holding optical- 
fibers fill the focal-plane surface and are driven by the piezoelectric actuators to the locations of 
the target sources.°? 


system. A diagram of the ECHIDNA positioner is shown in Fig. 9. The second 
generation of this technology is being developed for the 4MOST spectrograph. 4% 
This optical MOS will be installed on the VISTA survey telescope, replacing the 
infrared survey camera.*? Over a five year period of continuous operation on this 
dedicated 4m telescope, 4MOST is expected to carry out surveys ranging from a 
survey of the Milky Way Halo at both low and high spectral resolution to a cos- 
mological redshift survey. To address this wide scientific range, the spectrograph 
has two spectral resolutions (R ~ 20,000 and R ~ 5,000). The fiber positioner 
system, AESOP, will increase the number of fiber spines to 2436 and is expected 
to deliver reconfiguration times of 15 s for errors of <5 microns RMS in the fiber 
positioning.” 

Other spectrographs in the same family are the SUBARU Prime Focus Spectro- 
graph?’ (PFS), MOONS *! and DESI.!° The details of the fiber positioner actuators 
differ, but the basic principle is the same. A schematic of the fiber positioner for 
the MOONS spectrograph is shown in Fig. 10. To locate the fibers accurately in the 
telescope focal plane, each individual fiber positioning unit (on the left in Fig. 10) 
has a small metrology target mounted next to the fiber that can then be viewed 
by one of the twelve metrology cameras that image the field and can determine 
the position of each of the 1001 fibers to <15 μm.°! The SUBARU PFS will deploy
almost 2400 fibers over a 1.3° field of view using a variation of this technology called 
COBRA.*? The fibers are driven by two piezoelectric rotary squiggle motors which 
iterate to the final position using closed-loop feedback from a metrology camera to 
achieve a positional accuracy requirement of <10 μm.??

Finally, the LAMOST telescope also uses a variant of this technology. LAMOST 
is a Schmidt telescope whose design was optimized to provide a survey at visible 
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Fig. 10. The optical fibers of the MOONS focal plane unit are positioned in using two stepper 
motors per fiber (left). Individual fiber units are mounted into a focal plane plate for mechanical 
support (right).>1 


wavelengths. The primary and secondary mirrors of the telescope are segmented; 
the focal plane is covered with 4000 fibers that can be individually positioned using 
two stepper motors driving two rotations. The motors are controlled using wireless
technology, to avoid the issue of managing 8000 wires. An accuracy of 40 μm
(0.4 arcsecond) has been demonstrated with this system, over the 5-degree focal 
plane of the telescope. 
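The equivalence between 40 μm and 0.4 arcsecond fixes the plate scale of the focal plane, and from it one can derive an implied focal length and the physical size of the 5-degree field. The sketch below uses only the numbers quoted above and the small-angle approximation; the derived focal length and field size are consistency checks, not quoted specifications.

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi


def plate_scale_um_per_arcsec(offset_um: float, offset_arcsec: float) -> float:
    """Linear distance in the focal plane per arcsecond on the sky."""
    return offset_um / offset_arcsec


def implied_focal_length_m(scale_um_per_arcsec: float) -> float:
    """Small-angle relation: focal length = plate scale x (arcsec per radian)."""
    return scale_um_per_arcsec * 1e-6 * ARCSEC_PER_RADIAN


def focal_plane_diameter_m(field_deg: float, scale_um_per_arcsec: float) -> float:
    """Physical diameter of the field, neglecting projection effects."""
    return field_deg * 3600.0 * scale_um_per_arcsec * 1e-6


scale = plate_scale_um_per_arcsec(40.0, 0.4)
print(f"plate scale: {scale:.0f} um/arcsec")
print(f"implied focal length: {implied_focal_length_m(scale):.1f} m")
print(f"5-degree focal plane: {focal_plane_diameter_m(5.0, scale):.1f} m across")
```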


3.2. Slit Mask Spectrographs 


The most common alternative to the use of fibers is to isolate the individual objects 
in the field using a mask at the focal plane with slits located at positions tailored to 
the astronomical field to be observed. Examples of spectrographs using this selec- 
tion technique at large telescopes include GMOS °° on the Gemini telescopes and 
DEIMOS”™ (Fig. 11) on Keck II. A slit mask MOS is often selected for observations
of faint sources, especially in the near-infrared, as areas of the slit that are not filled 
by the object can be used to measure the sky signal to be subtracted from the object. 
The accuracy of sky subtraction obtained using this technique (~1%) is typically 
considered to be better than that obtained by fiber spectrographs (although see 
also the references in Sec. 2.4). 

The operation of a slit mask MOS requires the design and manufacture of the set 
of custom masks to be used for a given observing program. The constraints on the 
relative position and number of slits that can be manufactured are set by the length 
of the spectrum on the detector in the spectral direction and by the length of the 
slit in the spatial direction. MOS spectrographs come with software designed to aid 
the preparation of the mask and the selection of objects from an astronomical image 
or catalog. The production of the masks is relatively straightforward. The position
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Fig. 11. A schematic of the DEIMOS spectrograph showing the main opto-mechanical compo- 
nents and the overall scale of the instrument. 


and slit width accuracy required on the current generation of telescopes (8-10 m) 
can be achieved by a standard milling machine (e.g. DEIMOS shown in Fig. 11) or 
laser cutting machine (VIMOS slits of 100-300 μm width are cut by laser, achieving
edge quality of <2 μm°°). The operational practice of preimaging of the fields to be
observed with the same instrument along the same optical path to remove systematic 
errors due to optical distortion in the large and wide-field spectrograph optics was 
already mentioned above (see Sec. 2.3). To allow efficient night time operations, most 
slit-mask MOS instruments include the ability to stack at least the full set of masks 
required for a night of observing into an exchange mechanism, to avoid manual 
exchange of masks during the night in the telescope dome. Automatic exchange of 
masks is relatively straightforward in optical MOS spectrographs such as GMOS 
(up to 18 masks), DEIMOS (11 masks in a cassette that are formed to match the 
curvature of the focal plane and are 28 inches long®*) and VIMOS (10 masks),°® but 
is significantly harder to achieve in an infrared spectrograph that requires the slit 
to be at cryogenic temperatures for optimum sensitivity. Solutions for near-infrared 
MOS spectrographs are discussed below. 
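The slit-placement constraint described above can be illustrated with a deliberately simplified allocation routine. The sketch assumes that every spectrum spans the full width of the detector, so that no two slits may overlap in the spatial direction, and that all slits have the same length. Under those assumptions a greedy pass over the targets, sorted by position, yields the largest possible number of slits. Real mask-preparation software also handles target priorities and slits offset in the spectral direction; the function and target names here are hypothetical.

```python
from typing import List, Tuple


def select_slits(targets: List[Tuple[str, float]],
                 slit_length: float, gap: float = 0.0) -> List[str]:
    """Pick a maximal set of targets whose slits do not overlap spatially.

    targets     -- (name, spatial position of the slit center) pairs
    slit_length -- length of every slit along the spatial axis (same units)
    gap         -- minimum separation required between adjacent slits
    """
    chosen: List[str] = []
    last_top = float("-inf")
    # Equal-length slits: sorting by center equals sorting by upper edge,
    # which is the classic earliest-finish greedy for interval scheduling.
    for name, center in sorted(targets, key=lambda t: t[1]):
        bottom = center - slit_length / 2.0
        if bottom >= last_top + gap:
            chosen.append(name)
            last_top = center + slit_length / 2.0
    return chosen


candidates = [("a", 0.0), ("b", 5.0), ("c", 10.0), ("d", 12.0), ("e", 20.0)]
print(select_slits(candidates, slit_length=8.0))
```

With these example positions (in arcseconds, say), targets b and d are rejected because their slits would overlap those of a and c.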

Flamingos-2 is a near-infrared MOS spectrograph that uses the slit mask 
exchange concept: nine masks are loaded into a single slit wheel for one observ- 
ing night. The slit wheel is isolated in a section of the Flamingos-2 cryostat that 
permits rapid thermal cycling so that new slit masks can be inserted and the slit 
wheel returned to the operating temperature of 110 K in a turn-around time of 
10 hours.°” 
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Fig. 12. Image of the MOSFIRE cryogenically configurable slit mask MOS. 


An alternative to the mask exchange mechanisms has been pioneered by the 
cryogenic MOSFIRE spectrograph on the Keck I telescope.®® A remotely config- 
urable mask consisting of parallel bars that can be adjusted to form 46 slits of 
configurable width and field position was designed for this instrument (Fig. 12). 
The re-configuration time for this cryogenic system is <6 minutes. A similar slit 
mask mechanism has been adopted by the EMIR spectrograph on Grantecan.°” © 

Another technical development direction for cryogenic configurable slit masks 
exploits micro-opto-electro-mechanical systems (MOEMS) technologies of two
classes: micro-mirror arrays and micro-shutter arrays. Micro-mirror arrays, or dig- 
ital mirror devices (DMDs), have become commercially available thanks to their 
development for projector systems. Pixels of 10-20 μm formed into arrays can be
addressed individually and set to different tilt angles. The possibility of using these 
highly configurable devices was picked up by the astronomy community as a possible 
selection mechanism for MOS instruments. A ground-based instrument exploiting 
the micro-mirror arrays, IRMOS, was developed for the Kitt Peak National Obser- 
vatory.°! IRMOS is a cryogenic infrared instrument, offering spectroscopy from 
0.85-2.45 μm with a patrol field of 3 x 2 arcminutes at the Cassegrain focus of the
Kitt Peak 4 m telescope. The focal plane is reimaged onto a Texas Instruments
DMD device of 848 x 600 pixels, 17 μm square, which is held at an acceptable
operating temperature of 240 K within the 100 K environment of this cryogenic 
instrument. The individual shutters are activated to direct the beam either to the 
spectrograph or to a dark area of the spectrograph that acts as a baffle. If all the 
pixels are “on” and the grating is removed from the beam, an image of the field can 
be obtained and used to guide the placement of the slitlets formed by the DMD. 
The flexibility to define the slit positions in real time with the DMD allows different 
sky subtraction options with IRMOS: either the slit length is set to allow a standard 
“nod along the slit” approach or two identical slit patterns can be produced and 
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Fig. 13. The NIRSPEC micro-shutter array. (a) Entire micro-shutter array. (b) Close-up through 
a microscope. Photo credits: NASA. 


the image nodded between them. IRMOS successfully demonstrated the promise 
of these devices and also highlighted the development path: first of all, a MOEMS device
operating at cryogenic temperatures would further reduce the thermal background 
in the spectrograph; second, the contrast between the “on” and “off” positions 
should be improved. The IRMOS contrast is reported as ~400: bright sources or 
the bright sky background can be scattered into the slit. The compact size and 
reconfigurability of such devices makes them attractive to space-based applications 
for which heavy mechanical mechanisms are not appropriate. The use of MOEMS 
has been extended in the BATMAN spectrograph, which uses a larger-format MOEMS
device (2048 x 1080 pixels) to carry out imaging and multi-object spectroscopy in
parallel.®?

For the James Webb Space Telescope, a new technology for configurable slit 
masks has been developed for the NIRSPEC spectrograph. The micro-shutter arrays 
for that instrument have improved the contrast ratio to 2000 and can be operated 
at cryogenic temperatures (35K). Figure 13 shows the micro-shutter array, which 
consists of 62,415 shutters (left), and a close-up of the pixels (right). NIRSPEC 
will be configurable to produce hundreds of slitlets with this highly flexible and 
configurable slit mask.®° 
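The practical meaning of the contrast figures can be expressed in magnitudes: an element in its "off" state with contrast C passes a fraction 1/C of the incident light, so an unwanted source is attenuated by 2.5 log10(C) magnitudes. The sketch below evaluates this for the two contrast values quoted above. It is an order-of-magnitude illustration only and does not model how the leaked light is distributed on the detector.

```python
import math


def leaked_fraction(contrast: float) -> float:
    """Fraction of incident light that passes an element in the 'off' state."""
    return 1.0 / contrast


def off_state_attenuation_mag(contrast: float) -> float:
    """Attenuation of an unwanted source, in magnitudes, for a given contrast."""
    return 2.5 * math.log10(contrast)


for label, contrast in [("IRMOS DMD", 400.0), ("NIRSPEC micro-shutters", 2000.0)]:
    print(f"{label}: leak = {leaked_fraction(contrast):.4%}, "
          f"attenuation = {off_state_attenuation_mag(contrast):.2f} mag")
```

Going from a contrast of 400 to 2000 therefore suppresses a contaminating source by a further 2.5 log10(5) magnitudes.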


3.3. Multi-IFU Spectrographs 


The technique of integral field spectroscopy has been combined with that of multi- 
object spectroscopy in two variants: the fiber bundles discussed above, and pick-off 
mirrors mounted on robotic arms allowing a few arcseconds squared from around 
the focal plane to be directed to fixed image-slicing integral field units (KMOS,“* 
MIRADAS®?). The KMOS multi-object near-infrared spectrograph working at the 
VLT has 24 reconfigurable pick-off arms that select the objects within a 2.8” x 2.8” 
field of view over a 7-arcminute diameter patrol field. A mirror is mounted on the tip 
of a carbon-fiber arm that can be driven in angle and radius by two stepper motors 
(Fig. 14). The optical path difference is maintained by a “trombone” arrangement 
of mirrors. The arms are located on two planes within the instrument: 12 above and 
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Fig. 14. KMOS cryogenic arms. Image credit: ESO/STFC UK Astronomy Technology Centre. 


12 below the telescope focal plane, to reduce the risk of collisions between arms,
each of which can sweep out a 22-degree wedge in the focal plane. The field curvature 
of the VLT is corrected by a lens at the entrance window of the cryostat so that the 
pick-off arms patrol a flat focal plane, simplifying the mechanism design. The use 
of pick-off mirrors mounted on arms allows the opto-mechanically complex image- 
slicing IFUs used by KMOS to remain fixed relative to the spectrograph. A pupil lens 
mounted within the arm relays the telescope pupil to the cold stop within the arm; 
the image is focussed at the output of the arm, which then is the input to the integral 
field unit. For this infrared, seeing-limited instrument, compensation of atmospheric 
dispersion is not required but field differential refraction is important. The arm 
configuration is set before the start of the observations to the absolute position 
of the objects, according to the observer’s catalog. At the time of preparation of 
this configuration file, the airmass of the observations (to be carried out at some 
time in the future) is not known. At the time of execution of the observations, 
the airmass of the field center is known and the apparent position of the objects 
at the beginning and end of the observations are calculated; the arm positions are 
then adjusted to the center of these positions. In the case of KMOS, although the
barycenter of the object moves, the resulting blurring of the image during a 1 h observation
was judged acceptable, in preference to the additional complexity of tracking
the arm position during the observation to maintain image quality.
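The size of the field differential refraction, and the benefit of placing the arms at the midpoint of the start and end positions, can be estimated with a simple plane-parallel atmosphere in which the refraction is R = A tan z. In the sketch below the coefficient A = 40 arcseconds and the zenith distances are illustrative assumptions, not KMOS values; A depends on wavelength, pressure and temperature. The function names are hypothetical.

```python
import math


def refraction_arcsec(zenith_deg: float, coeff_arcsec: float) -> float:
    """Plane-parallel atmosphere: shift toward the zenith, R = A tan z."""
    return coeff_arcsec * math.tan(math.radians(zenith_deg))


def differential_refraction_arcsec(zenith_deg: float, separation_arcmin: float,
                                   coeff_arcsec: float) -> float:
    """Change in apparent separation of two targets offset along the vertical."""
    near = refraction_arcsec(zenith_deg, coeff_arcsec)
    far = refraction_arcsec(zenith_deg + separation_arcmin / 60.0, coeff_arcsec)
    return far - near


def midpoint_residual_arcsec(z_start_deg: float, z_end_deg: float,
                             separation_arcmin: float, coeff_arcsec: float) -> float:
    """Largest offset left when an arm sits midway between start and end positions."""
    start = differential_refraction_arcsec(z_start_deg, separation_arcmin, coeff_arcsec)
    end = differential_refraction_arcsec(z_end_deg, separation_arcmin, coeff_arcsec)
    return abs(end - start) / 2.0


A = 40.0          # assumed refraction coefficient in arcsec
FIELD = 7.0       # patrol-field diameter in arcmin, from the text
print(f"differential refraction at z = 30 deg: "
      f"{differential_refraction_arcsec(30.0, FIELD, A):.3f} arcsec")
print(f"residual for a midpoint setting, z = 30 -> 40 deg: "
      f"{midpoint_residual_arcsec(30.0, 40.0, FIELD, A):.3f} arcsec")
```

Setting the arm at the midpoint halves the worst-case offset relative to setting it for the start of the observation.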


4. Summary of Multi-Object Spectrographs 


Table 2. Summary of multi-object spectrographs.

Instrument | Telescope (diameter) | Patrol field | Multiplex factor | Wavelength range (nm) | Positional mechanism
2dF | AAT (3.9 m) | 2° | 392 fibers | 350-1000 | Pick-and-place fiber positioner at prime focus; two plates
LAMOST | LAMOST (4.0 m) | 5° | 4000 fibers | 370-900 | Robotic positioners patrolling a limited field of view
DESI | Mayall telescope (3.8 m) | 3° | 5000 fibers | 360-980, split over 3 spectrographs | Robotic positioners patrolling a limited field of view
WEAVE | WHT (4.2 m) | 2° | 960+ mini IFUs | 400-600 (blue arm), 600-950 (red arm) | Pick-and-place fiber positioner at prime focus; two plates
MEGARA | GTC (10.4 m) | 3.5' x 3.5' | 92 science fiber bundles plus 8 on blank sky | 360-1000 | Robotic positioners patrolling a limited field of view
FMOS | Subaru (8.2 m) | 30' diameter | 400 fibers | 900-1800 | Tilting spines
MOONS | VLT (8.2 m) | 25' diameter | 1001 fibers | 650-1800 | Robotic fiber positioners patrolling a limited field of view
MOSFIRE | Keck (10.0 m) | 6.1' x 6.1' | 46 slits | 970-2410 | Cryogenic configurable slits
DEIMOS | Keck (10.0 m) | 16.6' x 16.6' | 130 slits | 410-1100 | Slit mask
4MOST | VISTA (4.1 m) | 2.5° | 2436 science fibers | 370-950 | Tilting spines
TAIPAN | UK Schmidt (1.2 m) | 5° | 160 (goal: 300) | 370-870 | Autonomous 'starbug' robots patrolling the full field of view

Table 2 provides an overview of the spectrographs discussed in this chapter. The
ongoing importance of MOS spectroscopy for the current generation of telescopes
is very clear from the table. MOS instruments remain an important element of
the instrumentation plans for the future generation of Extremely Large Telescopes. 
The Thirty Meter Telescope (TMT) includes the Wide-Field Optical Spectrograph, 
WFOS® as one of its first-light instruments. The Giant Magellan Telescope plans 
include providing the facility fiber positioner, MANIFEST,*° designed with the flex- 
ibility to feed more than one scientific instrument. The MOSAIC® instrument for 
the European Extremely Large Telescope is one of the first generation instruments. 
The current MOSAIC concept includes both a fiber MOS mode for optical and 
near-infrared observations and a multi-IFU mode. Multi-object spectrographs in all
their varieties also have a crucial role for providing sources to new, large ground- 
based and space telescopes for follow-up at higher resolution and with greater 
sensitivity. 
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We review narrowband imaging technologies, with focus on low spectral-resolution 
systems customized for observational cosmology over a much wider viewing solid 
angle than can be addressed by integral-field spectroscopy. These filters can tune 
precisely to a given emission-line, with spectral bandpass optimized for the intrin- 
sic line width. They are more efficient for redshift-targeted surveys (e.g. high red- 
shift clusters) than existing multi-aperture spectrographs. This is particularly true 
for a Lyot filter at the prime focus of a large-aperture telescope. We emphasize the 
power of tunable “differential” imaging, an approach superior to multi-beam cam- 
eras because one optical path is used for both source and reference image. Tunable 
filters sensitively map emission-line excitation diagnostics, extinction, and local 
star formation rates at the finest spatial scales. Tunable imaging polarimetry is 
also possible, albeit over a smaller viewing angle. 


1. Introduction


Telescopes bring our Universe into focus, but astronomical instruments must dis- 
sect the view. Our focal-plane machines are the first step in comprehension because 
they allow us to quantify what we see in terms of measurable parameters. While 
spectroscopy now delivers most astronomical knowledge, imaging continues to play 
an important role, particularly in wide-field applications. In this chapter, we review 
filters whose wavelength passband can be adjusted during use. We do not discuss 
hyperspectral imaging, e.g. pushbroom scanning from orbiting satellites. While there 
is strong overlap with astronomical techniques, that field has different requirements 
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and is reviewed elsewhere’ with examples given of how tunable imaging has been 
exploited in art, archaeology, precision farming, remote sensing, inspection, manu- 
facturing, forensics, military target detection, pharmacy, medical and human vision 
science. 

Most nocturnal astronomical imaging has been restricted to broad spectral 
bands (spectral resolution R ~ 5), with occasional forays to narrower bands 
(R ~ 50) to isolate spectral lines. This contrasts with other physical sciences — 
solar physics, remote sensing, underwater communications, atmospheric physics — 
where diverse imaging technologies are used. It is unclear why nocturnal imaging 
should have reached this impasse. Certainly, broadband filters are stocked items, 
available at all telescopes, while narrow filters, especially those interference filters 
large enough to cover large CCD mosaics, are made for specific observational pro- 
grams by profitable companies, so only a few are purchased for specific diagnostic 
wavelengths and/or cosmological redshifts. 

With increasing adoption of integral field feeds/image slicers for single aper- 
ture (e.g. MUSE/VLT, Keck/OSIRIS) or multi-object, wide-field spectrographs 
(e.g. AAT/SAMI, SDSS/MaNGA), there may be a sense that devices optimized for
a narrow wavelength interval are passé. However, there are important reasons to 
pursue tunable filter imaging: 


• Tunable filters provide moderate R images at the limit of the telescope optics
(assuming that atmosphere blur is corrected by an adaptive optics system) with- 
out the inflexible “spaxels” of an IFS; it is more straightforward to bin imaging 
pixels according to average seeing and adaptively to desired signal/noise. An 
integral-field spectrograph (IFS) can span at most a few thousand independent 
spatial points at once, whereas tunable filters can span millions. 

• Conventional imaging has a major limitation: filters are exposed sequentially for,
e.g., line and continuum. Hence, any detector, instrumental, or atmospheric vari- 
ation induces systematic errors from mismatched point response functions. While 
a multi-band camera observes different bands simultaneously via dichroic beam 
splitting, it is not ideal because its optical path ultimately differs for each filter. 
In contrast, a tunable filter is a powerful alternative because band-switching can 
be synchronized with charge shuffling on the CCD,? a truly differential technique 
that minimizes systematic errors compared to conventional imaging. 

• One tunable imager, the Lyot filter, is (almost) the ultimate exploitation of the
widest possible monochromatic angular field with a given telescope.? This is 
because spectral passbands broaden in converging (focusing) light wavefronts, 
which is unfortunate because the widest angular fields are achieved at fastest 
convergence. But remarkably, wavefronts that converge as fast as f/2 (29° open- 
ing angle) can be compensated by the Lyot’s crossed birefringent elements such 
that even a constant sub-Angstrom bandpass is possible over a one-degree field 
on the sky! Few telescopes offer much more than a wide-field imager at prime 
focus because it is so difficult to exploit the fast convergence spectroscopically. 
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We show below that the compensated Lyot tunable filter solves this problem, at 
least in principle. 


2. The Physics of Tunable Imaging 


We first summarize properties of an ideal tunable filter. A rich variety of physical 
phenomena can isolate a spectral band: absorption, scattering, diffraction, evanes- 
cence, birefringence, acousto-optics, single layer and multi-layer interference, multi- 
path interferometry, polarizability, etc. Current optical filter technologies include: 


• color (panchromatic) film;
• dye-colored glass;
• hole-burning device;
• Christiansen filter;
• linear/circular variable filter;
• multi-layer dielectric filter;
• Fabry-Pérot (FP hereafter) interferometer;
• Michelson interferometer;
• acousto-optic filter;
• solid etalon filter;
• fiber etalon filter;
• solid Michelson filter;
• facet optics;
• generalized resonant grating filter;
• sub-lambda evanescent grating filter;
• volume-phase holographic (VPH) grating filter;
• Lyot-Öhman filter;
• generalized Lyot or Šolc filter;
• liquid-crystal filter;
• crystal-lattice filter;
• MEMS or photonic-based filters.


One must cast a wide net to find some of these, e.g. lattice structures in photonics
and grating filters that are largely the domain of covert research into bank note
security? but with variants beginning to appear in remote sensing.


2.1. The Ideal Filter 


What is meant by a “tunable filter?” The different techniques rely ultimately on 
the interference of light wavefronts that traverse different optical paths to form the 
signal (and some noise). 

Technologies that come closest to the ideal tunable filter are the air-gap FP and 
Michelson (Fourier transform) interferometers. To understand their superiority, we 
highlight the Taurus Tunable Filter (TTF), which was the first general purpose 
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such device for nocturnal astronomy.® In that FP filter, interference arises between 
two highly reflective, movable plates. To be useful, not only must the plates be able 
to move through a large physical range while remaining precisely parallel, but they 
must also operate down to spacings of order a few wavelengths, as we show. This puts
strong limits on the degree of post-coating flatness of the moving plates. 

For interference order m, in a diffraction-limited system, R = mN, where N
is the instrumental finesse. (The effective finesse is typically lower due to imperfections
and limitations of the system.) A prism interferes a quasi-infinite number
of light wavefronts whereas a grating interferes a finite number set by the number
of mechanical grooves or photographic fringes. A photon of wavelength λ passes
through the filter undamped when


mλ = 2μl cos θ,    (1)


with θ the off-axis angle of the incoming ray and μ the refractive index of the
substrate spacer of thickness l; μl is called the optical gap. Hence, dR/R = dm/m =
dλ/λ. Consecutive orders are separated in wavelength by the free spectral range
(FSR). When the refractive index of the resonant cavity is constant (e.g. air), these
orders are evenly spaced in frequency (ν = c/λ) or energy (E = hc/λ). Later, we give an
example of a resonant cavity in a bulk medium that is aperiodic in both frequency
and wavelength.

The FP and the Lyot combine fewer wavefronts, set by the instrumental finesse. 
FP finesse is set by coating reflectance, which is degraded by fabrication defects, 
particularly in the plate polish and wavelength variations of the reflective coatings.’ 
The TTF plates can be scanned over separation l = 2-15 μm, and interference order
m = 4-40, to provide R ~ 100-1000. The FP tunable filter is ultimately limited by 
the finite coating thickness between its constituent mirrors, hence only achieves low 
interference orders (m < 5) at infrared wavelengths. 

Spectrograph throughput increases as the number of interfering wavefronts 
decreases. Because at least two wavefronts are required to interfere, Ref. 10 con- 
cludes that two-beam (e.g. Michelson) interferometers are the ultimate spectro- 
graphs. In fact, the Michelson interferometer does come closest to achieving the 
above goals, and reaches the lowest orders even at optical wavelengths. 

Here then are the characteristics of the ideal tunable filter: 


• image quality, high throughput, stable wavelength and photometric performance;

• monochromatic band, widest possible angular field of view;

• tunable, identical, repeatable square profile;

• broad spectral coverage, fast response without hysteresis;

• large span in resolving power with minimal reflectance phase change at low order;

• minimal stray light scattering and strong out-of-band suppression;

• insensitive to polarization, angle of incidence and environment;

• large aperture, compact design, low power requirement;

• easy data calibration.
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It is hard to envisage any technique that could achieve all of these, short of collect- 
ing and time-tagging each photon. To pinpoint a few key issues: some birefringent 
materials (e.g. LiNbO$_3$), which have enabled solid etalon and acousto-optic filters,
have micro-structures that degrade image quality. There are only a few filters (fixed 
element Solc, Michelson) whose wavelength transmission profile (bandpass shape) 
can be squared off in a straightforward fashion. However, they offer only a single R 
at a specific wavelength. 


3. Outline of Key Filter Technologies 


3.1. Monolithic Filters 
3.1.1. Interference Filter 


To make an interference filter, a dielectric spacer is sandwiched between two
transmitting layers (Fig. 1(i)). Substrates are commonly fused silica in the ultraviolet, glass
or quartz in the optical, and water-free silica in the infrared. Between spacer and
glass, surface coatings are deposited by hot evaporation or cold sputtering to partly
transmit and reflect an incident ray. Each sandwich is referred to as a cavity (or
period), and the bandpass profile is adjusted by stacking cavities. Each internally
reflected ray shares a fixed-phase relationship with all the others. For constructive
interference, wavelength $\lambda$ is transmitted only if Eq. (1) is satisfied, where $\theta = \theta_R$
is the refracted angle within the optical spacer of the optical gap. Construction of
these filters has undergone a revolution through the use of dielectric, multi-layer thin
film coatings that are cold sputter deposited; a proper description is too involved
for this chapter.


[Figure 1 panels: Filter; Etalon; Wedged Etalon. Labels: reflective coatings, anti-reflective coatings.]

Fig. 1. (i) Interference filter: no internal structure is shown. (ii) FP etalon. (iii) A wedged FP
etalon avoids the problem of plates behaving as highly reflective interference filters.

From Snell's law and Abbe's principle, all such filters can be tuned blueward in
wavelength slightly by an amount


$$\frac{\delta\lambda}{\lambda} = -\frac{\theta_i^2}{2\mu^2} \tag{2}$$


by tilting them through a small angle $\theta_i$ (measured in air) in a collimated (flat
wavefront) beam; the range is $<\lambda/50$ in practice. The bandpass profile is degraded by
tilts in collimated or converging beams.
Evidently, interference filters are poor tunable devices.
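
To give a feel for how small the tilt-tuning range is, the sketch below compares the small-angle shift $\delta\lambda/\lambda \approx -\theta_i^2/(2\mu^2)$, which follows from Eq. (1) and Snell's law, against the exact $\cos\theta_R$ dependence. The effective index of 2.0 and the 5° tilt are assumed values chosen only for illustration.

```python
import math


def tilt_shift_small_angle_nm(lam0_nm, tilt_deg, mu_eff):
    """Blueward shift from the small-angle form -lam0 * theta_i^2 / (2 mu^2)."""
    theta_i = math.radians(tilt_deg)
    return -lam0_nm * theta_i ** 2 / (2.0 * mu_eff ** 2)


def tilt_shift_exact_nm(lam0_nm, tilt_deg, mu_eff):
    """Shift from lam = lam0 * cos(theta_R), with sin(theta_i) = mu * sin(theta_R)."""
    theta_i = math.radians(tilt_deg)
    theta_r = math.asin(math.sin(theta_i) / mu_eff)
    return lam0_nm * (math.cos(theta_r) - 1.0)


# Hypothetical filter centred on 656.3 nm with an assumed effective index of 2.0.
approx = tilt_shift_small_angle_nm(656.3, 5.0, 2.0)
exact = tilt_shift_exact_nm(656.3, 5.0, 2.0)
print(round(approx, 3), round(exact, 3))
```

Under these assumptions a 5° tilt moves the band by well under 1 nm, which is consistent with the remark that interference filters are poor tunable devices.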


3.2. Gap-Scanning Filters 
3.2.1. FP Filter 


These interferometers are popular in many disciplines and at many observatories 
because excellent turnkey commercial devices are relatively straightforward to cal- 
ibrate and to use. They are by no means ideal as tunable filters (broad wings on 
the bandpass profile, small monochromatic field, ghost problems), but are likely to 
remain popular at least for ground-based use. The TTF was the first such employed 
over a wide range of R. 

The air-gap etalon, or FP filter (Fig. 1(ii)), comprises two glass plates kept
precisely parallel over a small separation wherein the inner surfaces are mirror
coated with high reflectivity $\mathcal{R}$. The etalon bandpass profile to a monochromatic
source of wavelength $\lambda$ is given by the Airy function

$$\mathcal{A} = \left[1 + \frac{4\mathcal{R}}{(1-\mathcal{R})^2}\sin^2\!\left(\frac{2\pi\mu l\cos\theta}{\lambda}\right)\right]^{-1}. \tag{3}$$

Equation (1) locates the transmission peaks, which are separated by FSR $= \lambda^2/2l$.
The FP operates only in very slowly converging or flat incident wavefronts. When 
the etalon cavity is tuned at very small plate spacings, the Airy rings extend over 
a larger solid angle and the system operates effectively as an imaging tunable filter 
(see Fig. 2). A single order is isolated by using an interference filter, e.g. Fig. 3, and 
sometimes by operating etalons in tandem. Order separation with isolating filters 
is assumed throughout the review. 
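
The Airy profile can be evaluated directly, as in the minimal sketch below. It assumes the standard form $\mathcal{A} = [1 + 4\mathcal{R}(1-\mathcal{R})^{-2}\sin^2(2\pi\mu l\cos\theta/\lambda)]^{-1}$; the reflective-finesse expression $\pi\sqrt{\mathcal{R}}/(1-\mathcal{R})$ is a textbook result that is not stated in the text, and the function names and numerical values are illustrative.

```python
import math


def airy(lam_nm, gap_nm, refl, mu=1.0, theta_rad=0.0):
    """Airy transmission of an ideal etalon with plate reflectivity refl."""
    phase = 2.0 * math.pi * mu * gap_nm * math.cos(theta_rad) / lam_nm
    coeff = 4.0 * refl / (1.0 - refl) ** 2
    return 1.0 / (1.0 + coeff * math.sin(phase) ** 2)


def reflective_finesse(refl):
    """Textbook reflective finesse, pi * sqrt(R) / (1 - R)."""
    return math.pi * math.sqrt(refl) / (1.0 - refl)


# Hypothetical 2 micron air gap and 90% reflectivity: order m = 8 peaks at 500 nm.
print(airy(500.0, 2000.0, 0.9))          # on a transmission peak
print(airy(4000.0 / 8.5, 2000.0, 0.9))   # midway between orders 8 and 9
print(reflective_finesse(0.9))
```

The inter-order minimum equals $[(1-\mathcal{R})/(1+\mathcal{R})]^2$, which illustrates the broad wings of the profile noted above.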

Wavelengths can be scanned in a given order by changing $\theta$ (tilt scanning), $\mu$
(pressure scanning), or $l$ (gap scanning). Both tilt and pressure scanning have serious
drawbacks that limit their dynamic range. With the advent of servo-controlled, 
capacitance micrometry, the performance of gap-scanning etalons surpasses other 
techniques; temperature and humidity variations during a scan remain issues to be 
monitored. In these devices, piezoelectric transducers undergo dimensional changes 
in an applied electric field or develop an electric field when strained mechanically. 
IC Optical Systems Ltd in the UK has demonstrated plate parallelism in FP filters
<15 cm diameter to an accuracy of $\lambda/200$ while continuously scanning up to 3 cm
to span several adjacent orders (e.g. Fig. 4).


[Figure 2 labels: CCD, Camera, Tunable filter (Fabry-Pérot), Collimator.]

Fig. 2. The Jacquinot principle: for a fixed optical arrangement, the size of the on-axis
monochromatic region is set by the spacing of the parallel reflective mirrors. The etalon cavity
serves as a tunable filter when the plate spacing is of order a few wavelengths.

Such large etalons are expensive to coat: a broadband coating has >10 layers,
is >1 $\mu$m thick, and stresses the glass substrate to induce a cylindrical variation
in effective finesse; Ref. 7 measures and discusses such practical limitations.
Etalon performance is limited ultimately by the finite coating thickness between
its mirrors, which prevents gaps $l < 2$ $\mu$m; therefore, etalons only achieve low
interference orders ($m < 5$) at infrared wavelengths. When operated at wavelengths
longer than $\lambda \sim 1500$ nm, a further complication is that the etalon must be cooled
and operated in a vacuum.


3.2.2. Solid Etalon Filter 


These single cavity FP devices have a transparent piezoelectric spacer of, e.g.,
LiNbO$_3$. Bulk thickness and, to a lesser extent, refractive index are modified by
voltage applied to both faces. Tilt and temperature fine-tune the bandpass of low 
voltage systems. High quality spacers as thin as a few hundred microns are difficult 
to fabricate, so solid etalon filters normally operate at high orders of interference, 
hence high R. The largest are 5 cm diameter, for solar imaging. 
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Fig. 3. Order-separating filters of the Rutgers University FP in the Robert Stobie Spectrograph 
on SALT.” Ideally, each passes only a single spectral order from the many generated by a given 
etalon gap. The bandpass profile of these 4-period filters is relatively flat and multiplies the Airy 
profile (Eq. (3)). Reproduced by permission of the AAS. 




[Figure 4: two panels plotting Spectral Resolution against Wavelength (nm), 400-900 nm.]


Fig. 4. R as a function of wavelength for the SALT RSS/FP” low resolution etalon (5 $\mu$m gap) when
operated in tunable filter (plate separation 5 $\mu$m to give R ~ 350) or low-resolution (separation
11 $\mu$m for R ~ 750) modes and whose gap is varied with a resolution of 0.5 nm. Rapid variation
of R results from the $\lambda$ dependence of the reflective coating on this etalon; this is a common issue.
Reproduced by permission of the AAS. 


3.2.3. Fiber Etalon Filter 


An interesting variant on the application of Eq. (3) and the solid etalon filter is the 
fiber etalon filter.* Here the cavity between the reflective surfaces is a bulk medium. 
The same formula applies where the refractive index $\mu$ is no longer achromatic and
takes the form


$$\mu^2(\lambda) = 1 + \sum_i \frac{a_i\lambda^2}{\lambda^2 - b_i}, \tag{4}$$


i.e. the Sellmeier equation for a transmissive material with coefficients $a_i$ and $b_i$
($i = 1, 2, 3$ typically) where the wavelength is measured in vacuo. This formula is
accurate to better than 10~° for most common materials where the Sellmeier coef- 
ficients are well calibrated. Like most photonic devices, the fiber etalon can be 
tuned thermally or with an applied strain. This technology has been exploited in 
multi-object spectrographs® but not with an imaging device to date. 
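
The aperiodic order spacing of a bulk-medium etalon can be sketched numerically. The example below assumes the Sellmeier form $\mu^2 = 1 + \sum_i a_i\lambda^2/(\lambda^2 - b_i)$ with the widely quoted fused-silica coefficients (wavelengths in microns); the choice of fused silica, the 100 $\mu$m cavity length, and the function names are assumptions made for illustration, not properties of any particular fiber etalon.

```python
import math

# Assumed fused-silica Sellmeier coefficients (wavelength in microns).
A_COEFF = (0.6961663, 0.4079426, 0.8974794)
B_COEFF = (0.0684043 ** 2, 0.1162414 ** 2, 9.896161 ** 2)


def sellmeier_index(lam_um, a=A_COEFF, b=B_COEFF):
    """Refractive index mu(lambda) from the three-term Sellmeier equation."""
    lam2 = lam_um ** 2
    mu2 = 1.0 + sum(ai * lam2 / (lam2 - bi) for ai, bi in zip(a, b))
    return math.sqrt(mu2)


def bulk_etalon_peak_um(order, length_um, iterations=50):
    """Solve m*lambda = 2*mu(lambda)*l on axis by fixed-point iteration."""
    lam = 2.0 * 1.45 * length_um / order  # starting guess with a nominal index
    for _ in range(iterations):
        lam = 2.0 * sellmeier_index(lam) * length_um / order
    return lam


# Inverse-wavelength spacing between consecutive orders of a 100 micron cavity.
orders = range(480, 491)
inverse = [1.0 / bulk_etalon_peak_um(m, 100.0) for m in orders]
spacings = [inverse[i + 1] - inverse[i] for i in range(len(inverse) - 1)]
print(sellmeier_index(0.5876), min(spacings), max(spacings))
```

Unlike the air-gap case, the spacings differ slightly from order to order because $\mu$ varies with wavelength.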


3.2.4. Michelson Filter 


The collimated beam is split in the Fourier Transform or Michelson filter.!° The 
two wavefronts then undergo different path lengths by reflecting off separate mirrors 
before being recombined by the camera optics to image onto the detector. The 
device in Fig. 5 uses only half of the available light. Reference 11 shows how a 
more involved layout recovers the rest. The output photon flux depends upon the 
path difference between the two mirrors. At zero difference, all frequencies interact 
coherently. As one mirror is moved, each input wavelength generates a series of 
transmission maxima/minima. 

Fig. 5. A simplified two-beam Michelson (Fourier Transform) interferometer.


The filter is scanned at constant speed or stepped in equal increments from zero
path difference ($x = y = 0$) to $y = L$, set by twice the maximum mirror spacing
($x = L/2$). The superposition of two coherent wavefronts of amplitudes $b_1$ and $b_2$
in complex notation is $b_1 + b_2\exp(i2\pi\sigma y)$, with $y$ the total path difference and $\sigma$
the wave number. If the wavefronts have equal intensity, their combined intensity
is $4b^2\cos^2\pi\sigma y$, with $b = b_1 = b_2$, forming a series of intensity fringes across the
detector. Beyond large path difference $y = L$, the wavefronts lose their mutual
coherence. If it were possible to scan out to infinite mirror spacing in infinitesimal
steps, the superposition would be represented by the ideal Fourier Transform pair


$$b(y) = \int_{-\infty}^{\infty} B(\sigma)\,(1 + \cos 2\pi\sigma y)\,d\sigma,$$

$$B(\sigma) = \int_{-\infty}^{\infty} b(y)\,(1 + \cos 2\pi\sigma y)\,dy,$$


with $b(y)$ the signal at path length $y$ and $B(\sigma)$ the desired spectrum.ᵃ Note that


$$b(y) - \frac{b(0)}{2} = \int_{-\infty}^{\infty} B(\sigma)\cos 2\pi\sigma y\,d\sigma,$$

$$B(\sigma) = \int_{-\infty}^{\infty}\left[b(y) - \frac{b(0)}{2}\right]\cos 2\pi\sigma y\,dy.$$


ᵃ $B(\sigma)$ and $b(y)$ are both undefined for $\sigma < 0$ and $y < 0$; we include negative limits for convenience.

Quantity $b(y) - b(0)/2$ is usually denoted the interferogram, although this term is
sometimes used for $b(y)$. Spectrum $B(\sigma)$ is computed using fast Fourier transform
software and/or dedicated circuitry.
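
The transform pair can be exercised with a short numerical sketch. It assumes NumPy is available, treats $B(\sigma)$ as one-sided ($\sigma > 0$), uses simple Riemann sums in place of the integrals, and applies a direct cosine transform rather than an FFT for clarity. The synthetic emission line, the sampling, and the function names are all illustrative assumptions.

```python
import numpy as np


def interferogram(sigma, spectrum, y):
    """b(y) = integral of B(sigma) * (1 + cos(2 pi sigma y)) d sigma (Riemann sum)."""
    d_sigma = sigma[1] - sigma[0]
    kernel = 1.0 + np.cos(2.0 * np.pi * np.outer(y, sigma))
    return (kernel @ spectrum) * d_sigma


def recover_spectrum(y, signal, sigma):
    """Cosine transform of b(y) - b(0)/2; y must start at zero path difference.

    The factor 4 normalises a one-sided transform over y >= 0 of a one-sided spectrum.
    """
    dy = y[1] - y[0]
    centred = signal - signal[0] / 2.0
    kernel = np.cos(2.0 * np.pi * np.outer(sigma, y))
    return 4.0 * (kernel @ centred) * dy


# Synthetic single emission line at wave number 1.5 per micron (about 667 nm).
sigma = np.linspace(1.0, 2.5, 1501)                  # wave number grid, 1/micron
spectrum = np.exp(-0.5 * ((sigma - 1.5) / 0.005) ** 2)
y = np.arange(0.0, 200.0 + 1e-9, 0.1)                # path difference, microns
signal = interferogram(sigma, spectrum, y)
recovered = recover_spectrum(y, signal, sigma)
print(sigma[np.argmax(recovered)])
```

Truncating the scan at finite $L$ convolves the recovered spectrum with a sinc-like instrumental profile, which is why the ideal pair is never realized in practice.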

Construction of a Michelson filter is a major opto-mechanical challenge, so the 
ideal Fourier transform pair is never realized. However, the Michelson filter comes 
closest to the ideal tunable filter. It does not suffer from the coating thickness 
problems of the FP filter, therefore reaches low orders even at optical wavelengths. 
It comes into its own in vacuum; otherwise, atmospheric emission contributes noise 
at all frequencies. In contrast, the TTF concentrates its effort in very dark regions 
between the bandheads of atmospheric molecular rotation—vibration emission. 


3.3. Grating Filters 
3.3.1. Resonant Grating Filter 


These filters, inspired by the diffractive colors of insect carapaces and wings, con- 
stitute dielectric gratings made with three-dimensional, sub-micron structure. The 
zeroth-order reflection exhibits a broad-to-intermediate bandwidth (R ~ 20), is
highly polarized, and maintains useful efficiency over ±30° tilt (or rotation).
Reference 5 presents one grating design that produces a roughly constant bandpass
profile from 450 to 850 nm over this tilt range. Grating filters — and their close rela- 
tives, evanescent gratings — are now widely deployed as reflective and transmissive 
volume-phase holographic (VPH) gratings. These attain high efficiency, meaning 
that most light is “blazed” into a few spectral orders that, unlike conventional
mechanically ruled gratings, can be tuned over a wide wavelength range with
minimal destructive interference. Tuning is accomplished through Bragg diffraction
by physically tilting the grating relative to the grating’s internal “superblaze”
angle.!? VPHs are now the light dispersers of most moderate and high R imaging 
spectrographs. The fringes are emplaced by laser interferometry as a hologram into 
a photosensitive layer. In a conventional holographic grating, the interference pat- 
tern modulates the transmitted intensity sinusoidally across the grating, i.e. does 
not mimic the sharp rectangular facets of a conventional ruled diffraction grating,
so has substantial diffractive losses. In contrast, a VPH grating shapes the
diffracted pattern throughout the volume of its photosensitive layer to optimally
couple the light within, thereby sharpening the blaze. The exposed emulsion is
then developed chemically and, critically, desiccated in alcohol to collapse its thick- 
ness into a controlled thin layer that is then sandwiched between glass sheets for 
hermetic and abrasion protection. The glass substrates and the diffracting layer 
can be curved to receive the hologram and the hologram itself can have a non- 
spherical pattern, to form a highly efficient diffractive/refractive powered optical 
element. 

A notable feature of grating filters is that they achieve by far the best out-of-band
rejection, e.g. >60 dB suppression at $\pm 0.05\lambda_c$, where $\lambda_c$ is the central wavelength
of the filter peak.® This is the degree of suppression between successive orders
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when the instrument is illuminated by a monochromatic source, one reason why this 
technology is exploited by tunable lasers. 


3.3.2. Acousto-Optic Tunable Filter 


An optically anisotropic, birefringent medium supports a delay, i.e. different refrac- 
tive indices, between ordinary and extraordinary rays that travel along the fast 
and slow axes, respectively, of a transparent crystal. An acousto-optic tunable filter 
(AOTF) electrically tunes acousto-optic diffraction within the crystal with piezoelectric
transducers such as lithium niobate LiNbO$_3$ (Fig. 6). When the transducer
beats at 10-250 MHz radio frequencies (RF) with driving power of a few watts,
ultrasonic waves vibrate the crystal to form a periodic compression/rarefaction of 
its lattice. This elasto-optical effect alters the two refractive indices periodically 
in space to act as an adjustable diffraction grating. Even more extreme than in 
the layer of photosensitive emulsion in a VPH grating, diffraction within an AOTF 
occurs throughout the crystal volume; in consequence, in-band flux is effectively 
tuned into only $m = \pm 1$ orders.

Fig. 6. A noncollinear AOTF. The RF transducer at bottom right and absorber at top left form
a periodic variation in the two refractive indices throughout the crystal. Incident light enters the
crystal at bottom left, propagates on different ordinary/extraordinary paths, and is dispersed into
$\pm 1$ orders by the standing acoustic wave. These orders spread apart on exit from the crystal
and their resulting images are usually summed. The undeviated zero order out-of-band light lies
between them on the detector and is restricted in wavelength by the moderately wide filter at left,
which is really farther to the left (i.e. farther from focus) than shown here. (See electronic edition
for a color version of this figure.)


The converging light wavefront is relayed through the AOTF along a crystal
axis. In a collinear AOTF, the acoustic wave also propagates along this principal
optical axis and all orders are nearly superimposed. One tunes the central wavelength
($\lambda_c$) of the diffracted in-band light by changing the RF according to the
condition


$$\lambda_c = V\Delta n/f \tag{5}$$


with $V$ the acoustic wave's velocity at frequency $f$ and $\Delta n$ the size of the crystal's
birefringence; the deviation angle on exit and the acceptance angle are fixed by the
crystal and are both small in collinear devices. Maintaining this matching condition 
requires that light be incident as an almost plane wavefront (only a few degrees of 
convergence), hence a very small field of view. Order m = 0 can be suppressed to 
0.1% if the input light is polarized and a crossed polarization analyzer is used at 
the exit of the crystal; thus, half of a typical unpolarized astronomical source flux is 
lost. The in-band diffracted light obtains polarization perpendicular to the incident 
light. Crystal defects at ~15 $\mu$m scale can undermine image quality and only a few
materials are suitable, resulting in relatively low transmission. Thus, these devices 
are no longer used in nocturnal astronomy. 
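
The collinear tuning condition, $\lambda_c = V\Delta n/f$, converts directly between drive frequency and passband centre, as the sketch below shows. The acoustic velocity (650 m/s) and birefringence (0.15) are placeholder values assumed for illustration, not measured crystal properties, and the function names are hypothetical.

```python
def rf_frequency_mhz(lam_nm, v_acoustic_m_s, delta_n):
    """Drive frequency f = V * delta_n / lambda_c, returned in MHz."""
    return v_acoustic_m_s * delta_n / (lam_nm * 1e-9) / 1e6


def tuned_wavelength_nm(f_mhz, v_acoustic_m_s, delta_n):
    """Central wavelength lambda_c = V * delta_n / f, returned in nm."""
    return v_acoustic_m_s * delta_n / (f_mhz * 1e6) * 1e9


# Assumed, illustrative crystal parameters.
V_ASSUMED = 650.0    # acoustic velocity in m/s
DN_ASSUMED = 0.15    # birefringence

f_mhz = rf_frequency_mhz(650.0, V_ASSUMED, DN_ASSUMED)
print(f_mhz, tuned_wavelength_nm(f_mhz, V_ASSUMED, DN_ASSUMED))
```

With these assumed parameters, tuning to 650 nm needs a drive near 150 MHz, inside the 10-250 MHz range quoted above; halving the frequency doubles the selected wavelength.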

Instead, one uses a noncollinear AOTF where the propagation axes of light and 
sound differ. With appropriate choice of the paths allowed in these devices to adjust 
delays between the ordinary and extraordinary wavefronts, wavelength-dependent 
image shifts reduce enough to triple the acceptable incidence angle at given R
compared to collinear devices. Moreover, because unpolarized incident light is diffracted
in-band into $m = \pm 1$ orders when it exits the crystal, its propagation to focus
can be almost independent of the in-band wavelength. The in-band images also
separate more widely from the bright undiffracted $m = 0$ image, by up to ~10 degrees.

Although polymers have been developed with variable and controllable birefringence,
a TeO$_2$ crystal is preferred for ground-based astronomy because of its superior
clarity and transmission from 350 to 5000 nm wavelengths with bandpass a few to 
~10 nm. The largest AOTF is only ~2 cm diameter because it has been difficult to 
maintain a uniform acoustic standing wave over larger areas; power dissipation can 
also be challenging to handle. However, the acceptance angle of the AOTF exceeds 
that of the FP (5°-15° for a noncollinear AOTF vs. <2° for FP) allowing an AOTF 
to operate close to focus where its small size is less of an issue. 


3.4. Birefringent Filters


The underlying principle of a birefringent filter is that light originating in a single
polarization state can interfere with itself. Reference 13 discusses the relative merits
of different types of birefringent filters. These filters are characterized by a series
of perfect polarizers (Lyot filter), partial polarizers, or an entrance and an exit
polarizer (Solc filter). The highly anisotropic off-axis behavior of uniaxial crystals
gives birefringent filters a major advantage: their solid acceptance angle is 10-100
times that of interference filters, although this throughput gain is partly offset by
half the light being rejected at the entrance polarizer.


3.4.1. Lyot Filter¹⁴


This type is conceptually the easiest to understand and is the basis for many vari- 
ants. It consists of a chain of independent segments, each composed of a birefringent 
plate sandwiched between parallel polarizers. The entrance polarizer is oriented 45° 
to the fast and slow axes so that the linearly polarized, ordinary and extraordinary 
rays have equal intensity. Time delay through a crystal of thickness $d$ of one ray
relative to another is simply $d\Delta\mu/c$, with $\Delta\mu$ the difference in refractive index
between fast and slow axes and $c$ the speed of light in vacuum. The combined beam
emerging from the last polarizer varies in intensity by $I\cos^2(\pi d\Delta\mu/\lambda)$, with $I$ the
wave amplitude, a dependence that is often mis-stated in the literature (S. Dacie,
private communication, 2016). Each stage is twice as thick as the previous one in 
the optical train. A quarter-wave plate within each stage tunes the filter. Each stage 
is rotated through twice the angle of the previous. 

Lyot filters have been made up to 30 cm diameter to work over a narrow spectral 
passband. Broadband instruments have never exceeded 10 cm due to the difficulties 
and expense of making such large retarders achromatic. By using half-wave plates, 
the Lyot filter can operate directly in wavefronts that converge as fast as f/2. 

B. Woodgate (NASA Goddard Space Flight Center) built a Lyot filter that 
utilizes eight quartz retarders of 10 cm diameter. The retarders, each sandwiched 
with half-wave and quarter-wave plates in addition to polarizers, are rotated inde- 
pendently with stepping motors. They achieve a bandpass of 0.4—0.8 nm, tunable 
over half of the wavelength range 350-700 nm. As is true for all interferometers, 
temperature stabilization is important, and temperature control can also be used 
here for fine wavelength adjustment. 

Crystal birefringence provides a large gain for isolating a spectral element, com- 
pared with, say, the reflecting plates of the TTF. If we seek to block neighboring 
orders with conventional R ~ 5 filters, the required TTF sub-micron spacings cannot
be achieved and are subject to reflectance phase effects.¹⁵ An air-spaced FP
plate separation of $d_F$ provides $R = 2d_F N_F/\lambda$, with $N_F$ the reflective finesse of the
coated plates. This compares with a Lyot filter resolving power of $R = b d_L N_L/\lambda$,
with $N_L$ the Lyot finesse and thickness $d_L$ determining the order separation (fringe
spacing). For two commonly used materials, MgF$_2$ and crystal quartz, $b \sim 0.01$.
This means that, for the same resolving power, the thinnest element of the Lyot is 
100 times thicker than the equivalent plate spacing of the TTF. Calcite (b = —0.18) 
provides an even larger gain factor but has extremely thin optical elements. For the 
Lyot filter, it is possible to bond sub-mm thick elements to a supporting substrate, 
or to use thicker retarders in a subtractive arrangement (Fig. 7(a)(vi)) to achieve 
the equivalent optical thickness. 

A proposed Lyot filter for the Anglo-Australian Telescope (AAT)? f/3.3 prime
focus showed that the pre-existing telescope corrector optics suffice to flatten out the 
field-expanded, hyperboloidal primary mirror. The 16 cm diameter retarder would 
have MgF$_2$ or sapphire elements with ideal birefringent properties. The thinnest
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Fig. 7. Lyot filter construction and transmission profile. (a) Stages to construct a Lyot filter. 
Elements are the polaroid (P), retarder (R), half-wave plate ($\lambda/2$), and quarter-wave plate ($\lambda/4$).
The black-tipped arrow orients the key elements; the white-tipped arrow orients the fast axis; the 
optical axis runs horizontally. (i) The simplest Lyot filter, which produces the cosine behavior in 
Eq. (6). (ii) The wide-field compensated filter, in which the retarder is split and crossed. (iii) A 
single-stage filter with tuning capability. (iv) A wide-field compensated filter with tuning capability. 
(v) A fragile, thin element filter with tuning capability. (vi) An equivalent unit to that presented 
in (v), now aided by thicker elements via the differencing principle. The crossed retarders are thick 
enough to require wide-field compensation. (b) Transmission profiles of a simple Lyot cascade with 
(top to bottom) one, two, three, four, and five stages. The thinnest element has retardance R while 
the following elements have thicknesses that are multiples of this element. Polaroids P align with 
each other and orient at 45° to the retarders. 


element is ~0.25 mm, which is well within manufacturing tolerances after bonding 
with neutral glass. Because quarter-wave and half-wave plates are extremely expensive
and do not have ideal broadband response, sub-$\lambda$ transmissive gratings would be
used instead. Bleached polaroids give 95% transmission over half the optical range, 
such that two Lyots can cover the full range; the Lyot orders can be isolated with 
SDSS filters. These filters appear to be ideal, being thin and of large aperture, and 
having broadband 100% transmission. The filter assembly will be ~10 cm long, in 
size and aspect ratio much like the TTF. An important departure from the GSFC 
Lyot filter is that each of the five segments will be tuned independently of the 
others using separate motors and encoders. This will allow explicit correction for 
temperature and manufacturing defects, and reduce heating of the immersion oil 
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caused by unnecessary rotations. The Lyot stages are easily laser-aligned because 
each stage generates a unique cosine-squared modulation at fixed wavelength. 


3.4.2. Solcᵇ Filter


These highly nonintuitive filters use only two polarizers and a chain of identical 
retarders at varying angles.!© Folded (zigzag) designs perform better than fanned 
designs. Reference 16 made a 7 cm diameter Solc filter. It has the extraordinary 
capability of tuning the spectral profile: an n-element Solc filter can have a profile
that is determined by n Fourier coefficients. This can also be achieved with polariz- 
ing filters by proper choice of crystal lengths. To our knowledge, a wide-angle filter 
using half-wave plates has not been attempted, although it is theoretically feasible. 

Crystalline materials are usually, but not exclusively, birefringent in that the 
refractive index differs along two axes within the crystal. In a positive uniaxial 
crystal (e.g. quartz), the refractive index experienced by the ordinary ray on the
“fast” axis, $n_o$, is smaller than for the extraordinary ray, $n_e$, along the “slow” axis.
Birefringence, $b = n_e - n_o$, leads to time delay, $dt = bd/c$, where $c$ is the speed of
light in vacuum and $d$ is the thickness of the element. Retardance is expressed in
angular units, i.e. $r = 2\pi bd/\lambda$.

For light traversing the simple system in Fig. 7(a)(i), the transfer function is
given by Malus' law,


$$T(\lambda) = \cos^2(\pi bd/\lambda). \tag{6}$$


Reference 17 realizedᶜ that a cascading series of aligned birefringent elements
could isolate a spectral line. From Eq. (6), the easiest arrangement to accomplish 
this is to double the thickness of each successive element down the chain (see Fig. 7). 
The interleaved polaroids are essential to the rejection of out-of-band emission. 

Reference 19 demonstrates that the bandpass profile of a series of $q$ elements is


$$T(\lambda) = \frac{1}{4^q}\,\frac{\sin^2(2^q\pi b d_0/\lambda)}{\sin^2(\pi b d_0/\lambda)}.$$


It follows that the free spectral range is $\Delta\lambda = \lambda^2/bd_0$, the finesse is $N_L = 1.13 \times 2^q$,
and the spectral resolution, $\delta\lambda$, is their ratio. These quantities can be related directly
to equivalents for Michelson and FP interferometers.
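
The cascade result can be checked numerically. The sketch below multiplies the single-stage $\cos^2$ response of Eq. (6) over $q$ stages whose thicknesses double from $d_0$, compares the product with the closed form $\sin^2(2^q x)/[4^q\sin^2 x]$ where $x = \pi b d_0/\lambda$, and estimates the finesse from the half-maximum width. The quartz-like birefringence of 0.009 and the 250 $\mu$m thinnest element are assumed values, and the function names are illustrative.

```python
import math


def lyot_product(lam_um, b, d0_um, q):
    """Product of cos^2 stage responses with thicknesses d0, 2*d0, ..., 2^(q-1)*d0."""
    total = 1.0
    for k in range(q):
        total *= math.cos(math.pi * b * d0_um * 2 ** k / lam_um) ** 2
    return total


def lyot_closed_form(lam_um, b, d0_um, q):
    """Closed-form profile sin^2(2^q x) / (4^q sin^2 x), with x = pi*b*d0/lambda."""
    x = math.pi * b * d0_um / lam_um
    return math.sin(2 ** q * x) ** 2 / (4 ** q * math.sin(x) ** 2)


def lyot_finesse(q):
    """Free spectral range divided by FWHM, both measured in the phase variable x."""
    n = 2 ** q

    def excess(x):
        return math.sin(n * x) ** 2 / (n * math.sin(x)) ** 2 - 0.5

    lo, hi = 1e-9, math.pi / n  # the half-maximum point lies before the first null
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.pi / (2.0 * lo)


print(lyot_product(0.6563, 0.009, 250.0, 5), lyot_closed_form(0.6563, 0.009, 250.0, 5))
print(lyot_finesse(5) / 2 ** 5)
```

The ratio printed on the last line is close to 1.13, in line with the finesse quoted above.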


3.4.3. Liquid-Crystal Filter 


These are rapid switching (5-50 ms), electronically tuned devices that employ
either ferro-electric or nematic liquid crystals. The latter device is more common”? 
and comprises a series of liquid-crystal elements whose thicknesses are cascaded as
in the Lyot filter. However, wavelength tuning is here achieved by electronically
rotating the axes of the liquid-crystal (LC) waveplate. Without voltage, retardance
is maximized; at large applied voltages, retardance is minimized. Retardance can
be tuned continuously to tune the wavelength.


ᵇ The proper Czech pronunciation is “Sholtz.”
ᶜ The principle was independently discovered by Ref. 18.

Liquid-crystal filters are ubiquitous. The biggest we have seen has a 4 cm diame- 
ter, requires 5 V to scan through its FSR, and has good image quality. A single-stage 
device has R ~ 5, and can be tuned across the entire optical window. Unfortunately, 
peak transmission is only 30%. 

An interesting variant! is the holographically formed, polymer-dispersed liquid- 
crystal (PDLC) filter. These exploit the phase separation of the LC and the evolv- 
ing polymer during photo-polymerization. In the 0 V setting, the micron-sized LC 
droplets are randomly oriented leading to a refractive index mismatch with the con- 
fining polymer. This is the strongly scattering (opaque), i.e. nontransmissive, state. 
A (polarizing) transparent state is achieved with a suitable choice of voltage to align 
the LC droplets. These can be used in reflection or in transmission (diffraction) and 
can be switched very rapidly (~0.1 ms). The control wavelengths are set by the 
cure conditions during photo-polymerization. 


4. Tunable Filter Pitfalls 


4.1. The Phase Problem 


A tunable filter tries to form a monochromatic image over a wide field, but is foiled 
because most filters obey Eq. (1) somewhere in the process. The $\cos\theta$ dependence
constitutes the “phase problem”: a shift of central wavelength across the field that
is usually a diametric but sometimes linear (Bragg grating) gradient. For example,
the TTF has a phase or band shift of ~15 Å (at H$\alpha$) over its 10′ field of view on
the 3.9 m AAT.

Some filters have no phase effect, e.g. absorption or scattering filters. The Chris- 
tiansen filter is a matrix of glass and resin beads. Dispersion curves of its glass and 
resin cross at one wavelength where this light forward scatters regardless of incidence 
angle, whereas light at all other wavelengths scatters incoherently. These filters have 
low transmission, so have very limited application. 

There is one instance where the phase effect is a strength rather than a weakness: 
it can discriminate spurious scattered light from diffuse emission intrinsic to the 
source. But mostly we prefer a monochromatic view. All attempts to counteract 
$\cos\theta$ fail: curved or Fresnel plates (changing $l$), or higher or graded index (changing
$\mu$), or slower camera or larger beam size (changing $\theta$). They either work only
on-axis or compound the problem rather than solving it. Thankfully, the Lyot filter
succeeds, as we discuss in Sec. 5.2. 


4.2. Shaping the Bandpass 


The sharply peaked bandpass profile produced by all tunable filters is not ideal.
A flat-topped profile with even a small wavelength interval covered at the peak 
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transmission avoids narrowly missing most of the line flux from the source. Because 
the Michelson interferometer (filter) images in the frequency domain, its bandpass 
profile can be partially squared off in data reduction through a suitable choice 
of convolving function. The Solc filter achieves this flattening through a complex
arrangement of fan-shaped or zigzagged retarders.¹⁶ Mathematically, all band-limited
functions can be squared off, but in practice this is difficult. There are three
experimental solutions: 


• Use near-identical multi-cavities in series, requiring exceedingly stringent
tolerances on matching interference coatings. This is the theoretical basis of the
Butterworth function.

• Use a multi-pass arrangement of a tilted mirror on the back face of the filter to
repass light through the interfering cavity. This greatly restricts the field of view.

• The easiest practical solution, with the bad side-effect of lower overall throughput,
is to oscillate tuning around a fixed configuration: e.g. small jitters in rotation
(Lyot) or plate oscillations (FP) during a unit exposure.
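
The trade-off in the last option can be sketched by averaging an Airy profile over a small, uniform gap oscillation. The model below is an idealisation: the air-gap etalon, 90% reflectivity, 2 $\mu$m gap, and ±4 nm oscillation amplitude are assumed values, and the uniform dwell across the oscillation is a simplifying assumption rather than a description of any real controller.

```python
import math


def airy(lam_nm, gap_nm, refl):
    """On-axis Airy transmission of an air-gap etalon."""
    phase = 2.0 * math.pi * gap_nm / lam_nm
    return 1.0 / (1.0 + 4.0 * refl / (1.0 - refl) ** 2 * math.sin(phase) ** 2)


def jittered_airy(lam_nm, gap_nm, refl, amp_nm, n=201):
    """Time-averaged transmission for a gap swept uniformly over +/- amp_nm."""
    total = 0.0
    for i in range(n):
        offset = -amp_nm + 2.0 * amp_nm * i / (n - 1)
        total += airy(lam_nm, gap_nm + offset, refl)
    return total / n


# Compare peak transmission and flatness 0.5 nm away from the 500 nm peak.
static_ratio = airy(499.5, 2000.0, 0.9) / airy(500.0, 2000.0, 0.9)
peak = jittered_airy(500.0, 2000.0, 0.9, 4.0)
jitter_ratio = jittered_airy(499.5, 2000.0, 0.9, 4.0) / peak
print(peak, static_ratio, jitter_ratio)
```

In this toy model the averaged profile is flatter near its centre than the static one, at the cost of a lower peak transmission, which matches the side-effect noted in the list.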


4.3. The Stray Light Problem 


Imagers are notoriously susceptible to diffuse scattered light and internal ghost 
reflections. Observing tactics are sometimes needed to distinguish artifacts from 
reality. For example, some ghosts reflect around the optical axis. An angular shift
between exposures of α arcseconds moves ghosts by −2α, making them easy to identify.
More thought should go into the design of anti-reflection coatings in cameras.
Design packages such as Zemax easily trace the dominant scattered rays, so there is
no excuse. A particular challenge is the air–silicon detector interface that makes a
diametric 10% ghost image in many cameras.


4.4. Fringing in Narrowbands 


CCD fringing can be a problem in narrowbands as the wavelength is tuned. The 
interference pattern can vary wildly, resembling shifting sand dunes or oil slicks. 
In regular spectral intervals, fringing cycles through maxima and minima. Modern 
thick (e.g. high electrical resistivity, deep depletion) CCDs are much less susceptible 
to fringing. CCD manufacturers often avoid fringing by placing a gradient anti- 
reflection coating across their device to match the axis of wavelength dispersion of 
a grating. Unfortunately, there is no correct axis for the two-dimensional detector 
of a tunable filter. 
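
The regular spectral interval between fringe maxima can be estimated by treating the CCD as a thin etalon at normal incidence, for which adjacent maxima are separated by roughly λ²/(2nt). The sketch below rests on that assumption, with a refractive index for silicon of about 3.7 in the far red (an approximate value) and hypothetical device thicknesses. It ignores absorption, which is the main reason thick devices fringe less.

```python
def fringe_period_nm(wavelength_nm, thickness_um, refractive_index=3.7):
    """Wavelength spacing between adjacent fringe maxima at normal incidence.

    Thin-etalon estimate: delta_lambda ~ lambda^2 / (2 n t).
    The default index is an approximate value for silicon in the far red.
    """
    thickness_nm = thickness_um * 1000.0
    return wavelength_nm**2 / (2.0 * refractive_index * thickness_nm)


# Hypothetical device thicknesses, from a thinned CCD to a thick deep-depletion CCD.
for thickness_um in (15.0, 40.0, 200.0):
    period = fringe_period_nm(850.0, thickness_um)
    print(f"{thickness_um:6.0f} um silicon: fringe period ~ {period:.2f} nm at 850 nm")
```

When the fringe period is comparable to or larger than the filter bandpass, the fringe pattern changes visibly as the filter is tuned.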


4.5. The Atmosphere in Narrowbands 


At the AAT we learned the importance of accurate simulation calculators (e.g. 
www.aao.gov.au/cgi-bin/ttf) to avoid pitfalls. For example, in the SDSS r- and 
i-passbands, narrow water absorption features diminish transmission. The TTF calculator
has a full model of these based on echelle spectra.


5. Tunable Filter Advantages


A tunable filter has several operational advantages over conventional imagers to 
minimize systematic errors, i.e. those errors whose influence does not diminish as 
exposures lengthen. Conventional images are taken sequentially, which is not ideal 
at even the best sites as atmospheric or small instrumental variations arise during 
the tuning sequence to build up a data cube. Early on this was addressed for tun- 
able filters by taking many repeat short exposures with a photon-counting detector 
having minimal dead time and no readout noise, then averaging. Such devices were 
once far less efficient than CCDs, so “slow scan” techniques like “nod and shuffle” 
were developed to avoid multiple noisy readouts by keeping the charge on the CCD 
between (in most cases) only two or three separate wavelengths. Today, electron-multiplying
CCDs (EMCCDs) combine high sensitivity with photon counting, and
4K × 4K pixel devices are becoming available.


5.1. Differential Imaging 


The instrumental point-spread function is usually identical for all bands over the 
tuning wavelength range, which greatly assists differential imaging. When linked 
to charge shuffling, two discrete bands can be exposed repeatedly, interleaved in
time (e.g. [O III]λ5007 and Hβ), such that differential errors pixel to pixel are much
smaller than those with separate filters. Differential imaging typically obtains excellent
data even through light cirrus clouds or on rapidly changing sky background
(e.g. moonrise). In other words, photometry of individual lines may be worthless, 
but pixel to pixel comparison is excellent. 
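
A toy calculation shows why interleaving helps. The sketch below assumes a hypothetical, slowly varying sky transparency (thin cirrus) and compares the measured ratio of two bands when they are exposed sequentially and when they are interleaved in short dwells, as charge shuffling permits. The transparency model, the dwell time and the function names are illustrative assumptions.

```python
import math


def transparency(t):
    """Hypothetical sky transparency versus time in seconds (thin cirrus)."""
    return 0.7 + 0.25 * math.sin(2.0 * math.pi * t / 1800.0)


def accumulate(flux, t_start, t_end, dt=1.0):
    """Counts collected from a source of the given flux between two times."""
    n = int(round((t_end - t_start) / dt))
    return sum(flux * transparency(t_start + (i + 0.5) * dt) * dt for i in range(n))


def sequential_ratio(f1, f2, total=1200.0):
    """Band 1 fills the first half of the time, band 2 the second half."""
    half = total / 2.0
    return accumulate(f1, 0.0, half) / accumulate(f2, half, total)


def interleaved_ratio(f1, f2, total=1200.0, dwell=60.0):
    """The two bands alternate every `dwell` seconds."""
    c1 = c2 = 0.0
    t = 0.0
    while t < total:
        c1 += accumulate(f1, t, t + dwell)
        c2 += accumulate(f2, t + dwell, t + 2.0 * dwell)
        t += 2.0 * dwell
    return c1 / c2


true_ratio = 2.0
print(f"true ratio        : {true_ratio:.3f}")
print(f"sequential ratio  : {sequential_ratio(2.0, 1.0):.3f}")
print(f"interleaved ratio : {interleaved_ratio(2.0, 1.0):.3f}")
```

In this toy model the absolute counts in each band are wrong in both cases, but the interleaved ratio stays close to the true value.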


5.1.1. Sky Background 


A tunable filter with a wide range of interference orders, like the TTF, can be 
very efficient in detecting faint emission or absorption lines. Such programs typi- 
cally require an “off” (i.e. continuum) band for comparison. The TTF allows rapid 
changes between narrow- and broadbands, such that most time can be allocated 
to the narrowband, with a small overhead to expose the broadband. Moreover, the 
broadband can oscillate in wavelength to straddle the narrowband so as to remove 
a sloping spectral background. To maximize observing signal/noise, tunable filter 
exposures are set to reach the sky flux limit within a pixel in the bandpass; once this 
limit is reached, signal/noise grows as the square-root of the observing time. The 
better the site image quality (often after correction for turbulence with an adaptive 
optics system), the more sensitive one can be on point sources by minimizing the 
pixel size. Being able to choose pixel binning, bandpass and tune to an exact wave- 
length allows one to optimize the experiment. Web calculators facilitate this during 
the data acquisition. 
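
The square-root growth of signal/noise in the sky-limited regime follows from the usual per-pixel noise budget. The sketch below assumes the standard CCD noise model (Poisson noise from source and sky plus read noise added in quadrature); the count rates and read noise are hypothetical values in electrons, and the function names are illustrative.

```python
import math


def snr(source_rate, sky_rate, exposure_s, read_noise=0.0, n_reads=1):
    """Per-pixel signal/noise for Poisson source and sky noise plus read noise."""
    signal = source_rate * exposure_s
    variance = signal + sky_rate * exposure_s + n_reads * read_noise**2
    return signal / math.sqrt(variance)


def sky_limited_time(sky_rate, read_noise, factor=10.0):
    """Exposure after which sky variance exceeds read-noise variance by `factor`."""
    return factor * read_noise**2 / sky_rate


# Hypothetical rates in electrons per second per pixel.
source, sky, read_noise = 0.1, 5.0, 3.0
print(f"sky-limited after ~{sky_limited_time(sky, read_noise):.0f} s")
for t in (100.0, 400.0, 1600.0):
    print(f"t = {t:6.0f} s  S/N = {snr(source, sky, t, read_noise):.2f}")
```

Once the exposure is well beyond the sky-limited time, each factor of four in exposure time gives close to a factor of two in signal/noise.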

A curious advantage was realized with the design of the 5-stage Lyot filter 
for the AAT. TTF programs routinely exploit the very dark sky bands between
atmospheric OH band heads. In bright moonlight, sky background at right angles
to the direction of the Moon is almost entirely scattered light in the inter-bandhead 
wavelength intervals. If we align the Lyot entrance polarizer to this direction, the 
sky background should diminish to that of the dark sky. There are few other cases 
in optical astronomy where dark sky astronomy can be carried out every night (cf. 
echelle spectroscopy). 


5.1.2. Time Series Imaging 


Tunable filters are effectively used in time series mode. A full discussion of this 
technique is beyond the scope of this chapter; we refer interested readers to Refs. 21 
and 22. 


5.1.3. Object Finding 


Excellent overviews of the mathematics of tunable filter analysis are given
elsewhere.¹,²³ In astronomy, our science fields often have many independent
targets or sources. One cannot overstate the importance of object-finding codes like
SExtractor.²⁴ Many TTF programs have benefited enormously from its application.
Objects found in one image can be easily associated with objects in successive
images, precisely the function needed to analyze a scan of consecutive spectral
bands, e.g. to find emission line objects or flux dropouts in a narrow redshift
interval.²³,²⁵

Wide-field surveys are usually restricted to broad photometric bands. Substantially
narrower bands are a technical challenge. Survey efficiency is often loosely
stated in terms of the AΩ product, or étendue, with A the telescope collecting area
and Ω the total solid angle of the sky survey. For a single telescope pointing,


$$A_1 \Omega_1 \approx A_2 \Omega_2 \approx \frac{\pi d^2}{4 F^2}, \qquad (7)$$


where subscript 1 refers to the telescope, subscript 2 to the camera, d is the detector
size in mm, and F is the beam focal ratio (0.5 < F < ∞). Figure 8 shows
the geometry. In Eq. (7), to sidestep subtleties involving field-expanded foci, we
adopt F > 2 because wavefronts that converge faster than ~30° cannot be properly
compensated by the method of Lyot.
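
The matching of the two AΩ products can be checked numerically. The sketch below assumes the small-angle form AΩ ≈ πd²/(4F²) for Eq. (7), a square detector of side d, and a beam of half-angle 1/(2F); the aperture, detector size and function names are hypothetical.

```python
import math


def etendue_detector(d_mm, focal_ratio):
    """A2*Omega2 in mm^2 sr: detector area d^2 times beam solid angle pi/(4 F^2)."""
    return math.pi * d_mm**2 / (4.0 * focal_ratio**2)


def etendue_telescope(aperture_mm, d_mm, focal_ratio):
    """A1*Omega1 in mm^2 sr: aperture area times field solid angle (d/f)^2."""
    focal_length_mm = focal_ratio * aperture_mm
    return (math.pi * aperture_mm**2 / 4.0) * (d_mm / focal_length_mm)**2


# Hypothetical 4 m telescope with a 60 mm detector in an f/2 beam.
print(f"detector side : {etendue_detector(60.0, 2.0):.1f} mm^2 sr")
print(f"telescope side: {etendue_telescope(4000.0, 60.0, 2.0):.1f} mm^2 sr")
```

The aperture cancels, so for a fixed detector the étendue depends only on the focal ratio, which is why the fastest usable beam matters.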

For AΩ to measure survey efficiency usefully, there are certain qualifications.
We assume that the sources under study (a) are resolved by the instrument and
the pixel sampling, (b) are detected in a reasonable exposure time, and (c) that
the signal-to-noise ratio (SNR) increases as (time)^{1/2}, in contrast to observations
limited by detector read noise or charged-particle background. The right-hand side
of Eq. (7) could reasonably include a natural threshold of form H(S > S₀) with S₀
the limiting acceptable SNR, and H the Heaviside operator (H(S < S₀) = 0 and
H(S > S₀) = 1).

A rapidly converging wavefront captures the widest fields and highest sensitivity
to low surface-brightness. However, faster convergence degrades the energy
interval isolated by interference, particularly for interference filters, because the
path length has an angular dependence that causes interference filters to exhibit a
phase effect (shifting central wavelength) over the field of view. An impractically
expensive solution is an interference filter that instead covers the telescope entrance
aperture. As far as we can determine, the 35 cm diameter Hα (R ~ 100) filter for the
UK Schmidt telescope in Australia is the largest and most expensive interference
filter made for astronomy to date. By simple scaling, a single monolithic filter of 1
m diameter would exceed US$1M, more than the cost of the fully tunable optical
system (filter + top end) proposed in Sec. 5.2.


Fig. 8. (i) In a matched optical system, the A₁Ω₁ product of the detector measured at the telescope
aperture is equal to the A₂Ω₂ product of the telescope measured at the detector. The telescope
diameter is D and the focal length of the telescope mirror is f; the detector size is d. The AΩ
invariant is the throughput or étendue of the system. (ii) Angle convention for an off-axis ray
incident on an optical element.


5.2. Wide-Field Compensation 


The wide-field expanded Lyot filter is (almost) the final word in exploiting the 
widest possible field of a given telescope, enabling many new astronomical programs. 
Wavefronts that converge as fast as f/2 can be compensated with crossed birefringent
elements, in concert with half-wave plates, so that even a constant sub-Å bandpass
is possible across a degree-scale field of view.

A narrow spectral band cannot be isolated in a fast beam by using an interfer- 
ence or etalon filter because the path length through the resonating cavity cannot 
be equalized for all off-axis angles. Remarkably, off-axis paths can be equalized 
with crossed birefringent elements even for R ~ 10°! Wide-field compensation has 
been demonstrated to be extremely effective for narrowband imaging, particularly 
in remote sensing and solar astronomy.”° 

There are many useful discussions on the basic principles of the Lyot
filter;²,¹³ here, we attempt an intuitive approach. To arrive at the low-resolution filter
proposed here, there are four key elements: (a) interference through birefringence, 
(b) wide-field compensation, (c) wavelength tuning, and (d) retardance through 
differenced elements.


5.2.1. Constructing the Filter


First, consider how wide-angle, zero-wave retarders are built. The retarder element 
in Fig. 7(a)(i) is split perpendicular to the propagation axis into two equal thickness 
elements. Balancing fast and slow propagation is the primary method of wide-field 
compensation. The pieces are then bonded after aligning, by rotation, the fast axis 
of the second element with the slow axis of the first, yielding no net retardance for 
any angle through the retarder. A half-wave plate oriented at 45° to the fast and 
slow axes and placed between the split retarder elements avoids simple cancellation 
of the zero-wave retarder (see Fig. 7(a)(ii)). Why this works is involved: Refs. 2 and
14 give the most accessible discussion. 

The half-wave plate advances the phase of the o-ray by π relative to the e-ray,
rotating the plane of polarization by π/2 to re-align the o-ray with the fast axis of the
second element. Hence, the on-axis time delay of the crossed elements (Fig. 7(a)(ii))
remains unchanged from the original system (Fig. 7(a)(i)). Off-axis rays then benefit
in two respects. While the behavior of interference filters is isotropic, uni-axial
crystals have a strong azimuthal dependence. For rays at arbitrary azimuthal angle
φ (see Fig. 8), the general form of the retardance is


$$\Gamma(\theta,\phi) = \Gamma_0 \left[ 1 - \frac{\sin^2\theta}{2 n_o} \left( \frac{\cos^2\phi}{n_o} - \frac{\sin^2\phi}{n_e} \right) \right], \qquad (8)$$


with Γ₀ the on-axis retardance. The angular term flips sign between neighboring
quadrants, thereby providing the physical basis for wide-field compensation.
Remarkably, with the half-wave plate in place, the complex azimuthal behavior nearly
disappears. Light that enters the first split element from direction (θ, φ) enters the
second element from direction (θ, φ + π/2). From Eq. (8), overall retardance is now


$$\Gamma(\theta) = \Gamma_0 \left[ 1 - \frac{\sin^2\theta}{4 n_o^2} \left( \frac{n_e - n_o}{n_e} \right) \right].$$


The equivalent form for the interference filter is


$$\Gamma(\theta) = \Gamma_0 \left[ 1 - \frac{\sin^2\theta}{2 n^2} \right],$$


with n the refractive index of the cavity. The monochromatic acceptance solid angle
of a compensated birefringent filter is larger by a factor of 2nₒ/(nₑ − nₒ), i.e. orders
of magnitude larger than that of an interference filter.
Moreover, Lyot filters have much greater RΩ than interference or etalon filters.
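
The cancellation of the azimuthal term can be verified numerically. The sketch below assumes the standard uniaxial-crystal form of the retardance, Γ = Γ₀[1 − (sin²θ/2nₒ)(cos²φ/nₒ − sin²φ/nₑ)], splits the element into two halves with the second seen at azimuth φ + π/2, and compares the result with an interference cavity of index n = nₒ. The indices are approximate values for quartz in the visible, and the function names are illustrative.

```python
import math


def retardance_single(theta, phi, n_o, n_e, gamma0=1.0):
    """Off-axis retardance of one uniaxial element (assumed standard form)."""
    s2 = math.sin(theta)**2
    angular = math.cos(phi)**2 / n_o - math.sin(phi)**2 / n_e
    return gamma0 * (1.0 - s2 / (2.0 * n_o) * angular)


def retardance_compensated(theta, phi, n_o, n_e, gamma0=1.0):
    """Two half-thickness elements; the second sees the ray at azimuth phi + pi/2."""
    half = 0.5 * gamma0
    return (retardance_single(theta, phi, n_o, n_e, half)
            + retardance_single(theta, phi + math.pi / 2.0, n_o, n_e, half))


def retardance_interference(theta, n, gamma0=1.0):
    """Small-angle phase of an interference cavity of refractive index n."""
    return gamma0 * (1.0 - math.sin(theta)**2 / (2.0 * n**2))


n_o, n_e = 1.5443, 1.5534   # approximate quartz indices, an assumption
theta = 0.2                 # off-axis angle in radians

for phi_deg in (0.0, 45.0, 90.0):
    phi = math.radians(phi_deg)
    print(f"phi = {phi_deg:4.0f} deg  single {retardance_single(theta, phi, n_o, n_e):.6f}"
          f"  compensated {retardance_compensated(theta, phi, n_o, n_e):.8f}")

shift_lyot = 1.0 - retardance_compensated(theta, 0.0, n_o, n_e)
shift_cavity = 1.0 - retardance_interference(theta, n_o)
print(f"fractional shift ratio, cavity / compensated: {shift_cavity / shift_lyot:.0f}")
print(f"2 n_o / (n_e - n_o): {2.0 * n_o / (n_e - n_o):.0f}")
```

For the same off-axis angle the fractional shift of the compensated element is smaller by a factor close to 2nₒ/(nₑ − nₒ), and because the shift scales as sin²θ the same factor applies to the acceptance solid angle.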


5.2.2. Tuning the Lyot Filter 


Reference 16 demonstrates several ways to tune a Lyot filter to the desired wavelength,
none of which is particularly intuitive. A given birefringent element outputs
elliptically polarized light. The trailing quarter-wave plate transforms this into
linearly polarized light whose orientation, ψ, (and therefore wavelength) depends on
the polarization ellipticity, ε, as ψ = tan⁻¹ ε with ε = −tan(πdb/λ). The tuning
relationship for a single stage reduces to


$$\psi(\lambda) = \pi d b \left( \frac{1}{\lambda_0} - \frac{1}{\lambda} \right), \qquad (9)$$


where natural wavelength λ₀ is selected at ψ = 0. In practice, one tunes by rotating
the exit polarizer, which forms the entrance polarizer of the next stage. Transmission
is maximized when each successive stage is geared to rotate by precisely twice the
rotation of the preceding thinner element. The (ψ, λ) relation for the complete
Lyot system over the full optical range is generally more involved than Eq. (9).
Conveniently, wavelength tuning becomes less sensitive to angular errors as the
elements thicken.
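
The single-stage relation and the factor-of-two gearing can be sketched as follows. The code assumes the tuning relation has the form ψ(λ) = πdb(1/λ₀ − 1/λ), with d the element thickness and b its birefringence, and that each successive element is twice as thick as the one before. The thickness, birefringence and wavelengths are hypothetical values, and the function names are illustrative.

```python
import math


def tuning_angle(wavelength, natural_wavelength, thickness, birefringence):
    """Polarizer rotation in radians for one stage (all lengths in the same unit)."""
    return (math.pi * thickness * birefringence
            * (1.0 / natural_wavelength - 1.0 / wavelength))


def stage_angles(wavelength, natural_wavelength, d_thinnest, birefringence, n_stages):
    """Rotation of every stage when each element is twice as thick as the last."""
    return [tuning_angle(wavelength, natural_wavelength,
                         d_thinnest * 2**k, birefringence)
            for k in range(n_stages)]


# Hypothetical values: 1 mm thinnest element (in nm), quartz-like birefringence.
angles = stage_angles(wavelength=656.4, natural_wavelength=656.3,
                      d_thinnest=1.0e6, birefringence=0.0091, n_stages=4)
for k, psi in enumerate(angles):
    print(f"stage {k}: rotate by {math.degrees(psi):.3f} deg")
```

Because ψ is proportional to d, doubling the thickness doubles the required rotation for the same wavelength offset, which is what fixed 2:1 gearing between stages provides.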

A Lyot filter is a fairly straightforward opto-mechanical device to tune. The original
design adopted by the GSFC Lyot filter uses successive 2:1 gear ratios. The
first element is driven by a stepper motor. Gear wheels attached to this stage rotate
differentially the successive stages through connecting axles to gears attached to
each stage. Thus, 2ⁿ unique spectral elements need 2ⁿ half rotations to cover the
free spectral range, and wavelength tuning requires only differential rotation of the
Lyot stages.


6. Scientific Gains of Tunable Filters 


6.1. Observational Cosmology 


Wide-field imaging and imaging spectroscopic surveys dominate modern astrophysics,
e.g. AAT/SAMI, SDSS/MaNGA, and VLT/MUSE. Despite the data cubes
that can be obtained from such IFS instruments, surveys of emission-line sources at
cosmological redshifts underline the great potential of targeting narrow photometric
bands, particularly of fields having pre-existing, broadband data including from the
HST and soon the JWST. In narrowband imaging, Lyα searches with SuprimeCam
and HyperSuprimeCam at the Subaru telescope push to the widest angular fields
to date.

For redshift-targeted programs, tunable filters improve over existing spectro- 
graphs because every pixel can detect line-emitting objects. Such objects detected 
by the TTF in 4 hours (through differential imaging) are at the sensitivity limit 
of Keck/LRIS. A tunable Lyot filter promises even bigger advantages. It is often 
overlooked just how large the angular sizes of galaxy clusters are anticipated to 
be at high redshift. If one considers galaxies on turn-around orbits that are just 
separating from the Hubble flow, the sphere of influence of a Coma-like potential 
projects to 0.5a/(40 Mpc) degrees at z ~ 1 with a the angular diameter of the 
sphere. Moreover, VIRGO Consortium simulations reveal that supergalactic struc- 
ture spans several degrees. If star-forming dwarf galaxies or Population II star 
clusters trace the “foothills” of large-scale structure, it is reasonable to expect that
future narrowband surveys on degree scales could directly observe the anticipated
supercluster networks. Of course, a tunable filter will be restricted to emission lines, 
so IFS will continue to deliver complementary spectroscopy of starlight. 

Before cosmological reionization, the first galaxies and quasars are expected
to possess large, polarized halos of Lyα radiation.2° These photons are expected to
scatter off neutral atoms that are expanding with the Hubble flow. Can the low
surface brightness halos be detected in polarized light? The anticipated sources will
be invisible at optical wavelengths: the escaping Lyα radiation emerges in the near-IR.
A simple, high throughput, low-resolution, tunable imaging polarimeter could
identify candidates at z > 5 reliably.


6.2. Representative Programs 


Wide-field tunable imagers enable many interesting avenues of research. Much of 
this work has been reviewed in Ref. 27: 


• Warm ionized gas has been detected to the H I edge of essentially all spiral galaxies
with observations of sufficient depth (e.g. Ref. 28); systematic work on what the
emission is telling us will need dedicated follow-up programs.

• Largely unexplored is use of tunable imagers to map spiral galaxies in absorption
lines (e.g. Ref. 29), e.g. differential comparison of Mg to Fe to trace variations in
age or metallicities of the stellar populations.

• Detection of warm intra-cluster gas in galaxy groups, which is thought to harbor
many uncounted baryons (e.g. Ref. 30). Both low and high-redshift clusters occasionally
show “cooling flow” emission. Nearby galaxy groups like Sculptor subtend
tens of degrees and future studies would benefit from a wide-field imaging system.

• Recent work on Hα trails in rich clusters (e.g. Ref. 31) shows promise for inferring
3D space motions.

• Some radio sources subtend huge angles, e.g. Cen A, For A, and some radio lobes
show clear patches of radio depolarization suggestive of intervening warm gas,
e.g. Cyg A, For A (e.g. Ref. 32).

• Starburst winds extend over large scales, e.g. M82, where the soft X-rays and Hα
emission extend over 0.5° (e.g. Ref. 33).

• AGN ionization cones have been detected to the H I edge of dozens of Seyfert
galaxies (e.g. Ref. 34).

• Diffuse low surface brightness stellar structures are seen in clusters and around
normal galaxies and mergers. Intra-cluster and free-floating planetary nebulae
offer the exciting possibility of probing the dynamics of these structures (e.g.
Ref. 35).

• The diffuse ionized gas seen in the Milky Way Galaxy (“Reynolds layer”) and
nearby galaxies has an unidentified heat source (e.g. Ref. 36). There is keen interest
in studying this emission in diagnostic lines other than Hα.

• The Magellanic Stream can be seen in Hα and clearly probes a complex halo
radiation field. The Magellanic Bridge is particularly strong in Hα and subtends
30° (e.g. Ref. 37).

• High- and intermediate-velocity H I clouds in the Galactic halo are detected in
Hα (e.g. Ref. 38). Their nature and origin are poorly understood at this time;
some may even have dark matter halos.

• There are very many emission nebulae around compact sources, including photo-dissociation
regions, super-soft X-ray sources, X-ray binaries, potassium shells
around stars, planetary nebulae, supernova remnants, and fast-moving pulsars.
Emission nebulae are also expected around soft gamma-ray repeaters and gamma-ray
bursters.

• Comet tails show line emission.


There are many more science cases associated with Solar System physics that we 
do not cover here.?9 


7. Conclusion 


Tunable filters provide excellent photometry of extended and compact, faint 
sources. Coupled to EMCCDs, they can produce fine differential measurements 
between bands even in nonphotometric conditions. The residual signal is often 
Poisson-distributed without systematic uncertainties from, e.g. CCD artifacts or 
poorly-subtracted night-sky lines. Tunable filters are excellent bright- and dark- 
time instruments, can be adapted for polarimetry, and are excellent time series 
analyzers. AOTFs perform well in the UV, but require cryogenics in the IR. FP- 
based tunable filters are inexpensive and relatively easy to use productively; while 
flawed as tunable filters, in practice they will not be surpassed easily for ground- 
based observations. 

The wide-field Lyot filter is an important exception that has been extensively 
exploited in daytime astronomy, and with limited application in nighttime work. 
Here we have focused on this technology because it demonstrates important principles
of tunable imaging. For redshift studies, its AΩ product can greatly exceed
the performance of multi-slit spectrographs and can complement deployable IFSs
such as the SAMI instrument at the AAT.⁴⁰ A low-R Lyot filter offers a rare opportunity
to exploit the fastest converging wavefronts and the full acreage of the new
generation of CCD mosaics. Although its entrance polarizer immediately halves the
incident light, the overall throughput to unpolarized light is comparable to the TTF
because there are only half a dozen air–glass interfaces ahead of the telescope prime
focus.

There are many conceivable technologies for tunable imaging. Important developments
have been possible using microelectromechanical (MEMS) technology
through small-scale integration of basic filter mechanisms.⁴¹ This development path
is likely to benefit from the push towards nanotechnology, but we note that most
published broadband mechanisms are small, i.e. of order 1 cm or less in diameter.
Many things become possible if nano-patterning by design becomes possible over
larger optical surfaces.

Beyond the topics covered here, there is a bewildering array of possibilities 
enabled by rapidly evolving photonic technologies,⁴¹ e.g. tunable crystal lattice
structures with many possible variants. There is intriguing overlap with biologists 
who study the chromatism of insect wings and fish scales. The physical processes 
involved are so diverse that it is a challenge to describe the different technologies 
within a consistent mathematical framework.


A solar coronagraph enables study of the Sun’s tenuous outer atmosphere, the
corona, by occulting the brilliant solar disk. We describe scientific objectives of
solar coronagraphy; measurement requirements that follow from those objectives; 
instrument design principles to effectuate the required measurements; and some 
aspects of the practical implementation of these principles in space- and ground- 
based instruments. 


1. Scientific Objectives 


The gross internal structure of a star like the Sun is relatively well understood and
confirmed by helioseismic measurements.¹ The largest gap in our understanding is
the time-dependent interaction of convection with magnetic fields, which in the Sun
gives rise to a quasi-periodic dynamo, magnetic concentrations such as sunspots,
active regions, and network fields, and a large-scale magnetic field extending above
the visible-light surface (the photosphere) into the heliosphere. Equilibrium
thermodynamics suggests that the temperature of the solar plasma should decrease
monotonically from the core (~15 MK) outward, as indeed it does, until just
above the photosphere, where the average temperature increases outward in less
than 10⁴ km from a minimum of about 4300 K to over 1 MK in the corona. It is
widely agreed that the magnetic field plays an essential role in creating the corona,
but the mechanisms by which this is achieved remain the subject of research.²

The observed corona is highly structured and dynamic, again because of the
Sun’s magnetic field.³ Characteristic structures seen repeatedly in coronal images
have acquired descriptive names, such as “loop,” “streamer,” “plume” or “jet.”⁴
It is difficult to measure the coronal magnetic field directly.⁵,⁶ However, the gas
and the magnetic field are tied together because of the high electrical conductivity
of coronal plasma, and luminous coronal features indicate the direction of the
field projected onto the plane of the sky. Figure 1 shows the inner corona as seen
in broadband visible light during a total eclipse;⁷ the plasma is made visible by
Thomson scattering of photospheric light off free electrons.⁸ Figure 2 is a different
view during the same eclipse afforded by visible-light forbidden emission lines of iron
that differentiate regions by temperature.⁷ The inner corona (out to 1.5–2.5 R☉) is
often visualized in the light of one or more ultraviolet or extreme ultraviolet (EUV)
allowed emission lines, as in Fig. 3; an occulter is not needed because the radiance
from the solar disk is not much different from the radiance above the limb. EUV
emission is typically too faint to be detected at larger radii, where a “white light”
(Thomson scattering) coronagraph is still effective (Fig. 4).


Fig. 1. Negative composite image of the white-light corona during the total solar eclipse of 11
July 2010.⁷ The coronal image has been enhanced with specially tailored unsharp masking; the
lunar image is superimposed.


Fig. 2. Composite false-color image from the eclipse of 11 July 2010.⁷ Redder regions are hotter
(dominated by Fe XIV emission at 530.3 nm) than green regions (dominated by Fe X emission at
637.4 nm).


Fig. 3. EUV image of the corona obtained by the SWAP camera on the PROBA-2 spacecraft
(17.4 nm, 11 March 2011).⁹ The narrow bandpass is dominated by emission from Fe IX–XI.


Fig. 4. Composite image of a coronal mass ejection.¹⁰ Solar disk: AIA EUV imager on the SDO
spacecraft. Red annular image: LASCO C2 coronagraph on the SOHO spacecraft. Blue image:
LASCO C3 coronagraph. The boom holding the external C3 occulter is visible on the left.

Coronagraphy is an important tool for investigating all the major scientific 
questions concerning the solar corona, including: 


• How is the gas heated? How does it cool?

• How does mass enter? How does it leave? From what kinds of regions does the
solar wind¹¹ emanate?

• What causes solar flares¹² and coronal mass ejections?¹³

• How are different types of coronal structures created and destroyed, and what
are the spatial and temporal scales most important to their evolution? How do
structures composed primarily of neutral matter persist in the million-degree
corona, sometimes for weeks?

• How does magnetic reconnection¹⁴ operate in the corona? How does turbulence
operate?

• How do mass ejections evolve as they propagate outward, creating “space
weather”¹⁵ in the heliosphere and at Earth?


2. Scope 


For the purposes of this review, a coronagraph is an imager that is required to 
block (occult) light from the solar photosphere because the photosphere is much 
brighter than the corona in the imaging bandpass. White light coronagraphsᵃ will
be emphasized because they are the most widely used and exemplify the key design
challenges and approaches. Heliospheric imagers are specialized wide-angle coronagraphs
designed to image the heliosphere from ≳10 R☉ to the Earth (215 R☉);¹⁶
they have much in common with conventional coronagraphs but pose particular 
design challenges not addressed here. We cite specific coronagraphs to illustrate 
design principles but do not attempt a census of existing instruments. 

A nonsolar coronagraph can resemble a solar coronagraph but may also be 
quite different.ᵇ The bright source to be occulted by a “stellar” coronagraph may
often be treated as a point source, and the object or objects to be studied — e.g. 
exoplanets — may be at known locations or in preferred radial or angular zones, 
facts that can be exploited by tailored apodization of the coronagraph entrance 
pupil. By contrast, a solar coronagraph must occult an extended source, and the 
objects of study are typically widely distributed in radius and position angle with 
respect to the center of the Sun. 

Stray light in an imaging system is, broadly, light that reaches the focal plane 
by other than idealized ray paths such as those of geometrical optics. Under that 
definition, diffraction, unwanted specular reflection, and scattering (transmissive or 
reflective) are sources of stray light. Diffraction is always important in a coronagraph 
and will be considered below. Nondiffractive stray light is usually just as important 
but is less amenable to systematic treatment because the means to suppress it 
are highly dependent on the particulars of the coronagraph and its environment. 
Reference 17 is a general treatment of stray light analysis and control. 


ᵃ “White light” refers to the wavelength-independent cross-section for Thomson scattering. The
coronagraph bandpass may be wide or narrow.
ᵇ See Volume 3, Chapters 18–20.

The focal-plane sensor, post-sensor electronics, and data processing are essential
components of a coronagraph system. We consider the sensor with respect to signal 
and noise, but little else about the post-focus chain can be generalized. 

We are not aware of a recent general review of solar coronagraphs. An older survey
may be found in the monograph by Billings.⁸ Reference 18 is a lucid explication
of optical design principles for an externally and internally occulted coronagraph. 


3. Measurement Requirements 


3.1. Signal and Backgrounds 


Traditional nomenclature distinguishes between the K- (Kontinuierlich) corona, the 
F- (Fraunhofer) corona, and the E- (Emission) corona.” The K- and E-components 
arise from coronal plasma and correspond to the electron scattering and atomic 
emission mechanisms discussed above. The F-corona, also known as zodiacal light, 
corresponds to scattering of photospheric light off interplanetary dust concentrated 
near the ecliptic plane.?? For a solar coronagraph, the F-corona is background rather 
than signal. 

Because the K-corona has the same spectral distribution as the photosphere,ᶜ,⁸
its signal can be expressed relative to the mean radiance of the solar disk (B☉)
without specifying the bandpass. This is approximately true of the F-corona in the
visible region of the spectrum although not in the thermal infrared, where zodiacal 
dust is a source of emission. Figure 5 shows radiances in the bandpass 400-600 nm 
for the K-corona (at the solar equator or in a coronal hole) and the F-corona (in 
the ecliptic plane or at the ecliptic pole).?? Because the complex structures evident 
in Figs. 1-4 change over hours and days as well as systematically through the 
course of a solar activity cycle, the curves in Fig. 5 should be regarded more as 
approximate boundaries than as radiances typical of a particular solar latitude. 
Also shown in Fig. 5 is a curve representing the typical radiance of a coronal mass 
ejection (CME).¹⁶ The radiance of CMEs varies widely; the curve mainly serves to
emphasize that, for transient features, the signal to be measured is often only a small
fraction of the local quasi-steady K-coronal radiance, which is in turn dominated
by background light for R ≳ 2.5 R☉.

In the visible region, the E-corona is observed through narrow filters, each 
centered on a wavelength dominated by a single emission line (Fig. 2); otherwise the 
signal is overwhelmed by the broadband K+F corona. In the EUV, the continuum 
background is much weaker, but a bandpass filter (typically a multilayer coating on 
a mirror) is still necessary to restrict the signal to a few dominant emission lines 
(Fig. 3); otherwise it is difficult to relate the signal to the temperature structure of 
the gas. 


ᶜ Scattering off fast-moving coronal electrons does, however, broaden and decrease the contrast of
photospheric spectral features.

Fig. 5. Radiance of various components of the solar corona in units of the mean radiance of the
solar disk (B☉) in the bandpass 400–600 nm.¹⁹ Also shown are the typical radiance of the sky
near the Sun at a very high-quality ground-based observing site outside eclipse and during total
eclipse.¹⁹,²⁰


3.2. Polarization


3.2.1. Sources of Polarization


Linear polarization is an important physical diagnostic and instrument design tool 
throughout the corona. Both circular and linear polarization can be used to infer the 
strength and direction of the magnetic field in the inner corona (R < 2 R☉), but the
measurements are challenging and require a large-aperture (>1 m) telescope.2* 76 
Although direct photospheric light averaged over the solar disk is essentially unpo- 
larized, the K-corona exhibits a high degree of linear polarization. As summarized 
by Ref. 8, 


If the Sun were a point source of light and the corona a small aggregate of scatter- 
ing material lying at the vertex of a right angle between the Sun and the observer, 
the scattered radiation would be 100% polarized, with its electric vector normal 
to the plane containing the incident and the scattered rays. 
Because the Sun is an extended source, and the angle between incident and scattered 
rays is not exactly 90°, accurate calculation of polarization is more complicated.® ?” 
However, the scattered radiation is still highly polarized, and the 90° (“plane of the 
sky”) approximation is adequate out to ~70 R⊙.?” 

For a ground-based K-coronagraph, the main source of background linear polarization 
is sunlight scattered from the atmosphere, either directly or after reflection 
from the Earth; Ref. 28 discusses the accuracy to which this background 
can be removed by calibration. For a space-based K-coronagraph, the only significant 
source of background linear polarization is the F-corona. Both the radiance and the 
polarization of the zodiacal light are functions of ecliptic latitude and longitude and 
are relatively poorly known for solar elongation angles less than about 10°.?? Reference 
29 infers that the polarization is negligible inside 5 R⊙ and reaches 0.8% at 
16 R⊙ but cautions that better measurements are strongly needed. Nonetheless, the 
K-corona is more highly polarized, so polarization-sensitive measurements increase 
the signal-to-noise ratio of a coronagraph. The offsetting penalty is an increase in 
the complexity and duration of the measurement. 


3.2.2. Polarimetry 


Incoherent, partially polarized light may be characterized in a time average by 
four so-called Stokes parameters, {I,Q,U,V}, representing, respectively, the total 
intensity (radiance or irradiance) within some bandpass, the intensities of two inde- 
pendent states of linear polarization, and the intensity of circular polarization.°° 
At least three measurements must be taken to determine I, Q, and U; V is generally 
not measured unless the instrument determines the strength of the magnetic 
field in the low corona using the Zeeman effect. In space-based coronagraphs, two 
common arrangements are three successive measurements through polarizers with 
axes at 0°, 60°, and 120°, or three measurements through polarizers with axes at 
0°, 45°, and 90° (a fourth measurement at 135° may be included for redundancy). 
In a ground-based coronagraph, the polarized noise induced by atmospheric seeing 
can be overcome by using fast electro-optical variable retarders and a polarizing 
beamsplitter (analyzer) to produce a rapidly modulated irradiance signal at the 
detector(s) that can be used to recover the Stokes parameters.3! 3? 
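
The three-angle and four-angle polarizer sequences described above can be illustrated with a short numerical sketch. It assumes ideal linear polarizers, for which the intensity transmitted at axis angle θ is 0.5 (I + Q cos 2θ + U sin 2θ), and it ignores instrumental polarization and noise. The function names and the sample Stokes values are hypothetical, chosen only for illustration.

```python
import numpy as np


def polarizer_response(stokes_iqu, angles_deg):
    """Intensity behind an ideal linear polarizer at each axis angle."""
    theta = np.radians(np.asarray(angles_deg, dtype=float))
    # Rows map (I, Q, U) to the transmitted intensity:
    # 0.5 * (I + Q cos 2theta + U sin 2theta).
    design = 0.5 * np.column_stack(
        [np.ones_like(theta), np.cos(2 * theta), np.sin(2 * theta)]
    )
    return design @ np.asarray(stokes_iqu, dtype=float), design


def recover_iqu(intensities, angles_deg):
    """Least-squares estimate of (I, Q, U) from three or more polarizer angles."""
    _, design = polarizer_response([0.0, 0.0, 0.0], angles_deg)
    solution, *_ = np.linalg.lstsq(
        design, np.asarray(intensities, dtype=float), rcond=None
    )
    return solution


true_iqu = np.array([1.0, 0.30, -0.10])  # illustrative values only
for angles in ([0, 60, 120], [0, 45, 90, 135]):
    measured, _ = polarizer_response(true_iqu, angles)
    print(angles, recover_iqu(measured, angles))
```

With exactly three angles the system is solved directly; a fourth angle makes it overdetermined, which is the redundancy mentioned in the text.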

All optical systems introduce instrumental polarization that must be calibrated 
and removed. The goal is to derive a so-called instrument Mueller matrix, M, that 
connects the Stokes vector of light entering the instrument, $\mathbf{S}_{\rm in}$, with the Stokes 
vector of light at the focal plane, $\mathbf{S}_{\rm out}$, through the linear transformation 
$\mathbf{S}_{\rm out} = \mathbf{M}\,\mathbf{S}_{\rm in}$.2° References 28 and 33 describe representative calibration procedures for, 
respectively, a ground-based and two space-based polarimetric coronagraphs. 
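
As a minimal sketch of what removing instrumental polarization means once M is known, the following example inverts the linear relation between the input and output Stokes vectors. The instrument modeled here is a hypothetical weak diattenuator (a partial linear polarizer) with made-up transmittances; it is not taken from the calibration procedures in Refs. 28 and 33, and real calibrations must also estimate M and its uncertainty.

```python
import numpy as np


def remove_instrumental_polarization(mueller, stokes_out):
    """Solve M @ S_in = S_out for S_in, given a 4x4 Mueller matrix M."""
    return np.linalg.solve(
        np.asarray(mueller, dtype=float), np.asarray(stokes_out, dtype=float)
    )


# Hypothetical instrument: diattenuator with intensity transmittances k1 and k2
# along and across its axis (illustrative numbers only).
k1, k2 = 0.95, 0.90
a, b, c = 0.5 * (k1 + k2), 0.5 * (k1 - k2), np.sqrt(k1 * k2)
M_example = np.array(
    [
        [a, b, 0.0, 0.0],
        [b, a, 0.0, 0.0],
        [0.0, 0.0, c, 0.0],
        [0.0, 0.0, 0.0, c],
    ]
)

S_in = np.array([1.0, 0.20, 0.05, 0.0])  # illustrative input Stokes vector
S_out = M_example @ S_in                 # what the detector would see
print(remove_instrumental_polarization(M_example, S_out))
```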


4. Design Principles 


The French astronomer Bernard Lyot invented the coronagraph in the 1930s and 
used it to obtain coronal images without a solar eclipse for the first time.*4 His most 
notable innovation was the introduction of an aperture (the Lyot stop) to block 
light diffracted around the entrance aperture, but the success of his instruments 
required unusual care in many respects, including a high-altitude observing site, 
immaculately clear and dust-free lenses, and systematic blocking of stray light. 
Lyot’s approach holds an enduring lesson for contemporary coronagraph designers: 
despite decades of instrument heritage and powerful computational tools, it is still 
easy to build a coronagraph that does not work. 


4.1. The Basic Coronagraph 


As a coronagraph is sometimes described as creating an artificial eclipse, one’s first 
thought might be to emulate the moon — to “put a thumb over the Sun.” This 
fails because an external occulter by itself does not adequately control diffracted 
light unless it is quite distant from the entrance aperture (Sec. 4.2). Lyot’s great 
insight was to realize that light diffracting around a system aperture could be sup- 
pressed by reimaging it and blocking its bright rim with another aperture. The same 
principle applies to occulters. Figure 6 serves to illustrate a circularly symmetric 
coronagraph design with both external and internal occulting. Lyot’s coronagraphs 
used only internal occulting. Evans®’ added an external occulter to the basic Lyot 
design. 




Fig. 6. Schematic layout of the LASCO C3 coronagraph on the SOHO satellite.2° The top diagram 
illustrates image formation; the bottom diagram shows how stray light is blocked from 
reaching the image plane. Sunlight enters from the left. Components shown: front aperture A0, 
external occulter D1, entrance aperture A1, objective lens O1, internal occulter D2, field lens O2, 
Lyot stop A3, relay lens with Lyot spot O3, filter/polarizer wheels and shutter F/P, and focal 
plane F. 
From left to right in Fig. 6, a front aperture blocks stray light from outside 
the desired field of view. The external occulter (D1, shown as three closely-spaced 
obstacles) ensures that the full entrance aperture (A1) is shaded from photospheric 
light. At the image plane of the objective lens (O1), a field stop aperture defines the 
field of view. If there is no external occulter, it is also necessary to place an occulter 
(or Lyot mask) at or near the image plane of the objective lens to keep light and 
heat from the image of the solar disk from reaching further into the system; however, 
this occulter does not suppress diffraction.*’ Next is an internal occulter (D2) at an 
image of the external occulter, slightly larger than that image in order to block its 
bright rim of diffracted light. A second lens reimages the entrance aperture onto the 
Lyot stop (A3), an aperture slightly smaller than the image, again blocking its bright 
rim. A final lens (O3) creates the image of the occulted Sun on the final focal plane 
(F). A small reflecting spot is often placed in the center of this lens in order to block 
an out-of-focus solar image created by multiple reflections in the objective lens.3+ 3° 

Many variations are possible on the basic design illustrated in Fig. 6. An objec- 
tive lens is usually a high-purity singlet or doublet in order to minimize small-angle 
scattering, but following lenses may be more complicated. A filter or polarizer may 
be placed in locations other than the one shown. One or more of the lenses may be 
replaced by a mirror; if the objective is a mirror, the instrument is called a reflecting 
coronagraph. 

The external/internal occulter pair and the Lyot stop each have the potential 
to reduce stray light by about six orders of magnitude, with the result that the 
most sensitive solar coronagraphs suppress stray light to the order of $10^{-12}$ of the 
radiance of the solar disk.2? There is currently strong interest in a coronagraph 
that can be accommodated on a small satellite, particularly for the purpose of 
monitoring space weather. It has been shown theoretically and experimentally that, 
by careful placement of the internal occulter and field lens, diffracted light can 
be suppressed by about nine orders of magnitude even when the Lyot stage is 
eliminated; however, this level of suppression is achieved only over a limited range 
of radii, from about 3 to 12 R⊙.794 


4.2. External Occulters 


An external occulter (EO) influences the performance of a coronagraph in three 
main ways. 


(1) The EO must shade the entire entrance aperture (EA) from direct sunlight. 
This requires that the radius of the occulting disk satisfy $r_{\rm oc} > a + d\tan\theta_\odot$, 
where a is the radius of the EA, d is the separation between the EO and the 
EA, and $\theta_\odot$ is the angular radius of the Sun. 

(2) The EO partially shades (vignettes) the EA from coronal light for angular radii 
$\theta$ that satisfy $r_{\rm oc} - a < d\tan\theta < r_{\rm oc} + a$. Vignetting reduces the coronal flux 
reaching the EA and reduces the angular resolution of the system. 

(3) Light diffracts around and scatters from the EO. 
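
The two geometric conditions in items (1) and (2) can be turned into a small calculator. This is a sketch under stated assumptions: the solar angular radius is taken as the nominal 959.63 arcsec at 1 au, the aperture radius, separation, and 5% over-occulting margin are illustrative, and the function names are hypothetical.

```python
import math

# Nominal angular radius of the Sun at 1 au (assumed value, in radians).
THETA_SUN = math.radians(959.63 / 3600.0)


def min_occulter_radius(a, d, theta_sun=THETA_SUN):
    """Smallest occulter radius that shades the whole entrance aperture."""
    return a + d * math.tan(theta_sun)


def vignetting_limits(r_oc, a, d, theta_sun=THETA_SUN):
    """Field angles (in solar radii) between which the aperture is partly shaded."""
    theta_min = math.atan((r_oc - a) / d)  # aperture fully shaded inside this angle
    theta_max = math.atan((r_oc + a) / d)  # aperture fully exposed outside this angle
    return theta_min / theta_sun, theta_max / theta_sun


a, d = 0.005, 0.5              # 5 mm aperture radius, 0.5 m separation (illustrative)
r_min = min_occulter_radius(a, d)
r_oc = 1.05 * r_min            # hypothetical 5% over-occulting margin
inner, outer = vignetting_limits(r_oc, a, d)
print(f"r_min = {r_min * 1e3:.3f} mm, vignetted zone = {inner:.2f} to {outer:.2f} solar radii")
```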
4.2.1. Vignetting 


The designer must choose $r_{\rm oc}$, d, and a to be consistent with scientific requirements, 
including angular resolution, temporal resolution, and the range of solar radii to be 
observed, as well as practical limitations on the size of the system imposed by 
manufacturability and cost. Suppose that a has been chosen. Simple geometry?? 
yields the vignetted fraction as a function of d/a and the inner (smallest unocculted) 
field angle, $\theta_{\min}$; this fraction is approximately linear, decreasing from 1 at 
$\theta_{\min} \approx (r_{\rm oc} - a)/d$ to 0 at $\theta_{\max} \approx (r_{\rm oc} + a)/d$. Vignetting is a design tool as well 
as a constraint because it partially counteracts the steep radiance gradient of the 
corona (Fig. 5) and thereby reduces the dynamic range of the signal that must be 
accommodated by the focal-plane array. Reference 42 shows that the angular resolution 
afforded by the vignetted EA, normalized to the resolution of the full EA, is 
well approximated by $(1 - v)^{-1/2}$, where v is the vignetted fraction. Figure 7 illustrates 
the effects of vignetting on normalized flux and angular resolution. Achieving 
good angular resolution close to the solar limb requires a relatively distant external 
occulter. Such an occulter may be located at the end of a boom or, in space, on a 
separate spacecraft from the craft carrying the rest of the optics and the detector 
system.*3 
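
The linear vignetting approximation and the resolution scaling quoted from Ref. 42 can be sketched numerically as follows. The example assumes the small-angle limits for the inner and outer vignetting angles and takes the resolution penalty to be (1 - v)^(-1/2), which is how the approximation is read here; the function names and input values are hypothetical.

```python
import numpy as np


def vignetted_fraction(theta, r_oc, a, d):
    """Approximate (linear) fraction of the aperture area shaded at field angle theta."""
    theta_min = (r_oc - a) / d  # fully shaded at and inside this angle
    theta_max = (r_oc + a) / d  # fully exposed at and outside this angle
    v = (theta_max - np.asarray(theta, dtype=float)) / (theta_max - theta_min)
    return np.clip(v, 0.0, 1.0)


def resolution_factor(v):
    """Resolution element relative to the unvignetted aperture, (1 - v)**-0.5."""
    v = np.asarray(v, dtype=float)
    with np.errstate(divide="ignore"):
        return 1.0 / np.sqrt(1.0 - v)  # infinite when the aperture is fully shaded


r_oc, a, d = 0.0077, 0.005, 0.5           # metres, illustrative values
theta = np.linspace(0.004, 0.030, 6)      # field angles in radians
v = vignetted_fraction(theta, r_oc, a, d)
print(np.round(v, 3))
print(np.round(resolution_factor(v), 3))
```

A larger d/a narrows the vignetted zone in angle, which is why a distant occulter is needed for good resolution near the limb.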


Fig. 7. Left: Fraction of the area of the entrance aperture vignetted by the external occulter for 
various values of d/a, where a is the radius of the entrance aperture and d is the separation between 
the external occulter and the entrance aperture. Right: The effect of vignetting on resolution, 
normalized to the resolution of the unobscured entrance aperture, for the same values of d/a. 


4.2.2. Diffraction 


Diffraction around the EO is unavoidable, hence the need for an internal occulter 
(IO) to accomplish for the EO what the Lyot stop does for the EA. Because 
diffracted light that reaches the EA influences all later optical stages, it is important 
to know how the diffractive illumination of the EA varies with distance to the EO 
and the amount that it over-occults. Fresnel diffraction theory is applicable because 
the EA is in the near field of the EO. Although few Fresnel diffraction problems can 
be solved analytically,** simple cases provide useful guidance. Consider a square 
EO normally illuminated by a point source. The complex field distribution at the 
EA can be expressed in terms of Fresnel integrals,*° and, at the center of the EA, is 
a function only of the Fresnel number of the EO as seen from the EA, $F = r_{\rm oc}^2/\lambda d$. 
For practical solar coronagraphs, $F \gtrsim 100$, which leads to the simple approximation 
that the irradiance at the center of the EA is $4/(\pi^2 F)$. With the substitution 
$r_{\rm oc} = f_{\rm oc}(a + d\tan\theta_\odot)$, where $f_{\rm oc} \geq 1$ is the over-occulting factor, the maximum 
of central irradiance occurs at an occulter distance of $a/(f_{\rm oc}\tan\theta_\odot)$ with value 
$\lambda/(f_{\rm oc}\pi^2 a\tan\theta_\odot)$. Note that the maximum irradiance is proportional to $a^{-1}$ and 
the EO distance at which this occurs is proportional to a. Figure 8 compares the 
behavior of the simple relationship $4/(\pi^2 F)$ with accurate calculations that integrate 
over the solar disk for two values of the EA radius.¢ The simple relationship 
captures the maximum irradiance but not the shape of the curves. Significantly, the 
effectiveness of the external occulter does not vary dramatically over the range of 
occulter distances applicable to a single spacecraft (0.5-20 m): less than a factor of 
10 for a 5-mm (radius) entrance aperture and less than a factor of 2 for a 50-mm 
entrance aperture. 
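
The point-source approximation above is easy to explore numerically. The sketch below evaluates the Fresnel number and the central irradiance 4/(π²F) on a grid of occulter distances and locates the maximum by brute force rather than from a closed-form expression. It assumes no over-occulting (f_oc = 1), a wavelength of 500 nm, and a nominal solar angular radius of 959.63 arcsec; the function names are hypothetical, and the result is the simple relationship only, not the solar-disk integration shown in Fig. 8.

```python
import numpy as np

THETA_SUN = np.radians(959.63 / 3600.0)  # assumed nominal solar angular radius


def fresnel_number(r_oc, wavelength, d):
    """Fresnel number of an occulter of radius r_oc seen from distance d."""
    return r_oc**2 / (wavelength * d)


def central_irradiance(a, d, f_oc=1.0, wavelength=500e-9, theta_sun=THETA_SUN):
    """Point-source estimate 4 / (pi^2 F), in units of the unocculted irradiance."""
    r_oc = f_oc * (a + d * np.tan(theta_sun))
    return 4.0 / (np.pi**2 * fresnel_number(r_oc, wavelength, d))


distances = np.logspace(-2, 2, 20001)  # 1 cm to 100 m
peaks = {}
for a in (5e-3, 50e-3):                # entrance aperture radii in metres
    irradiance = central_irradiance(a, distances)
    k = int(np.argmax(irradiance))
    peaks[a] = (distances[k], irradiance[k])
    print(f"a = {a * 1e3:.0f} mm: peak {irradiance[k]:.2e} at d = {distances[k]:.2f} m")
```

Increasing the aperture radius tenfold moves the peak ten times farther out and lowers it tenfold, matching the scaling noted in the text.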

In practical implementation, the effectiveness of the EO can be increased signif- 
icantly by using more than a simple opaque disk as the occulter. Examples include 
a serrated single disk;4° two or more disks arranged along the optical axis so that 
each disk both shadows the next disk from direct sunlight and blocks some of 
the light that diffracts around the previous disk;**4° a threaded barrel or trun- 
cated cone that approximates a many-disk system with a continuous structure;?* 4” 
or an unthreaded (rough or smooth) structure that could potentially have a free- 
form shape. Until recently, the design of external occulters was guided mainly by 
experience and experiment. Reference 48 discusses the history, theoretical analysis, 
and performance testing of compound external occulters and includes examples of 
numerically optimized configurations. 


Fig. 8. Irradiance integrated over an entrance aperture of radius 5 mm (left) or 50 mm (right), in 
units of the unocculted value, as a function of external occulter distance. Dashed curve: $4/(\pi^2 F)$. 
Dots (top to bottom): over-occulting factor 1.0, 1.03, 1.05, 1.1. Wavelength: 500 nm. 


4In solar diffraction computations, integrating over the solar disk is more laborious than a point- 
source calculation but necessary for accurate results. 


Fig. 9. Contour map of residual diffracted sunlight expected at 1.3 R⊙ for the proposed ASPIICS 
space coronagraph.!® The vertical scale spans 0.88-1.0 in units of the radius of the entrance pupil 
image; the horizontal scale spans 1.005-1.04 in units of the radius of the external occulter image. 
The intensity scale is normalized to the mean intensity of the solar disk. 


4.3. End-To-End Diffractive Analysis 


The schematic description of a coronagraph in Sec. 4.1 does not quantify at least 
two important aspects of diffractive performance (in addition to the characteristics 
of the external occulter, discussed above): the radius of the internal occulter (>1 in 
units of the radius of the external occulter image) and the radius of the Lyot stop 
(<1 in units of the entrance pupil image). End-to-end diffractive analysis can be 
accomplished using Fourier optics.?”*° Figure 9 illustrates the combined effects of 
the EO and the Lyot stop on residual diffracted sunlight for the ASPIICS orbiting 
coronagraph currently in development.'*:49 


5. Some Practical Considerations 


5.1. Nondiffractive Stray Light 


As indicated in Sec. 2, it is difficult to generalize about controlling nondiffractive 
stray light because of the range of potential sources external and internal to the 
coronagraph. The primary external source is always the solar disk. In space, sec- 
ondary external sources may include earthshine or spacecraft structures such as solar 
panels. On the ground, the sky (solar light scattered by gases and particulates in the 
Fig. 10. The effect of a single 12 μm dust particle on stray light in the METIS coronagraph 
for the Solar Orbiter mission.°? (a) The contaminated external occulter as seen off-axis from the 
position of the objective mirror of the coronagraph. (b) Micrograph of the edge of the occulter 
showing the offending particle with an arrow indicating its effect in (a). (c) Same as (a) with a 
clean occulter. 


atmosphere) is always a limiting external source, even at the best observing sites. 
Internal sources within the instrument field of view include scattering from inclu- 
sions and bubbles within lenses and scattering from particulates and microscopic 
irregularities on optical surfaces. Other internal sources include diffuse or specular 
reflection from structural components of the instrument. Scattering from dust and 
optical imperfections is particularly troublesome because it cannot be eliminated 
from the optical path to focus, hence the need for scrupulous contamination control 
throughout the fabrication and testing of coronagraphs. Figure 10 illustrates the 
significant effect a single dust particle can have on the stray light performance of a 
coronagraph. 

Coronagraphs always include annular baffles along the optical path (Fig. 11). 
If there is no external occulter, the internal occulter may be cone-shaped to send 
direct light from the solar disk into a surrounding light trap.°’ In an externally 
occulted system, there is typically a heat rejection mirror,36,51 which is a potential 
source of stray light. 

The quantitative analysis of stray light is carried out with specialized software 
that allows for both sequential (geometrical optics) and nonsequential propagation. 
There are a number of commercial programs (e.g. ASAP®, FRED™, LightTools®, 
OpticStudio®, TracePro®) as well as at least one open-source program.°” The user 
defines sources and detectors of radiation within an optomechanical environment 
that includes both imaging and nonimaging components. The software launches a 
large number of rays and follows them as they refract, reflect, scatter, and terminate 
according to user-specified properties. As a coronagraph requires very low stray light 
levels in the focal plane, a full analysis typically requires a substantial effort; the 
process of inserting and modifying baffles, traps and coatings is iterative even for an 
experienced analyst. The scattering properties of a surface are characterized by a 
bidirectional scatter distribution function (BSDF).°? Discussion of BSDFs relevant 
to coronagraphs may be found in Refs. 54-56. 
Fig. 11. Cutaway illustration of the optomechanical layout of the COR2 externally occulted space 
coronagraph.°! 


5.2. Signal-To-Noise Ratio 


The signal from the solar K-corona was discussed in Sec. 3.1. The main equations 
necessary to estimate signal-to-noise ratio (SNR) are the camera equation®” and 
the so-called CCD equation,°® also applicable to CMOS sensors. The camera equation 
relates the radiance R (photon s$^{-1}$ m$^{-2}$ sr$^{-1}$) of the source to the irradiance I 
(photon s$^{-1}$ m$^{-2}$) in the focal plane: 


$$I = \pi t R\left[(2 f\#_{\min})^{-2} - (2 f\#_{\max})^{-2}\right],$$


where t is the throughput of the optical system (including the quantum efficiency 
of the detector) and $f\#_{\min}$ and $f\#_{\max}$ denote the minimum and maximum angular 
subtense, expressed as an f-number, of a system that may have a central obstruction 
such as an external occulter. The CCD equation gives the SNR as 


$$\mathrm{SNR} = \frac{N_{\rm signal}}{\sqrt{N_{\rm signal} + n_{\rm pix}\left(1 + n_{\rm pix}/n_{\rm bg}\right)\left(N_{\rm bg} + N_D + N_R^2 + G^2\sigma_f^2\right)}},$$


where $N_{\rm signal}$ is the total number of signal photons detected during a single exposure 
from a region comprising $n_{\rm pix}$ pixels, $n_{\rm bg}$ is the number of pixels used to determine 
the background (e.g. the F-corona, or the atmosphere for a ground-based coronagraph), 
$N_{\rm bg}$ is the photon count from the background, $N_D$ is the detector dark 
current in electrons, $N_R$ is the detector read noise in electrons, G is the detector 
gain in electrons per data number, and $\sigma_f$ accounts for digitization noise (typically 
$\sigma_f \approx 0.29$). Because of structure and time-variability in the background, the SNR 
derived from this equation must be treated as a design tool rather than a firm 
predictor of system performance. 
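
For rough design trades, the camera equation and the CCD equation can be coded in a few lines. The sketch below follows the two expressions as read here, with the square root of the standard CCD equation in the denominator; the argument names are hypothetical, the background, dark, and read-noise terms are treated as per-pixel quantities, and the numbers in the example are illustrative rather than drawn from any instrument.

```python
import math


def focal_plane_irradiance(radiance, throughput, f_min, f_max=math.inf):
    """Camera equation: I = pi * t * R * [(2 f_min)^-2 - (2 f_max)^-2].

    f_max defaults to infinity, i.e. no central obstruction.
    """
    return math.pi * throughput * radiance * ((2 * f_min) ** -2 - (2 * f_max) ** -2)


def ccd_snr(n_signal, n_pix, n_bg_pix, bg, dark, read_noise, gain=1.0, sigma_f=0.289):
    """CCD equation for the signal-to-noise ratio of a single exposure."""
    # Per-pixel variance: background, dark current, read noise, digitization.
    per_pixel = bg + dark + read_noise**2 + (gain * sigma_f) ** 2
    variance = n_signal + n_pix * (1.0 + n_pix / n_bg_pix) * per_pixel
    return n_signal / math.sqrt(variance)


# Illustrative case: 4000 signal counts in 4 pixels over a bright background.
print(ccd_snr(n_signal=4000, n_pix=4, n_bg_pix=400, bg=20000, dark=5, read_noise=10))
print(focal_plane_irradiance(radiance=1.0, throughput=0.2, f_min=10, f_max=40))
```

The example shows the situation described in Sec. 3: when the background greatly exceeds the signal, the SNR falls far below the shot-noise limit of the signal alone.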


5.3. Performance Testing 


The performance testing of solar coronagraph components can often be accom- 
plished in a conventional laboratory of sufficient and consistent cleanliness; the 
degraded stray-light performance illustrated in Fig. 10 was the intentional result 
of only 3 days of exposure in an ISO 7/8 (Class 10,000/100,000) environment after 
3 months in an ISO 5 (Class 100) cleanroom.*° For spaceborne far or extreme ultraviolet 
coronagraphs, attention must also be paid to molecular contaminants that 
can polymerize when exposed on orbit to solar UV radiation and severely degrade 
the FUV/EUV sensitivity of the instrument.°? 

End-to-end testing of the full system requires a specialized facility.©° ©? A simulated 
solar source must be provided, of the correct angular extent and bright enough 
that stray light as weak as $10^{-5}$ to $10^{-12}$ of the source radiance can be recorded by 
the detector. Measurements of stray light levels, photometric calibration, and image 
quality are best made in vacuum to eliminate scattering by airborne dust and degradation of the 
image by atmospheric turbulence. End-to-end testing poses a particular challenge 
for space-based formation-flying coronagraphs, such as the ASPIICS instrument 
under development for the ESA PROBA-3 mission, in which the occulter space- 
craft is 150 m in front of the detector spacecraft.*® References 64-65 discuss the 
theoretical basis for and practical realization of scaled model instruments that can 
be tested in existing facilities and yet quantitatively capture the diffractive behavior 
of the full-scale system. The absolute radiometric calibration of a coronagraph can 
be established on the ground and tracked on orbit using celestial sources such as 
planets and stars. 
Phase-mask coronagraphs enable direct, high contrast imaging at small inner 
working angles, down to off-axis separations close to the theoretical diffraction 
limit ~λ/D, where λ is the observing wavelength, and D the telescope diameter. 
Phase manipulation in the focal plane induces the starlight rejection, therefore 
most phase masks are effectively transparent. To cope with unfriendly telescope 
apertures (large central obscuration, secondary support structures, segmentation, 
etc.), modern phase-mask coronagraphs can also be complemented by phase 
and/or amplitude pupil apodizing masks. Over the past 20 years, phase-mask coronagraphs 
have spread to most 8-meter class ground-based diffraction-limited imagers (VLT-NACO, 
VLT-SPHERE, VLT-VISIR, Keck-NIRC2, LBT-LMIRCAM, Palomar-P3K, 
Subaru-SCExAO), as well as the James Webb Space Telescope. Phase-mask 
coronagraphs, such as the vortex coronagraph, have recently been considered for 
future flagship mission concepts such as HabEx, LUVOIR, and OST. 


1. Limitations of Focal-Plane Amplitude Coronagraphs 


The inner working angle of focal-plane amplitude coronagraphs is generally set by 
the physical size of the occulter. Due to a fundamental property of Fourier trans- 
forms that defines the relationship between the focal-plane mask size and the Lyot 
stop, and thus the key trade-off between inner working angle and optical throughput, 
pure focal-plane amplitude coronagraphs are practically limited to inner working 
angles larger than ~3λ/D. By design, anything within this angular separation is lost, 
blocked by the opaque mask. One possible solution to overcome this fundamental 
limitation of the Lyot amplitude coronagraph is to forgo acting on the amplitude, 
drop the occulting mask, and use a phase modulation from a transparent substrate 
instead. 


2. Achromatic Interfero-Coronagraph 


The first proposal to use phase modulation for coronagraphy, as opposed to ampli- 
tude modulation, dates back to the mid-1990s. Reference 1 proposed the concept 
of the “Achromatic Interfero-Coronagraph” (AIC), which is a modified Michelson 
interferometer exploiting pupil rotation, and therefore avoiding the use of a physical 
focal-plane mask.? 4 The beam goes through a focus which provides an achromatic π 
phase shift to produce self-destructive interference for an on-axis source. The AIC 
initiated a paradigm shift and had appealing properties, such as intrinsic achro- 
maticity and small inner working angle. However, being an interferometer, the AIC 
was extremely sensitive to alignment, vibrations, and low-order aberrations. 

The AIC prompted the coronagraphic community to realize that coronagraphs 
should be regarded as interferometers, where the on-axis coherent starlight could be 
divided spatially and recombined so as to create selective destructive interference 
in the beam train. For nulling interferometry, as well as for phase coronagraphy, the 
quality of this starlight suppression or nulling process is quantified by the so-called 
rejection ratio, R, or its inverse, the null depth, N, a term borrowed from nulling 
interferometry. The latter is given by 


$$N = \frac{1}{R} = \frac{I_{\min}}{I_{\max}}, \tag{1}$$

where $I_{\min}$ is the total residual intensity of the destructive output (response with 
the coronagraph) and $I_{\max}$ is the total intensity of the constructive output (response 
without the coronagraph). In order to maximize the rejection ratio, or equivalently 
minimize the null depth, several conditions must be satisfied by the interfering 
wavefields. These conditions translate into tight spatial, spectral, and temporal 
constraints in interferometry as well as in phase coronagraphy. 

For coronagraphs, raw starlight suppression at a given off-axis angular separa- 
tion is related to the null depth, but depends on the exact roll off of the corona- 
graphic point spread function. The latter depends on the entrance aperture pupil 
geometry (overall shape and obscuration), and the power spectral density (PSD) 
of phase and amplitude aberrations accumulated through the optical system. In 
the case of an adaptive optics system, the final PSD includes the atmospheric tur- 
bulence power spectrum multiplied by the transfer function of the adaptive optics 
system. 

If the incoming focal-plane electric field associated with the central star is spa- 
tially divided in two equal sub-waves spatially overlapping in the downstream geo- 
metric pupil area, the null depth only depends on the phase shift and amplitude 
mismatch between them. Assuming the amplitude ratio between the two waves is 
q(λ), we find the following expression for the null depth: 

$$N(\lambda) = \frac{\left(1 - q(\lambda)\right)^2 + \epsilon(\lambda)^2\, q(\lambda)}{\left(1 + q(\lambda)\right)^2}, \tag{2}$$

where ε(λ) = φ(λ) − π measures the phase error from the nominal null phase π, 
as a function of wavelength. For example, to obtain a monochromatic null depth of 
$10^{-5}$ assuming a phase error ε of 6.300 × $10^{-3}$ radian, the amplitude ratio q must 
be kept above 0.9995 (a mismatch of 0.05%). Assuming ε = 0, q has to remain 
larger than 0.995, i.e. a 0.5% intensity mismatch. 

In the following, we describe the main focal-plane coronagraph designs that had 
a significant impact on the high contrast imaging scene over the past 20 years. 
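
The null-depth expression of Eq. (2) can be checked against the exact two-beam result it approximates. The sketch below assumes the reading N = [(1 - q)² + ε² q] / (1 + q)² and compares it with |1 + q e^{i(π+ε)}|² / (1 + q)²; the function names are hypothetical, and the sample values are the ones quoted in the worked example, as read here.

```python
import math


def null_depth(q, eps):
    """Small-phase-error null depth for amplitude ratio q and phase error eps (rad)."""
    return ((1.0 - q) ** 2 + eps**2 * q) / (1.0 + q) ** 2


def null_depth_exact(q, eps):
    """Exact two-beam value: |1 + q exp(i(pi + eps))|^2 / (1 + q)^2."""
    return (1.0 + q * q - 2.0 * q * math.cos(eps)) / (1.0 + q) ** 2


for q, eps in [(0.9995, 6.3e-3), (0.995, 0.0)]:
    print(f"q = {q}, eps = {eps}: N = {null_depth(q, eps):.3e} "
          f"(exact {null_depth_exact(q, eps):.3e})")
```

Both sample cases come out just below 1e-5, with the first dominated by the phase error and the second by the amplitude mismatch alone.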


3. Roddier’s Disk Phase Mask 


Reference 5 suggested using a transparent disk phase mask (DPM) at the focal plane 
with a size typically half the diameter of the Airy diffraction pattern λ/D, where λ is 
the wavelength of light and D the telescope diameter. The starlight self-cancellation 
in the relayed geometric pupil area is provided by destructive interference thanks 
to the π phase shift of the mask. The size of the mask is chosen so as to perfectly 
balance the amplitude of the field within and outside the mask. The disk phase-mask 
coronagraph (DPMC), unlike the AIC, still requires a pupil-plane diaphragm (Lyot 
stop) to efficiently remove the diffracted starlight residual. The DPMC is highly 
chromatic® due to its wavelength-dependent dimple size 0.5λ/D and the difficulty 
in implementing an achromatic π phase shift. 


4. The Dual-Zone Phase Mask 


The dual-zone phase mask’ (DZPM) is a generalization of the disk phase mask optimized 
to mitigate the size and phase chromaticity of Roddier’s phase mask. The 
DZPM is designed as a circular phase disk of diameter $d_1$, surrounded by an annular 
phase ring of diameter $d_2$ (Fig. 1). Both the inner disk and outer ring, with sizes 
of the order of the Airy diffraction pattern, introduce two different phase shifts $\phi_1$ 
and $\phi_2$ on the incoming wavefront. These phase steps create destructive interference 
within the downstream pupil area, rejecting the stellar light outside the Lyot stop. 
Like the DPM, the DZPM does not provide full mathematical extinction of the on-axis 
starlight. To improve starlight rejection, the DZPM also requires an amplitude 
pupil apodization with a transmission following a radially symmetric fourth-order 
polynomial function. Laboratory demonstration of the DZPM has reached contrast 
levels of 4 x 107° between 7 and 17 λ/D in broadband light (Δλ/λ = 40%), showcasing 
the good achromaticity of the dual-zone phase mask coronagraph.® 


Fig. 1. Graphical representation of the four main phase-mask coronagraphs (panels, left to right: 
DPM, DZPM, FQPM, Vortex). Different shades of blue represent different phase shift values. The 
typical size of the disk phase mask feature is of the order of the Airy disk. 


5. Sectorized Phase Masks 


Sectorization of phase-mask coronagraphs solves the size chromaticity of 
radially modulated phase masks such as the DPM and DZPM. The underlying 
principle is to divide the focal plane into discrete azimuthal sectors and apply a 
sequence of well-chosen phase shifts between them. When the star is perfectly cen- 
tered at the intersection of the phase-mask sectors, its complex amplitude is equally 
divided between them, independently of the actual size of the beam. The centering 
constraint translates into tight pointing requirements. 


5.1. The Four-Quadrant Phase Mask 


Reference 9 originally proposed the four-quadrant phase mask (FQPM) coronagraph 
to overcome the size chromaticity of the radial phase masks described above. The 
principle of the FQPM is to divide the focal plane into four equal areas centered on 
the optical axis, with two of them providing a π phase shift (Fig. 1). This causes 
a full, mathematically perfect destructive interference to occur inside the relayed 
geometric pupil area. As for other focal-plane coronagraphs, the final image is then 
formed after proper filtering through a classical pupil-plane Lyot stop. 
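
A minimal numerical sketch of the four-quadrant geometry is given below. It builds the complex transmission of an ideal FQPM, with the π phase shift applied to the two quadrants on one diagonal, and checks that a centered, symmetric focal-plane field is split into contributions that cancel, whatever the width of the field. The Gaussian spot is only a stand-in for the Airy pattern, the grid size is arbitrary, and the function names are hypothetical; no Lyot stop or propagation is modeled.

```python
import numpy as np


def centered_grid(n):
    """Pixel-centre coordinates, offset so no sample lies on a quadrant boundary (n even)."""
    c = np.arange(n) - (n - 1) / 2.0
    return np.meshgrid(c, c)


def fqpm_mask(n):
    """Complex transmission of an ideal FQPM: +1 on one diagonal pair, -1 on the other."""
    x, y = centered_grid(n)
    phase = np.where(x * y > 0, 0.0, np.pi)
    return np.exp(1j * phase)


n = 256
x, y = centered_grid(n)
mask = fqpm_mask(n)
for width in (4.0, 16.0):  # two spot sizes, mimicking two wavelengths
    field = np.exp(-(x**2 + y**2) / (2.0 * width**2))
    residual = abs(np.sum(mask * field)) / np.sum(field)
    print(f"width = {width}: relative on-axis residual = {residual:.1e}")
```

The residual vanishes to rounding error for both widths, which is the size achromaticity discussed above; shifting the spot off the mask center breaks the balance.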

Due to its simplicity and intrinsic size achromaticity, the FQPM received a 
lot of attention in the early 2000s. After a series of laboratory validations!® 14 in 
monochromatic and broadband light, and on-sky demonstration on VLT-NACO,'? 
the FQPM was deemed sufficiently mature to be baselined for VLT-SPHERE!® and 
JWST-MIRI.!4 


5.2. The Eight-Octant Phase Mask 


Reference 15 proposed to generalize the FQPM to finer sectorization as a way to 
reduce the sensitivity of the FQPM to low-order aberrations such as tip-tilt and 
finite size of stars partially resolved by giant telescopes. The eight-octant phase mask 
(EOPM) indeed benefits from a quartic (fourth-power) dependency on off-axis pointing error 
and, by extension, finite size of stars. An EOPM manufactured with photonic crystals 
was tested on the high contrast imaging testbed (HCIT) at JPL and delivered 
10~® raw contrast over 10% bandwidth in the optical.'® 
6. The Optical Vortex Coronagraph 


Sectorized phase masks naturally lent themselves to smooth azimuthal phase modulations. 
References 17 and 18 proposed the optical vortex coronagraph, with a focal-plane 
phase of the form $e^{il\theta}$, where θ is the azimuthal coordinate and l a parameter 
known as the vortex topological charge. The topological charge parameterizes the 
number of 2π radian cycles the impinging complex field is forced to follow on a 
closed 360 degree path around the central singularity. Both Refs. 17 and 18 demonstrated 
the perfect mathematical starlight rejection of charge 2 vortex masks for 
an ideal circular unobscured pupil, with Ref. 17 extending the analysis to higher 
even charges. The vortex coronagraph has been particularly successful for the same 
reasons as the FQPM. Its small inner working angle, layout simplicity, intrinsic size 
achromaticity, and high throughput make it an attractive solution for high contrast 
imagers. The vortex coronagraph is currently in operation at the Palomar,!* 73 
VLT,?4 Subaru, Keck,?° 2” and Large Binocular telescopes.?® It is worth mentioning 
that the optical vortex coronagraph can also be modified to accommodate elliptical 
apertures.79 
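
To make the role of the topological charge concrete, the following sketch evaluates the
ideal mask transmission $e^{il\theta}$ on a closed loop around the singularity and counts
the accumulated $2\pi$ phase cycles. The function names and the sampling density are
illustrative assumptions, not part of any instrument software.

```python
import cmath
import math


def vortex_phase_factor(theta: float, charge: int) -> complex:
    """Transmission exp(i * charge * theta) of an ideal vortex mask."""
    return cmath.exp(1j * charge * theta)


def winding_number(charge: int, samples: int = 360) -> float:
    """Sum the wrapped phase steps along a closed loop around the centre."""
    total = 0.0
    previous = vortex_phase_factor(0.0, charge)
    for k in range(1, samples + 1):
        current = vortex_phase_factor(2.0 * math.pi * k / samples, charge)
        # Phase increment between neighbouring samples, wrapped to (-pi, pi].
        total += cmath.phase(current * previous.conjugate())
        previous = current
    return total / (2.0 * math.pi)


cycles_charge_2 = winding_number(2)
cycles_charge_4 = winding_number(4)
```

The loop returns the number of phase cycles, which equals the charge as long as the
sampling is fine enough that each step stays below $\pi$.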

Reference 30 reported better than 10~° raw starlight suppression in monochro- 
matic light and ~10~° in a 10% bandwidth of optical white-light over a dark hole 
with size in the range (3-8) $\lambda/D$. Both results were obtained on the HCIT at JPL.


7. An Aside on Phase Shifting Techniques 


Regardless of their geometrical implementation, phase-mask coronagraphs rely on
phase modulation. The electric field solution of Maxwell's equations has the follow-
ing general form: $E(\mathbf{r},t) = A\,e^{i(\omega t - \mathbf{k}\cdot\mathbf{r})}$, with the wave amplitude $A$, and the phase
term $\phi = (\phi_0 + \omega t - \mathbf{k}\cdot\mathbf{r})$, containing two terms in addition to the constant
offset $\phi_0$. The first term contains the temporal dependency of the electric field,
parameterized by the angular frequency $\omega$. The second term is the spatial term,
parameterized by the wave vector $\mathbf{k} = (k_x, k_y, k_z)$, with $|\mathbf{k}| = 2\pi/\lambda$.


7.1. Scalar Phase Shifters 


The term $\mathbf{k}\cdot\mathbf{r}$ is often known as the optical path delay. It can be manipulated by
introducing optical path differences between multiple waves. Scalar phase shifters
leverage this term, introducing a combination of materials and/or delays to generate
the prescribed phase shift. The DPM, DZPM, and FQPM have been manufactured
as scalar phase shifters, using simple phase steps in a substrate.
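
To illustrate why a simple phase step is chromatic, the sketch below computes the step
depth that gives a $\pi$ shift at a design wavelength and the shift the same step
produces at a longer wavelength. It assumes a transmissive step in a material of
refractive index n surrounded by vacuum, so that the optical path difference is
(n - 1)d, and it neglects dispersion. The function names and numerical values are
illustrative and are not taken from any of the instruments cited above.

```python
import math


def step_depth_for_pi(design_wavelength: float, n: float) -> float:
    """Depth d such that 2*pi*(n - 1)*d / wavelength equals pi."""
    return design_wavelength / (2.0 * (n - 1.0))


def phase_shift(depth: float, wavelength: float, n: float) -> float:
    """Phase delay in radians of a step of given depth (dispersion neglected)."""
    return 2.0 * math.pi * (n - 1.0) * depth / wavelength


design = 1.65e-6          # design wavelength in metres (illustrative value)
index = 1.45              # refractive index (illustrative value)
depth = step_depth_for_pi(design, index)
phase_design = phase_shift(depth, design, index)
phase_red = phase_shift(depth, 1.1 * design, index)
```

Under these assumptions the phase shift scales as the inverse of the wavelength, so a
10% longer wavelength sees a shift of $\pi/1.1$ rather than $\pi$.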


7.2. Vector Phase Shifters 


The temporal angular frequency term $\omega t$ is often neglected, but can readily be
acted upon using the Pancharatnam-Berry or geometrical phase concept.31 In 1955,
Pancharatnam showed that a cyclic change in the state of polarization of light is
accompanied by a phase shift determined by the geometry of the cycle as rep-
resented on the Poincaré sphere. The geometrical phase is also used to compare
the phases of two waves in different states of polarization. It is mathematically
defined as the argument of the inner product of their respective Jones vectors.
The Jones vector describes the polarization of light in the Jones calculus. Its two
complex components represent the amplitude and phase of the electric field in the
$x$ and $y$ directions. The temporal frequency phase term can thus be manipulated
by actively rotating the polarization vector with polarization devices such as a
halfwave plate11,17 or polarizers.?? The phase delay introduced by a rotation $\alpha$
is $\alpha$. The vector vortex coronagraph and eight-octant phase mask have primarily
been manufactured using vector phase shifters. Vector phase shifters are essentially
space-variant halfwave plates,11,17 synthesized out of subwavelength gratings or
photonic crystals. Space-variant optical devices have also greatly benefited from
advances in material science and in particular liquid crystal polymers.34
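
The definition of the geometrical phase as the argument of an inner product of Jones
vectors can be tried out directly. The sketch below applies an ideal half-wave plate
with its fast axis at angle `theta` to the two circular polarization states and
measures the phase of the output against the opposite circular state (a half-wave
plate flips the handedness). A fast axis at `theta` rotates a linear polarization by
twice that angle, so the expected phases are plus and minus `2 * theta`. The helper
names are illustrative, and the global phase of the plate is dropped.

```python
import cmath
import math


def halfwave_plate(theta: float):
    """Jones matrix of an ideal half-wave plate, fast axis at angle theta."""
    c, s = math.cos(2.0 * theta), math.sin(2.0 * theta)
    return ((c, s), (s, -c))


def apply_jones(matrix, vector):
    """Multiply a 2x2 Jones matrix by a Jones vector."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)


def geometric_phase(reference, field) -> float:
    """Argument of the inner product <reference|field> of two Jones vectors."""
    inner = sum(complex(r).conjugate() * f for r, f in zip(reference, field))
    return cmath.phase(inner)


S = 1.0 / math.sqrt(2.0)
circ_a = (S, 1j * S)    # one circular polarization state
circ_b = (S, -1j * S)   # the opposite handedness

theta = 0.3
phase_a = geometric_phase(circ_b, apply_jones(halfwave_plate(theta), circ_a))
phase_b = geometric_phase(circ_a, apply_jones(halfwave_plate(theta), circ_b))
```

Because the phase depends only on the fast-axis orientation and not on the wavelength,
a space-variant orientation pattern imprints the same phase pattern at every
wavelength at which the plate acts as a half-wave retarder.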


8. Sensitivity to Low-Order Aberrations and Stellar Size 


Phase-mask coronagraphs are generally employed in applications requiring small 
inner working angles. The price to pay for their attractive working angles is their 
inherent sensitivity to pointing errors and other low-order aberrations. For instance, 
the starlight leakage of the DPM, the FQPM, and the charge 2 vortex has a
quadratic dependence on off-axis pointing errors. Higher order masks such as the
eight-octant and charge 4 vortex have a quartic dependence on off-axis pointing
errors.®” It can be shown that the sensitivity to pointing errors goes as $r^l$, with $r$
the angular separation and $l$ the mask order (the topological charge for the vortex, and
its equivalent for sectorized phase-mask coronagraphs).
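
As a short worked example of this scaling (a sketch that assumes only the power law
stated above, with $L(r)$ introduced here to denote the starlight leakage at pointing
error $r$), doubling a small pointing error multiplies the leakage by

$$\frac{L(2r)}{L(r)} = \frac{(2r)^l}{r^l} = 2^l =
\begin{cases} 4, & l = 2 \ \text{(DPM, FQPM, charge 2 vortex)}, \\ 16, & l = 4 \ \text{(EOPM, charge 4 vortex)}. \end{cases}$$

Read the other way, halving the pointing error buys a factor of 4 in leakage for a
second-order mask and a factor of 16 for a fourth-order mask.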

While starlight leakage due to the finite size of stars can only be mitigated 
by higher order masks at the price of inner working angle, pointing errors can be 
mitigated by clever pointing loops. The pointing control system can use the light 
from the science camera itself?®** or dedicated low-order wavefront sensors using 
the light rejected at the Lyot stop.°® 


9. Multi-Stage Phase Mask 


Multi-stage coronagraphy has been proposed for various coronagraph designs*?-*°
as a practical way to mitigate sensitivity to unfriendly apertures, chromaticity, and
low-order aberrations. For these reasons, cascading phase-mask coronagraphs can
be very effective in overcoming the most tenacious limitations of phase-mask
coronagraphs. The first, and to our knowledge only, multi-stage phase-mask coronagraph
to be operational on-sky is the Palomar 200-inch Stellar Double Coronagraph (SDC).?? The
SDC uses two charge 2 vortex coronagraphs in cascade along with a coronagraph pointing
sensor.


10. Towards Hybrid Phase Masks


Coronagraphy is undergoing a dramatic change, as other chapters of this volume
will demonstrate. As hinted early on by the DZPM's need for a pupil apodizer,
hybridization of focal-plane phase masks with complex pupil apodizers is now
recognized to be particularly effective at compensating for the detrimental effect of
unfriendly apertures.24,44,45


Unlike in a classical Lyot coronagraph, where the starlight is blocked at the image
plane to remove it entirely from the optical system, the principle behind pupil- 
plane coronagraphy is to reshape the PSF so that it contains regions of high 
contrast. In amplitude coronagraphs, this is done by changing the magnitude of 
the electric field as it passes through the entrance pupil of the system. This can be 
done through apodization, where the transmission is a smooth function of position 
in the pupil, or through an actual change in shape of the aperture, what is called 
a binary transmission function, so that the resulting point spread function is no 
longer the Airy function associated with an open circular aperture but rather 
reflects the more intricate open areas of the pupil. In this chapter, we describe 
the mathematics behind pupil-plane amplitude coronagraphy and describe various 
optimal approaches to designing apodized and shaped-pupil coronagraphs. More 
background detail can be found in Refs. 1-6. 


1. Statement of the Problem


We consider the simplest optical system consisting of an entrance pupil described
by a set $S$ with an apodization function $A(x, y)$ representing the transmission of
the pupil, where $(x, y)$ are coordinates across the pupil and $A(x, y)$ takes on values
between 0 and 1. Thus, for a square aperture, for example, $S$ would be given by

$$S = \{(x, y) : -D/2 \le x \le D/2,\ -D/2 \le y \le D/2\},$$

and for a circular aperture $S$ would be given by

$$S = \{(x, y) : \sqrt{x^2 + y^2} \le D/2\},$$

where $D$ is the width/diameter of the aperture.

Assuming a uniform electric field of wavelength $\lambda$ and amplitude $E_0$ at the
pupil followed by a lens of focal length $f$, and using only scalar diffraction theory
(ignoring polarization effects), the complex-valued electric field at the image is given
by the Fraunhofer integral over $S$, which is just a Fourier transform:

$$E(\xi,\zeta) = \frac{E_0}{i\lambda f} \iint_S e^{-\frac{2\pi i}{\lambda f}(x\xi + y\zeta)} A(x, y)\,dx\,dy, \tag{1}$$

where $(\xi,\zeta)$ are coordinates in the image.7,8 The point spread function (PSF) is
then defined as the magnitude squared of the image-plane field:

$$P(\xi,\zeta) = \left| \frac{E_0}{i\lambda f} \iint_S e^{-\frac{2\pi i}{\lambda f}(x\xi + y\zeta)} A(x, y)\,dx\,dy \right|^2. \tag{2}$$


The high contrast imaging problem is to find an apodization function A(z, y) 
that produces a point spread function, P(€,¢), with the property that certain regions 
of the image plane, called dark holes, have a specified high contrast relative to the 
peak at P(0,0). Since A(a,y) can only attenuate, it is a property of all apodized 
pupil coronagraphs that the throughput of the system is reduced. We can thus for- 
mulate the problem as an optimization that chooses an A with maximum through- 
put under a constraint of high contrast in the largest possible region of the image 
with a given inner working angle. The inner working angle of a coronagraph is the 
smallest angle on the sky at which high contrast is achieved. For the apodized pupil 
coronagraphs described here, it is the location in the image plane closest to the 
center where the PSF still exhibits high contrast. 

Because we will be looking at various geometries, it is useful to find the form of
the Fraunhofer integral for a few special cases. The first is a square aperture, where
we assume that the apodization function is separable as the tensor product of two
one-dimensional functions, $A(x, y) = A_x(x)A_y(y)$. If in addition the two factors are
even functions (i.e. $A_x(-x) = A_x(x)$ and $A_y(-y) = A_y(y)$), then the Fraunhofer
integral in Eq. (1) becomes

$$E(\xi,\zeta) = \frac{4E_0}{i\lambda f} \int_0^{D/2} A_x(x)\cos(2\pi\xi x)\,dx \int_0^{D/2} A_y(y)\cos(2\pi\zeta y)\,dy, \tag{3}$$

where the image-plane coordinates $\xi$ and $\zeta$ are now expressed in units of $\lambda f$.
A special case of the separable apodization is the purely one-dimensional apodiza-
tion, $A(x, y) = A(x)$. In that case, the field in Eq. (3) simplifies to

$$E(\xi,\zeta) = \frac{2E_0}{i\lambda f}\,\frac{\sin(\pi\zeta D)}{\pi\zeta} \int_0^{D/2} A(x)\cos(2\pi\xi x)\,dx. \tag{4}$$

For a circular aperture, we can change to polar coordinates $(r, \theta)$ in the pupil. If
we similarly assume an apodization that is only a function of $r$, then the Fraunhofer
integral can be completed around the azimuthal direction, $\theta$, resulting in the
well-known Hankel transform7:

$$E(\rho) = \frac{2\pi E_0}{i\lambda f} \int_0^{D/2} A(r)\,J_0(2\pi r\rho)\,r\,dr, \tag{5}$$

where $J_0$ is the zero-order Bessel function of the first kind and $E(\rho)$ is also circularly
symmetric with coordinate $\rho = \sqrt{\xi^2 + \zeta^2}$ in the image plane.
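
The Hankel transform of Eq. (5) is easy to check numerically for the unapodized case,
where it must reproduce the Airy pattern. The sketch below assumes a unit-diameter
aperture ($D = 1$), drops the constant prefactor, and measures $\rho$ in units of
$\lambda/D$. To stay within the Python standard library, $J_0$ is evaluated from its
integral representation. The function names and step counts are illustrative.

```python
import math


def bessel_j0(z: float, terms: int = 64) -> float:
    """J0(z) = (1/pi) * integral_0^pi cos(z sin t) dt, by the midpoint rule."""
    h = math.pi / terms
    return sum(math.cos(z * math.sin((k + 0.5) * h)) for k in range(terms)) / terms


def hankel_field(rho: float, apodization, steps: int = 400) -> float:
    """2*pi * integral_0^{1/2} A(r) J0(2 pi r rho) r dr (midpoint rule, D = 1)."""
    dr = 0.5 / steps
    total = 0.0
    for k in range(steps):
        r = (k + 0.5) * dr
        total += apodization(r) * bessel_j0(2.0 * math.pi * r * rho) * r
    return 2.0 * math.pi * total * dr


def uniform(r: float) -> float:
    """Unapodized (open) circular aperture."""
    return 1.0


on_axis = hankel_field(0.0, uniform)        # equals the pupil area, pi / 4
first_null = hankel_field(1.2197, uniform)  # near the first Airy dark ring
contrast_at_null = (first_null / on_axis) ** 2
```

Passing a different radial function as `apodization` gives the field of any circularly
symmetric apodizer under the same assumptions.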


2. Optimal Apodized Pupil Coronagraphs 


In this section, we describe smooth apodization functions that can be applied to a 
telescope to provide high contrast. This problem is not a new one. A comprehensive 
survey of apodization functions for various applications can be found in Ref. 9. More 
recently, apodization functions targeting high contrast were described in Refs. 10 
and 11. In 2001, Ref. 12 proposed an apodized square aperture (ASA) where the 
tensor product is taken of two apodization functions as in Eq. (3); the resulting PSF 
has areas of very high contrast along the diagonals. Figure 1 shows the resulting 
PSF of an ASA apodized with a Sonine function (as proposed by Nisenson) given 
by 


$$A_x(x) = (1 - 4x^2)^{\nu - 1}, \tag{6}$$


where $3 \le \nu \le 5$ and the aperture width is normalized to $D = 1$. A larger $\nu$ creates
a smaller inner working angle at the expense of throughput.
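
The effect of the Sonine apodization can be seen in a one-dimensional sketch. The code
below evaluates the cosine transform of Eq. (4) along one axis for an open aperture and
for Eq. (6) with $\nu = 5$, both on a unit-width aperture, and compares the normalized
PSF at a sidelobe peak of the open aperture. The function names, the sample point
$\xi = 6.5$ (in units of $\lambda/D$), and the trapezoidal quadrature are illustrative
choices.

```python
import math


def sonine(x: float, nu: float = 5.0) -> float:
    """Sonine apodization of Eq. (6) on a unit-width aperture, |x| <= 1/2."""
    return max(0.0, 1.0 - 4.0 * x * x) ** (nu - 1.0)


def uniform(x: float) -> float:
    """Open (unapodized) aperture."""
    return 1.0


def field_1d(xi: float, apodization, steps: int = 20000) -> float:
    """2 * integral_0^{1/2} A(x) cos(2 pi xi x) dx by the trapezoidal rule."""
    dx = 0.5 / steps
    values = [
        apodization(k * dx) * math.cos(2.0 * math.pi * xi * k * dx)
        for k in range(steps + 1)
    ]
    return 2.0 * dx * (sum(values) - 0.5 * (values[0] + values[-1]))


def normalized_psf(xi: float, apodization) -> float:
    """PSF along one axis, normalized to its on-axis value."""
    return (field_1d(xi, apodization) / field_1d(0.0, apodization)) ** 2


xi = 6.5
open_aperture = normalized_psf(xi, uniform)
apodized = normalized_psf(xi, sonine)
throughput = field_1d(0.0, sonine) / field_1d(0.0, uniform)
```

In this one-dimensional example the apodizer transmits about 41% of the light
(128/315 of the open aperture) while lowering the sidelobe at $\xi = 6.5$ by several
orders of magnitude.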

Fig. 1. Left: The PSF of an apodized square aperture using a Sonine apodization with $\nu = 5$
plotted on a logarithmic scale with black corresponding to a factor of $10^{-10}$ below the
brightest pixel. Right: A cross-section of the ASA PSF along the diagonal showing an inner
working angle of 5 $\lambda/D$, with $D$ the width of the aperture, for a contrast below $10^{-10}$.

In a landmark 1961 paper, Ref. 13 found the optimal compactly supported
function that maximally concentrates energy in the frequency domain under a finite
Fourier transform. This function, the zero-order prolate spheroidal wave function,
is the optimal 1D apodization that maximizes the contrast under a throughput
constraint. In fact, in a later 1965 paper, Ref. 14 showed how this function can be
used to apodize a square aperture in one dimension. Figure 2 shows the resulting
PSF and its corresponding cross-section.

Fig. 2. Left: The one-dimensional prolate spheroidal apodization function. Center: The corre-
sponding PSF plotted on a logarithmic scale. Right: A cross-section of the PSF showing an inner
working angle of 4 $\lambda/D$ for a contrast below $10^{-10}$.

Fig. 3. Left: The PSF of the generalized prolate spheroidal apodization plotted on a logarithmic
scale. Right: A cross-section of the PSF showing an inner working angle of 3.5 $\lambda/D$ for a contrast
below $10^{-10}$.

Note that these one-dimensional apodization functions result in only limited 
regions of the image plane with high contrast. The apodizer must be rotated and 
multiple exposures taken for a complete search. Reference 14 introduced the gener- 
alized prolate spheroidal wave function, which is maximally compact under the finite 
Hankel transform in polar coordinates (Eq. (5)). The resulting PSF and cross-section 
are shown in Fig. 3. While this has lower throughput than the one-dimensional 
apodization, it is more efficient for searching the entire 360 degree region of the 
image plane. This implies that the generalized prolate apodization might be best
used for discovery, but a one-dimensional apodization could be more efficient for 
spectroscopy where the object location is known. 

While apodized pupils for high contrast coronagraphy have been known for 
many years, they have not seen much application due to the challenges associated 
with manufacturing them. It is extremely difficult to produce an apodizer with 
a precise amplitude variation over broad bands of wavelengths and with no phase
shifts. As a result, we turned to a special class of apodizations where the function 
A(x, y) is binary, taking on values of either 0 or 1, which have been dubbed shaped 
pupils. It is much easier to manufacture a shaped pupil with high precision than it is 
to manufacture an apodization with a similar level of precision. These are described 
in the following section. 


3. Shaped Pupil Coronagraphs 


In 2000, Ref. 1 introduced a binary mask design for high-contrast coronagraphy by 
noting that a one-dimensional apodization function, such as the prolate spheroidal 
function shown in Fig. 2, could be used in Eq. (4) to define the vertical extent of 
an opening as a function of the horizontal position and that such a pupil would 
have high contrast along the horizontal axis of the image plane (see Fig. 4). This 
example provided the inspiration and incentive to look for other binary pupils that 
would have a large region of high contrast, high throughput, small inner working 
angle, and high level of manufacturability. 

The first such masks were found by noting that the very narrow dark region in
Fig. 4 could be enlarged by increasing the degrees of freedom through the introduc-
tion of multiple openings in Eq. (4). Optimization tools can then be used to find
the shape of each opening, still exploiting the one-dimensional symmetry. Such a
mask (called a ripple mask) is shown in Fig. 5 (Ref. 3). It has a relatively high throughput
and a relatively small inner working angle but the size of the dark zone close to the
star is rather small.

Fig. 4. Left: A pupil based on the prolate spheroidal function. Right: The corresponding PSF
plotted on a logarithmic scale with black corresponding to a $10^{-10}$ contrast.

Fig. 5. Left: A self-supporting shaped pupil mask designed for an elliptical aperture. Right: The
PSF plotted on a logarithmic scale showing an inner working angle of 4 $\lambda/D$ for a contrast of
$10^{-10}$ (Ref. 3).

These results motivated the use of optimization techniques15 more directly to
find high-contrast apodizations, in ways that exploit other symmetries. The binary 
nature naturally leads to a comparison with optimal control problems, where solu- 
tions often have the so-called bang-bang property. For example, the minimal fuel 
solution to the problem of launching a rocket from Earth and putting it into orbit 
around the Moon involves firing the thrusters with full force during certain parts of 
the flight and turning them off for the other parts. We are fortunate to have discov- 
ered that the same bang-bang effect happens when designing optimal apodizations. 
In other words, optimized apodizations often result in functions $A(x, y)$ that are
either one or zero everywhere; i.e. shaped pupils. 

There are various ways one can imagine “optimizing” the design, since we are 
faced with multiple objectives: we'd like to maximize light throughput but we’d also 
like to maximize contrast in a dark zone and we'd like the dark zone to be as close 
to the star as possible and have the largest possible angular (and radial) extent. 
So, we have at least four “objectives”. To date, most optimization-based designs 
have been produced by maximizing light throughput subject to the constraint that 
the contrast is sufficiently high in a certain specified “dark zone” given by the
set $\mathcal{D}$:

$$\begin{aligned}
\text{maximize}\quad & \iint_S A(x, y)\,dx\,dy \\
\text{subject to}\quad & |E(\xi,\zeta)| \le \epsilon\,|E(0,0)|, && (\xi,\zeta) \in \mathcal{D}, \\
& 0 \le A(x, y) \le 1, && (x, y) \in S, \\
& A(x, y) = 0, && (x, y) \notin S.
\end{aligned}$$

This optimization problem is defined over a 2D continuum. It can be made into a
tractable problem by discretizing the continuum of points in the pupil plane $S$ and in
the dark zone $\mathcal{D}$ of the image plane. The problem is also nonlinear since the Fourier
transform is, in general, complex valued and so the absolute value is the square
root of the sum of the squares of the real and imaginary parts. However, we can
impose symmetry constraints in both axes of the pupil, i.e. $A(-x, y) = A(x, y) =
A(x, -y) = A(-x, -y)$, and then the Fourier transform becomes a real-valued cosine
transform

$$E(\xi,\zeta) = \iint_S \cos(2\pi\xi x)\,A(x, y)\,\cos(2\pi\zeta y)\,dx\,dy$$

(dropping the constant prefactor of Eq. (1), which cancels in the contrast ratio),
and the problem can be rewritten as a linear programming problem:

$$\begin{aligned}
\text{maximize}\quad & E(0,0) \\
\text{subject to}\quad & -\epsilon\,E(0,0) \le E(\xi,\zeta) \le \epsilon\,E(0,0), && (\xi,\zeta) \in \mathcal{D}, \\
& 0 \le A(x, y) \le 1, && (x, y) \in S, \\
& A(x, y) = 0, && (x, y) \notin S.
\end{aligned}$$

(Note that we have also simplified the objective function by exploiting the fact
that $E(0,0)$ is proportional to the throughput.) It turns out that solutions to this
optimization problem are always of the bang-bang type. In other words, the solution
is a shaped pupil mask.
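
One way to make the discretization concrete is sketched below. The pupil samples
$(x_j, y_k)$, the unknowns $A_{jk} = A(x_j, y_k)$, and the dark-zone samples
$(\xi_m, \zeta_m)$, $m = 1, \dots, M$, are notation introduced here for illustration
and are not taken from the references. With a uniform pupil grid the quadrature weight
is a common factor and cancels from every constraint, leaving

$$\begin{aligned}
\text{maximize}\quad & \sum_{j,k} A_{jk} \\
\text{subject to}\quad & -\epsilon \sum_{j,k} A_{jk} \;\le\; \sum_{j,k} \cos(2\pi\xi_m x_j)\cos(2\pi\zeta_m y_k)\,A_{jk} \;\le\; \epsilon \sum_{j,k} A_{jk}, \quad m = 1, \dots, M, \\
& 0 \le A_{jk} \le 1 .
\end{aligned}$$

The objective and all $2M$ inequality constraints are linear in the unknowns $A_{jk}$,
so any general-purpose linear programming solver can be applied. The cost grows with
the product of the number of pupil samples and the number of dark-zone samples.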

One way to quantify the performance of the shaped pupil is to calculate the
fraction of energy concentrated in the PSF core of a hypothetical off-axis point
source. If $E_p(\xi,\zeta)$ is the scalar image-plane field associated with the point source,
then we can define the core PSF region, $C$, as the set of points in the image plane
where the intensity is greater than or equal to half of its maximum:

$$C = \left\{ (\xi,\zeta) : |E_p(\xi,\zeta)|^2 \ge \tfrac{1}{2} \max_{(\xi,\zeta)} |E_p(\xi,\zeta)|^2 \right\}.$$

We then define the PSF core throughput, $T_c$, as the ratio between the energy in the
core and the energy transmitted by the entrance pupil before the shaped pupil:

$$T_c = \frac{\iint_C |E_p(\xi,\zeta)|^2\,d\xi\,d\zeta}{\iint_S E_0^2\,dx\,dy}. \tag{7}$$

As before in Eq. (2), $E_0$ is the uniform field amplitude at the entrance pupil.
Figure 6 shows a pupil mask that is analogous to the one-dimensional prolate
spheroidal apodization shown in Fig. 2 (and in fact approximates it both in average
throughput across the mask and in Fourier space within the spatial frequencies
corresponding to the prescribed dark hole). This pupil mask is called a barcode mask
for obvious reasons.* We also show the analog of the ASA as the tensor product
of two barcode masks in the second row of Fig. 6. This mask is naturally called a
checkerboard mask.*

Fig. 6. Top Left: A barcode mask.? Top Right: The associated PSF plotted on a logarithmic
scale.? Bottom Left: A checkerboard mask.* Bottom Right: The associated PSF plotted on a
logarithmic scale.*

The optimization problem for the azimuthally symmetric apodization produces 
a bang-bang mask that consists of concentric rings. This is an analogue to, and 
approximation of, the two-dimensional prolate spheroidal apodization in Fig. 3. 
One such mask is shown in Fig. 7.° 

Unfortunately, the concentric ring mask is not “self-supporting”. In other words,
the rings must be deposited on glass or inverted to make a reflective shaped pupil.16
Adding a glass plate adds an optical element to the design and therefore introduces
certain possibly challenging manufacturing/design issues. This inspired an alter-
native approach in which a smoothness constraint was added to the optimization
problem so that the result is a smooth (not bang-bang) apodization profile that is
a function of radius $r$ alone. Such an apodization can then be petalized to make
a mask whose opening is star-shaped. Such an example is shown in Fig. 8.° This
design is simple and elegant but it requires a large number of petals in order to
provide a large dark zone. These star-shaped masks were the original inspiration
for the design of star-shaped occulters.17

Fig. 7. Left: A concentric-ring pupil mask.° Middle: The PSF plotted on a logarithmic scale.°
Right: Plot of contrast as a function of radius showing an inner working angle of 3.5 $\lambda/D$.°

Fig. 8. Left: A star-shaped pupil mask.® Middle: The PSF plotted on a logarithmic scale for the
20-petal mask shown.® Right: The PSF for a 150-petal version of the mask.®
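
The petalization step can be sketched as follows: at each radius, the open fraction of
the circumference is set equal to the smooth profile $A(r)$, so that the azimuthal
average of the binary mask reproduces the apodization. The profile `smooth_profile`
below is a made-up placeholder rather than an optimized design, and `petal_mask`, the
petal count, and the sampling are illustrative assumptions.

```python
import math


def smooth_profile(r: float) -> float:
    """Placeholder radial apodization on 0 <= r <= 1/2 (not an optimized design)."""
    return max(0.0, 1.0 - 4.0 * r * r) ** 2


def petal_mask(r: float, theta: float, petals: int = 20) -> int:
    """Binary transmission of a star-shaped mask.

    A point is open when its angular offset from the nearest petal axis is
    within A(r) times half the angular pitch between petals.
    """
    pitch = 2.0 * math.pi / petals
    offset = abs((theta % pitch) - 0.5 * pitch)  # 0 on the petal axis
    return 1 if offset <= 0.5 * pitch * smooth_profile(r) else 0


def azimuthal_average(r: float, petals: int = 20, samples: int = 20000) -> float:
    """Fraction of the circle of radius r that is open."""
    hits = sum(
        petal_mask(r, 2.0 * math.pi * (k + 0.5) / samples, petals)
        for k in range(samples)
    )
    return hits / samples


radius = 0.3
open_fraction = azimuthal_average(radius)
target = smooth_profile(radius)
```

The azimuthal average matches the smooth profile by construction; how far out in the
image plane the binary mask behaves like the smooth apodizer depends on the number of
petals, which is why many petals are needed for a large dark zone.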

All of the previous designs were found by exploiting some sort of symmetry that 
reduced the full two-dimensional Fourier transform to a one-dimensional transform 
of some sort. This approach was taken because the full two-dimensional discretized 
optimization problem was intractable at the level of discretization required to design 
a meaningful mask. However, it was eventually realized18 that one can adapt the
same multi-step approach that led to the fast Fourier transform to this context, in
which a Fourier transform appears as constraints in a linear optimization problem.
This innovation made it possible to develop very general optimal designs.19 Figure 9
shows one such design. 
Fig. 9. Left: A shaped pupil mask.!® Right: The PSF plotted on a logarithmic scale showing an
inner working angle of 4 $\lambda/D$ for a contrast of $10^{-10}$.19

Fig. 10. Left: A shaped pupil mask designed to accommodate central obstructions and spiders.
Right: The PSF plotted on a logarithmic scale showing an inner working angle of 4 $\lambda/D$ and a
contrast of $10^{-8}$ in two square-shaped dark holes.19


As one last example, we show in Fig. 10 a mask design that was made to 
accommodate a telescope with a large central obstruction and complex spiders, 
showing the versatility of the pure two-dimensional design approach. 


4. Shaped Pupil Lyot Coronagraphs 


A shaped pupil apodizer can be combined with a classical Lyot coronagraph mask
train to produce a hybrid coronagraph design with enhanced performance.20,21 The
added layers of on-axis rejection enabled by the focal-plane occulter and Lyot stop
result in a design that can reach a smaller inner working angle than would be
possible with a conventional shaped pupil, while producing an off-axis PSF with a
higher throughput and a more tightly concentrated core.

Fig. 11. Shaped pupil Lyot coronagraph for an unobscured circular aperture. The mask train
consists of an optimized shaped pupil apodizer (a), followed by an opaque occulting spot (b) at
the first focus, and an annular Lyot stop (c) in the reimaged pupil plane. The result is a region of
high contrast at the final focus (d). This particular design meets a contrast goal of $10^{-10}$ over an
annular dark zone spanning angular separations 3.0 to 12.0 $\lambda_0/D$, over a 10% bandpass.

An example shaped pupil Lyot coronagraph (SPLC) is illustrated in Fig. 11. 
This design creates an annular $10^{-10}$ contrast dark zone at angular separations 3
to 12 $\lambda/D$. It has a PSF core throughput (as defined in Sec. 3) of 23% (versus 85%
throughput into the core of the Airy disk PSF of a circular aperture), and the half- 
max core area of the off-axis PSF is 1.5 times that of an Airy disk. Similar to the 
shaped pupils shown in Sec. 3, the SPLC can be optimized for obscured apertures, 
and the dark zone can be spatially restricted in separation or azimuth to trade for 
increased throughput. 

One consequence of introducing the focal-plane occulter (plane b in Fig. 11) 
is that the PSF no longer scales linearly in space with wavelength (i.e. Eq. (2)). 
Instead, the optimization program must constrain the intensity independently at 
multiple wavelength samples spanning the observation bandpass. This can be advan- 
tageous for spectroscopy: it is then possible to anchor the dark zone constraints 
in true offset coordinates, rather than wavelength-proportional ($\lambda/D$) coordinates,
so that over the instrument bandpass the inner edge of the dark zone remains 
fixed. 

A second consequence of the focal-plane occulter is that the dark zone becomes 
more sensitive to wavefront aberrations. The fabrication and alignment tolerances 
associated with the added masks further complicate implementation. Nevertheless, 
SPLC designs have been used in several successful technology demonstrations for
the coronagraph instrument planned for NASA's WFIRST mission.22


Phase apodization coronagraphs are implemented in a pupil plane to create a dark
hole in the science camera focal plane. They are successfully created as “Apodiz- 
ing Phase Plates” (APPs) using classical optical manufacturing, and as “vector- 
APPs” using liquid-crystal patterning with essentially achromatic performance. 
This type of coronagraph currently delivers excellent broadband contrast (~10~°) 
at small angular separations (a few $\lambda/D$) at ground-based telescopes, owing to its
insensitivity to tip/tilt errors.


1. Introduction 


Pupil-plane apodization techniques (amplitude, phase, or complex) differ from focal 
plane-mask coronagraphs in that they affect all objects in the field in an identical 
fashion. The main goal of such pupil-plane coronagraphs is to enforce dark holes in 
the ensuing point spread function (PSF) in which faint companions can be directly 
detected and characterized. Since the star and companion have the same PSF, the 
halo should be suppressed while preserving the starlight in the core as much as 
possible, i.e. a high Strehl ratio PSF. In this situation, the “noise” is governed by 
the PSF diffraction halo plus any diffuse background, while the “signal” is contained 
in the PSF core. 

The phase-only “Apodizing Phase Plate”!* (APP) coronagraphs have now 
been successfully applied on-sky at ground-based telescopes. The main benefits of 
APPs include a high contrast inside the dark hole (~10~4-10~°), at a small inner 
working angle ~1.5 $\lambda/D$, with complete insensitivity to tip/tilt errors (and partially
resolved stellar disks) that usually limit focal-plane coronagraphs. This invariance 
of the PSF additionally enables beam-switching for thermal background removal,
and observations of multiple star systems. With the introduction of advanced liquid- 
crystal technology for the vector-APP coronagraph,° ‘ it has also become efficient 
over spectral bandwidths of more than an octave, at wavelengths from 300 to 
30,000 nm.® The extreme phase patterns enabled by liquid-crystal writing techniques 
can now also produce dark holes with various shapes, including complementary 
180° D-shaped dark holes and 360° donut-shaped dark holes. As a single pupil- 
plane optic, the (vector-)APP is easily implemented in a filter wheel in existing 
instruments, and is fully compatible with cryovacuum (and likely also space-based) 
operation. 


2. Theory 


The one-dimensional apodization problem has been studied for a long time, includ- 
ing slit apodization in spectroscopy and pulse shaping to reduce channel bandwidth 
in telegraphy, by apodizing in amplitude.® The family of functions to describe this 
are the Slepian functions and the prolate spheroidal wave functions.10 Since trans-
mission apodization is linear, it can achieve a high degree of suppression between the
PSF core and the halo beyond a selected inner working angle (IWA), and in general the
apodizations are complex with both transmission and phase. The accurate manu- 
facture of complex amplitude masks is nontrivial and can result in low transmission 
efficiencies. 

Phase-only apodization theory was initially developed for removing speckles 
generated by residual optical aberrations in high contrast imaging experiments,11
where wavefront sensing in the final focal plane of a coronagraph forms a closed 
loop with a deformable mirror (DM) in the optical system. A sinusoidal ripple on 
the DM forms a diffraction grating in the phase of the wavefront, generating a pair 
of speckles that are copies of the Airy core of the central PSF. The appropriate 
choice of spatial phase and amplitude of the ripple applied to the DM destructively 
interferes with speckles generated by aberrations in the optical system. The same 
principle can be generalized to cancel out the diffraction rings of the PSF itself, 
as demonstrated on-sky by the addition of coma into an adaptive optics system to 
cancel out part of the first Airy ring.12 Apodization in phase over a two-dimensional
region does not yet have an analytic solution. Superposing many different phase rip- 
ples in the pupil plane to suppress the diffraction pattern over a region of interest 
(ROI — typically defined as a D-shaped region next to the Airy core of the PSF) is 
challenging, since the speckles add vectorially and interfere with each other, making 
it a nonlinear problem. Reference 13 searched for phase-only apodization solutions 
through a modal basis approach. An ROI is defined in a complex amplitude focal 
plane, where the diffraction halo is to be minimized. A complex amplitude field is 
defined in the pupil plane, and a Fourier imaging operator is defined that maps 
from the pupil plane into the ROI. Singular Value Decomposition of this opera- 
tor produces a modal basis set of complex pupil amplitudes, ordered canonically
from the most power contained within the ROI to the least. These modes typically
have complex amplitudes in the pupil plane, so their complex amplitude is nor- 
malized to unity to make them phase-only apodization. These “antihalo” modes 
are subtracted off the complex amplitude of the pupil plane, and the process is 
repeated. The antihalo modes extend a short distance beyond the ROI, and if the 
IWA is within the first Airy ring, flux from the core of the PSF is detrimentally 
removed as well. Care is needed to suppress these modes by imposing additional 
constraints to maximize the PSF core encircled energy. If not properly accounted for, 
phase wrapping can also occur when the peak-to-valley phase apodization is greater
than $2\pi$.
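
The grating picture used in this section can be checked with a one-dimensional sketch.
A small sinusoidal phase ripple of amplitude `a` (in radians) with `k` cycles across a
unit-width pupil sends a fraction close to $(a/2)^2$ of the peak intensity into each of
two speckles at $\pm k\,\lambda/D$. The one-dimensional geometry, the unit-width
normalization, and the function names are simplifying assumptions made for this
example.

```python
import cmath
import math


def focal_field(xi: float, ripple_amplitude: float, cycles: int,
                samples: int = 2000) -> complex:
    """Field at xi (units of lambda/D) for a pupil phase a*cos(2 pi k x).

    Evaluates the integral over |x| <= 1/2 of
    exp(i a cos(2 pi k x)) * exp(-2 pi i x xi) with the midpoint rule.
    """
    dx = 1.0 / samples
    total = 0j
    for n in range(samples):
        x = -0.5 + (n + 0.5) * dx
        phase = ripple_amplitude * math.cos(2.0 * math.pi * cycles * x)
        total += cmath.exp(1j * (phase - 2.0 * math.pi * x * xi))
    return total * dx


a, k = 0.1, 5
peak = abs(focal_field(0.0, a, k)) ** 2
speckle_plus = abs(focal_field(+k, a, k)) ** 2 / peak
speckle_minus = abs(focal_field(-k, a, k)) ** 2 / peak
small_ripple_estimate = (a / 2.0) ** 2
```

Shifting the spatial phase of the ripple changes the phase of the two speckles, and
changing `a` changes their amplitude, which are the two handles used to make a speckle
interfere destructively with an existing one.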

New algorithms have been developed at Leiden Observatory by Doelman, Keller 
and Por. Doelman generates focal plane dark zones using a combination of phase- 
only pupil modes.'4 A simulated annealing approach is used, where the mode ampli- 
tudes are randomly adjusted. Solutions that improve the dark region are kept, but 
worse solutions are occasionally accepted as well to escape local minima. Keller uses 
a Gerchberg-Saxton! method, switching between the pupil plane and focal plane. 
Convergence to a given contrast level is increased by an order of magnitude using 
Douglas-Rachford operator splitting.!® Por!” generalizes an algorithm by Carlotti!® 
for general complex amplitudes in the pupil plane. Strehl ratio maximization for this 
mask is a linear operation solved by a large-scale optimizer, and phase-only solutions
are naturally found through this approach. 
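
The alternating-projection idea behind a Gerchberg-Saxton style search can be sketched in a few lines. This is a toy illustration only and not the implementation used by any of the authors cited above: the grid size, the circular aperture, the dark-zone geometry and the function names `gerchberg_saxton_phase` and `dark_zone_energy` are all assumptions made for the example, and no operator splitting or Strehl constraint is included.

```python
import numpy as np

def dark_zone_energy(aperture, phase, dark_mask):
    """Summed focal-plane intensity inside the dark zone for a given pupil phase."""
    focal = np.fft.fft2(aperture * np.exp(1j * phase))
    return float(np.sum(np.abs(focal[dark_mask]) ** 2))

def gerchberg_saxton_phase(aperture, dark_mask, n_iter=50):
    """Toy phase-only search: alternate between focal-plane and pupil-plane constraints."""
    phase = np.zeros(aperture.shape)
    for _ in range(n_iter):
        focal = np.fft.fft2(aperture * np.exp(1j * phase))
        focal[dark_mask] = 0.0                      # focal plane: remove light in the dark zone
        back = np.fft.ifft2(focal)
        # pupil plane: keep only the phase, restore the aperture amplitude
        phase = np.where(aperture > 0, np.angle(back), 0.0)
    return phase

# Hypothetical setup: 64 x 64 grid, circular pupil 32 pixels across,
# so one lambda/D corresponds to 2 pixels in the FFT focal plane.
n = 64
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
aperture = (np.hypot(x, y) < n // 4).astype(float)

freq = np.fft.fftfreq(n) * n                        # integer spatial frequencies in FFT order
fx, fy = np.meshgrid(freq, freq)
radius = np.hypot(fx, fy)
dark_mask = (fx > 0) & (radius >= 4) & (radius <= 10)   # one-sided zone, 2-5 lambda/D

phase = gerchberg_saxton_phase(aperture, dark_mask)
before = dark_zone_energy(aperture, np.zeros_like(aperture), dark_mask)
after = dark_zone_energy(aperture, phase, dark_mask)
print("dark-zone energy, flat pupil:", before)
print("dark-zone energy, apodized  :", after)
```

Because each step is a nearest-point projection onto its constraint set, the dark-zone energy cannot grow from one iteration to the next, although the loop can stall in a local minimum, which is the weakness the annealing and operator-splitting variants are meant to address.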


3. First Generation APPs Using Classical Phase 


The manufacture of APP solutions requires the variation of phase across the pupil 
plane of the camera, and the development of free-form optic manufacture with 
notable departures from sphericity using computer-controlled diamond turning!® 
encoded the phase patterns as variations in the thickness of a high refractive index 
transmissive substrate. First light observations of an APP with diamond turned 
optics! demonstrated the viability of the manufacturing technique and of the the- 
ory. The success of the prototype led to APP coronagraphs on the 6.5m MMTO 
telescope in Arizona? and on the Very Large Telescope in Chile.*** The VLT APP 
led to the first coronagraphic image of the extrasolar planet β Pictoris b?° and the
discovery of the extrasolar planet HD 100546b.?4 

Diamond turning only allows for low spatial frequencies in the azimuthal direc- 
tion of the cutting tip, and the classical phase plate manufacturing was inherently 
chromatic. Attempts to achromatize the APP using doublets proved highly chal- 
lenging.?? 


4. The Vector-APP 


The main limitations of the APP coronagraph (chromaticity, limited coverage 
around the star, limited phase pattern accuracy) were solved by the introduction of 
the vector-APP (vAPP).° In a similar way as for the Vector Vortex Coronagraph,7?
the vAPP replaces the classical phase pattern ($\phi_c[u,v] = n(\lambda)\,\Delta d[u,v]$) with the
so-called Pancharatnam?*—Berry?° phase or “geometric phase” .2° The vAPP phase 
pattern is imposed by a half-wave retarder with a patterned fast axis orientation 
$\theta[u,v]$. The geometric phase is imprinted on incident beams decomposed according
to circular polarization state: $\phi_g[u,v] = \pm 2\,\theta[u,v]$, with the sign depending on
the circular polarization handedness. As this fast axis orientation pattern does not
vary as a function of wavelength (with the possible exception of an inconsequential
offset/piston term), the geometric phase is strictly achromatic. A simple Fraunhofer
propagation from the pupil [u,v] to the focal plane [x,y] shows that after splitting
circular polarization states the two ensuing coronagraphic PSFs are point-symmetric
($\mathrm{PSF}_L[x,y] = \mathrm{PSF}_R[-x,-y]$) and therefore, in the case of D-shaped dark holes,
deliver complementary PSFs that furnish an instantaneous 360° search space around
each star.
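
The point symmetry of the two PSFs can be checked numerically. The sketch below is illustrative only: it assumes a real-valued circular aperture, uses an arbitrary random fast-axis pattern rather than a real vAPP design, and the function names `vapp_psfs` and `point_mirror` are invented for the example. A discrete Fourier transform stands in for the Fraunhofer propagation.

```python
import numpy as np

def vapp_psfs(aperture, theta):
    """PSFs of the two circular polarization states for a fast-axis pattern theta (radians)."""
    phase = 2.0 * theta                              # geometric phase magnitude, +/- 2 theta
    psf_l = np.abs(np.fft.fft2(aperture * np.exp(+1j * phase))) ** 2
    psf_r = np.abs(np.fft.fft2(aperture * np.exp(-1j * phase))) ** 2
    return psf_l, psf_r

def point_mirror(image):
    """Map image[kx, ky] to image[-kx, -ky] on the periodic FFT grid."""
    return np.roll(np.flip(image, axis=(0, 1)), shift=1, axis=(0, 1))

rng = np.random.default_rng(0)
n = 64
y, x = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
aperture = (np.hypot(x, y) < n // 4).astype(float)   # hypothetical unobscured pupil
theta = rng.uniform(-0.5, 0.5, size=(n, n))          # arbitrary orientation pattern

psf_l, psf_r = vapp_psfs(aperture, theta)
print("point-symmetric:", np.allclose(psf_l, point_mirror(psf_r), atol=1e-6))
```

The symmetry holds for any phase pattern as long as the aperture transmission is real, because flipping the sign of the phase is a complex conjugation of the pupil field.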

Vector-APP devices are produced by applying two breakthrough liquid-crystal 
techniques: any desired phase pattern is applied onto a substrate glass through 
a direct-write procedure?" that applies the orientation pattern $\theta[u,v]$ by locally
polymerizing the alignment layer material in the direction set by the controllable
polarization of a scanning UV laser. Subsequently, birefringent liquid-crystal layers
are deposited on top of this alignment layer. Several self-aligning layers (“Multi- 
Twist Retarders”; MTR?°) with predetermined parameters (birefringence disper- 
sion, thickness, nematic twist) yield a linear retardance that is close to half-wave 
over the specified wavelength range. The vAPP can become efficient over a large 
wavelength range (up to more than an octave), while any phase pattern can be 
written with high accuracy. 


4.1. Prototyping and First On-sky Results 


The first broadband vAPP device was fully characterized in the lab at visible wave- 
lengths (500-900nm).° The main limitation of the contrast performance inside the 
dark hole was the occurrence of leakage terms that produced a faint copy of the 
regular PSF on top of the coronagraphic PSFs. These leakage terms are caused 
by small offsets to the half-wave retardance of the vAPP device, and offsets from 
quarter-wave retardance of the quarter-wave plate that, together with a Wollaston 
prism, accomplishes the (broadband) circular polarization splitting. This issue was 
resolved with the introduction of the “grating-vAPP” ,?° which implements the cir- 
cular polarization splitting by superimposing a tilt (i.e. a “polarization grating” ®) 
pattern on top of the coronagraphic pupil phase pattern, which, by virtue of the 
properties of the geometric phase, very efficiently sends the coronagraphic PSFs into 
grating orders ±1, and leaves all the leakage terms in the zeroth order. The grating-vAPP
also greatly simplifies the optical configuration, as all the manipulation takes
place within one single (flat) optic. The coronagraphic PSFs are now subject to
a lateral grating dispersion term and so the grating-vAPP can only be used in
combination with narrowband filters, although the wavelength range throughout
which these filters can be applied can still be very large.

Fig. 1. Phase patterns, theoretical and on-sky PSFs (logarithmic scale) for the two vAPP devices
installed at MagAO. (a) Theoretical phase pattern for a 180° dark hole covering 2–7 λ/D, (b)
the ensuing theoretical PSF, (c) the on-sky PSFs at MagAO for the star η Crucis at 3.9 μm.
(d) Theoretical phase pattern for a 360° dark hole covering 3–7 λ/D, (e) the ensuing theoretical
PSF, (f) the on-sky PSFs at MagAO for the binary star β Centauri at 3.9 μm. Phase pattern
designs by Christoph Keller. Data processing by Gilles Otten.”

The first grating-vAPP successfully demonstrated on-sky was installed at the
MagAO/Clio instrument attached to the 6.5 m Magellan-Clay telescope in Chile”
(Fig. 1(a–c)). The device was designed and built to operate from 2–5 μm, covering
the infrared atmospheric K, L and M-bands. The first-light observations demonstrated
excellent suppression of the stellar diffraction halo in the complementary
dark holes (see Fig. 1(c)). Detailed analysis of the data demonstrated a 5σ contrast
for point source detection of ~10⁻⁵ at 2.5–7 λ/D." The contrast performance
is greatly enhanced by combining the two complementary dark holes through a
simple rotation-subtraction procedure to further suppress the wind-driven starlight
halo in the dark holes, which is caused by finite AO loop speed. Figure 1(c) shows
the presence of the leakage term PSF in between the coronagraphic PSFs, which
can be used as an astrometric and photometric reference, in the (frequent) case that
the coronagraphic PSF cores are saturated.


4.2. 360 Degree APP Solutions 


As part of the algorithm exploration of the APP surface, a family of functions 
was found that showed 360 degrees of suppression around the central star. These 
solutions have lower Strehl ratios for the star (typically 20–40%) with larger IWA
compared to the 180° dark holes, and these phase pattern solutions are pathological
in nature, with rapid phase changes over small scales. The advent of liquid-crystal
patterning encouraged us to revisit these 360° solutions, and test them in the lab and
on-sky. Figure 1(d–f) shows the phase pattern and ensuing PSFs for the experimental
vAPP device at MagAO. The lower row of figures shows that the liquid-crystal 
manufacture successfully reproduces the complex phase pattern, and this on-sky 
image (Fig. 1(f)) shows a fainter binary stellar companion to the right of the primary 
star’s PSF. 


5. Future Directions 


Our team is currently installing different vAPP coronagraphs at several instruments 
at large telescopes around the world, and working on novel designs for the future 
extremely large telescopes. Foreseeable future developments of the vector-APP as a 
separate optical component, and as integral part of a high-contrast imaging system 
include: 


• The combination of several grating layers in a “double-grating-vAPP” to recombine
the two coronagraphic PSFs with 360° dark holes to feed an integral-field
unit while rejecting the leakage terms.

• By prescribing a specific retardance profile as a function of wavelength, it is
possible to build a wavelength-selective vAPP device, that operates as a regular
vAPP coronagraph at the science wavelengths, and acts like a regular glass plate
at the spectral range of a wavefront sensor behind it.

• The pupil phase manipulation of the vAPP can be extended by amplitude manipulation
in the pupil to create complex apodizers,!® and by phase/amplitude masks
in the focal plane to yield hybrid coronagraphy.*°

• As this technology is likely compatible with operation in space, it is opportune to
characterize the performance of vAPP-like coronagraphs at the extreme contrast
levels (~10~°) of space-based high-contrast imaging.

• To adapt the vAPP phase pattern to the observational needs, the observing conditions,
and segmented pupils with variable configurations, active liquid-crystal
devices will be developed to establish “adaptive coronagraphy”. Such a system can
then deliver dark holes of various geometry and depth, depending on whether the
observer is interested in detecting exoplanets or characterizing known targets.

• As the vAPP relies on polarization splitting, it is possible to design an optimal
system for coronagraphic polarimetry,*' particularly with the 360°-designs.

• The fact that the vAPP produces several PSFs for the same star at the focal
plane makes it an attractive option for implementing focal-plane wavefront sensing,
for instance through phase-diversity techniques. Another promising approach
involves the incorporation of an additional pupil phase pattern which generates
pairs of PSF copies around the main PSFs, with each pair encoding a wavefront
error mode through an intensity difference.*”


The ultraviolet region of the spectrum provides unique astronomical diagnostics,
as well as unique challenges in its instrumentation. First and foremost, it
requires operation from space, as the atmosphere completely absorbs ultraviolet 
wavelengths. Ultraviolet photons react very readily with atoms and molecules, 
requiring specialized approaches for ultraviolet optics and detectors, and mak- 
ing contamination control of paramount importance. This chapter will describe 
the special requirements of the ultraviolet, and the current state of the art for
addressing those issues. Because of the limitations of the current state of the art, 
the possibility for significant improvements in science return for future missions 
is great if improved technologies can be employed for optical components and 
detectors. 


1. Introduction 


Astronomical instrumentation in the ultraviolet is, in the broad sense, analogous to 
instrumentation in the optical region of the spectrum. Telescopes consist of one or 
more mirrors that match or closely approximate conic sections. Light is imaged onto 
detectors operating through the photon’s quantum interaction with an absorbing 
material. Spectra are created through dispersion by refractive optics for low resolu- 
tion and diffraction gratings for high resolution. However, the shorter wavelengths of 
the ultraviolet impose different requirements and design drivers from those in the optical.
This chapter will review these differences, and highlight the different approaches 
that are required for ultraviolet astronomy. For purposes of this discussion, the fol- 
lowing definitions will be utilized to describe the regions of the ultraviolet spectrum: 


• The ultraviolet runs from 100 Å to the atmospheric transmission cutoff, assumed
to be 3000 Å. If you can observe it from the ground, it’s not the ultraviolet.

• The Near Ultraviolet (NUV) covers 2000–3000 Å.

• The Far Ultraviolet (FUV) covers 912–2000 Å.

• The Lyman Ultraviolet (LUV) is a subset of the FUV and covers 912–1216 Å.

• The Extreme Ultraviolet (EUV) covers 100–912 Å.

• Shortward of 100 Å is the soft X-ray region of the electromagnetic spectrum.
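
The band definitions above can be written as a small lookup. This is only a convenience sketch: the function name `uv_bands` is hypothetical, and treating each lower limit as inclusive and each upper limit as exclusive is an assumption, since the text does not say which band owns a boundary wavelength.

```python
def uv_bands(wavelength_angstrom):
    """Return the band labels that apply to a wavelength given in Angstroms."""
    w = wavelength_angstrom
    if w < 100:
        return ["soft X-ray"]
    if w >= 3000:
        return ["ground-accessible (not ultraviolet)"]
    bands = []
    if w < 912:
        bands.append("EUV")
    elif w < 2000:
        bands.append("FUV")
        if w < 1216:
            bands.append("LUV")      # the LUV is a subset of the FUV
    else:
        bands.append("NUV")
    return bands

for w in (50, 500, 1000, 1500, 2500, 5000):
    print(w, uv_bands(w))
```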


2. Technical Issues 


The issues particular to the ultraviolet are both technical and astronomical. This 
chapter will focus on the technical issues, and not the astronomical issues. The UV 
astronomical issues are dominated by the much stronger extinction from dust in the 
UV, the bright, diffuse emission lines at low earth orbit (airglow), and the molecular 
and atomic absorption/emission in the FUV from neutral and molecular hydrogen. 
These must be taken into account when considering any UV instrument design 
because, in general, they reduce the achievable signal-to-noise ratio in a planned 
observation, sometimes significantly. 

The technical distinctions can be categorized in five general areas: reflective sur- 
faces, detectors, dispersive elements, contamination, and diffraction/scatter. While 
these issues affect optical instruments as well, the performance differences in the 
ultraviolet can lead to different design requirements when considering an ultraviolet, 
as opposed to an optical, camera or spectrograph. 

Another aspect of the ultraviolet is that observations must take place above the
atmosphere. Any altitude above 150 km is sufficient for the full range of the ultraviolet,
but the NUV is accessible from the highest high-altitude balloons, typically
at altitudes of 120,000 feet or above. Operating from space implies all the requirements
of space-based astronomy: high reliability, ruggedness, thermal tolerance, low
power, and limited data bandwidth. Optical telescopes also operate from space to take
advantage of the lack of atmosphere, so these requirements are not unique to the
ultraviolet.


3. Reflective Surfaces 


One of the most limiting factors in the design of ultraviolet instrumentation is
the lower reflectivity of reflective surfaces in this bandpass. While pure aluminum
provides excellent reflectivity across the ultraviolet, aluminum rapidly forms an
oxide layer upon exposure to atmospheric conditions, and this oxide layer reduces
the reflectivity of the surface across the UV and eliminates it shortward of 2000 Å
(see Fig. 1). The standard approach to this problem is to overcoat vacuum-deposited
aluminum with a protective coating that is transmissive in the ultraviolet and impermeable
to oxygen to prevent oxidation of the reflective aluminum surface. The most
common overcoat is MgF2. The primary and secondary mirrors of the Hubble Space
Telescope, for example, are coated with Al/MgF2.¹ The reflectivity of an Al/MgF2
coating is shown in Fig. 1. Note that the reflectivity for Al/MgF2 begins to drop off
at ~1150 Å, which is typically quoted as the reflectivity limit for Al/MgF2 optics.
However, a limited level of reflectivity exists below 1150 Å, presumably resulting
from reflection at the front surface of the MgF2. The practical applications of
this short wavelength reflectivity can be seen in the short wavelength modes of
the Cosmic Origins Spectrograph (COS) on the Hubble Space Telescope.³ MgF2 is
a highly durable material, and has been used successfully on several space-based
ultraviolet missions, most notably HST.

[Figure 1: reflectance (%) versus wavelength (nm), 80–240 nm; curves for 23, 28 and 33 nm
ALD AlF3/Al, unprotected Al, and PVD MgF2/Al, with optical models shown as dashed lines.]

Fig. 1. Reflectivity of traditional (vapor deposited) Al/MgF2, bare aluminum,
and recent developments in using atomic layer deposition of AlF3/Al. From Ref. 2 (figure used
with permission). See electronic edition for a color version of this figure.

Some missions (FUSE, Copernicus) have utilized LiF as the protective overcoat,
which provides reasonable reflectivity down to ~1000 Å (Fig. 2). LiF is highly
hygroscopic, and requires careful handling procedures on the ground, including limiting
the exposure to environments with relative humidity above 1%. Storage at
vacuum or continual dry purge is essential for the preservation of the coatings. The
LiF mirrors and gratings on the FUSE mission, for example, had an operations
exposure limit (to room air) of 100 hours from coating until orbit. Degradation
on orbit was minimal and presumably due to contamination and not hygroscopic
deterioration.⁴

For reflectivity below 1000 Å, SiC optics (e.g. FUSE) can be utilized at normal
incidence, or grazing incidence systems can be employed. SiC is highly
durable, but only provides ~30% reflectivity in the 900–1100 Å regime. Grazing incidence
systems designed for FUV applications can tolerate significantly larger graze
angles than X-ray systems, and therefore can effectively fill the available aperture
without resorting to nested optics. The original design for the FUSE mission utilized
such a system. Unfortunately, the fabrication specifications for an arcsecond
quality grazing incidence telescope are extremely difficult to achieve, so that high
resolution imaging or spectroscopy becomes expensive.* However, if a mission seeks
the highest possible light collection capability while requiring modest image quality
(often referred to as a “light bucket”) below the LiF cutoff, a grazing incidence
system can be considered.

Fig. 2. Reflectivity of Al/LiF coatings from the FUSE mission, as well as recent measurements
and theoretical predictions for enhanced deposition techniques. From Ref. 5. See electronic edition
for a color version of this figure.

Coatings such as MgF2 and LiF may introduce complications if intended to be
employed in UV/Vis systems that require high wavefront accuracy, such as coro- 
nagraphs. Even a completely uniform coating will introduce variable polarization 
across the aperture, due to varying reflection angles across the mirror. Since the 
light passes through the coating, variations in coating thickness across a mirror 
result in path length differences across the aperture, because the index of refraction 
of the overcoat is not unity. At the time of this writing, this is a rapidly evolving
issue being studied by multiple groups, both theoretically and in the laboratory, 
and there is currently optimism that a viable solution, allowing LUV performance 
and coronagraphy, can be found. 

The low reflectivity of even the best UV coating, compared to standard optical
coatings,† means that UV systems are typically designed with a minimum number
of reflective surfaces. Since the number of aberration coefficients that can be minimized
is typically related to the number of free parameters in the system design,
the inability to simply add additional corrector optics in a UV system means that
UV designs often have larger aberration content than comparable optical designs.
Additional degrees of freedom in the design can sometimes be introduced without
the introduction of additional optical surfaces; for example, aberration-corrected
holography or nonplanar detector surfaces. However, UV designs are typically faced
with the challenging design optimization of trading off sensitivity against image
quality or spectral resolution. An excellent example of two different choices in such
a design is provided by the Space Telescope Imaging Spectrograph (STIS)⁷ and the Cosmic
Origins Spectrograph (COS)⁸ on the Hubble Space Telescope. STIS has more optical
surfaces, producing better aberration control, higher spectral resolution, and lower
throughput, while COS has a single optical element in its primary channels, providing
lower spectral resolution but significantly enhanced sensitivity. Depending on
the specific science application, STIS or COS might be the better choice.

*Prohibitively expensive in the case of FUSE, which is why it was redesigned with normal incidence
optics.⁶
†The reflectivity of Al/MgF2 is ~80% at 1200 Å, while silver is ~92% at 5000 Å.
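
The cost of each additional reflection can be made concrete with a short calculation. The sketch below assumes that every surface has the same reflectivity and ignores all other losses; the two reflectivities are the values quoted in this section (about 80% for Al/MgF2 at 1200 Å and about 92% for silver at 5000 Å), while the function name `mirror_throughput` and the surface counts are chosen only for illustration.

```python
def mirror_throughput(reflectivity, n_surfaces):
    """Fraction of light surviving n identical reflections."""
    return reflectivity ** n_surfaces

for n_surfaces in (1, 2, 4, 6):
    uv = mirror_throughput(0.80, n_surfaces)       # Al/MgF2 near 1200 Angstroms
    optical = mirror_throughput(0.92, n_surfaces)  # silver near 5000 Angstroms
    print(f"{n_surfaces} surfaces: UV {uv:.2f}, optical {optical:.2f}")
```

Under these assumptions four reflections leave roughly 41% of the light in the ultraviolet against roughly 72% in the optical, which is why a corrector that is routine in an optical design is a significant sensitivity penalty in a UV one.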


4. Transmission 


There are a limited number of materials that will transmit ultraviolet light. A discussion
of special UV transmitting glasses can be found in technical notes from
glass manufacturers such as Corning⁹ or Schott,¹⁰ and transmission of fused silica is
discussed in Refs. 11 and 12.

Note that all of these materials have short wavelength cutoffs, but not long wave- 
length cutoffs. At longer wavelengths in the NUV, filters with both long and short 
wavelength cutoffs are available. This is another design driver in the far ultraviolet. 
Establishing a bandpass filter typically requires the use of a crystal to define the 
short wavelength cutoff, and the use of a photocathode on the detector to establish 
a long wavelength cutoff. The most prominent example of such an approach is the 
Galaxy Evolution Explorer,¹³ which uses this technique to create two broad band
filters (see Fig. 3). 

The efficiency curves for photocathodes typically used on UV detectors are 
shown in Fig. 4. The efficiencies of these photocathodes are driven by the ioniza- 
tion work function of the material, which is why they become ineffective at longer 
wavelengths and have no response in the visible. For this reason, detectors utilizing 
this class of photocathode are often labeled “solar-blind” (such as the Solar Blind 
Channel [SBC] on the Advanced Camera for Surveys¹⁴). The combination of Figs. 3
and 4 demonstrates one of the continuing issues in UV astronomy. Using efficient
filters and/or photocathodes, it is not possible to operate shortward of
Ly-α while excluding Ly-α (1216 Å). This is why no broad band imaging has ever
been performed in the LUV.

Thin films such as indium can be used to define bandpasses in the LUV and 
EUV as was done on EUVE.!® None of these filters can sustain a pressure differential 
of an atmosphere, and their throughput in the LUV is quite low. 

Fig. 3. The FUV and NUV bandpasses for GALEX. The short wavelength cutoffs are formed
by the CaF2 imaging window in the case of the FUV (to block HI Lyman alpha), and the fused
silica detector window in the case of the NUV. The long wavelength cutoffs are set by the choice
of photocathode on each detector, CsI in the case of the FUV detector and CsTe for the NUV
detector. From the GALEX Observer’s Guide.

Fig. 4. Photocathode efficiencies. Left: Quantum efficiency vs. wavelength for a 15,000 Å-thick
KBr opaque photocathode at a 15° graze angle to the channel axis using a 120 V mm⁻¹ repelling
field. Right: Quantum efficiency vs. wavelength for a 15,000 Å-thick CsI opaque photocathode at
a 15° graze angle to the channel axis using a 100 V mm⁻¹ repelling field. Figures from Ref. 15.


5. Detectors


The high absorption coefficient of SiO2 in the ultraviolet means that standard silicon 
devices such as CCDs are impractical below ~2000 Å. The photon is absorbed in
the oxide layer before it reaches the active pixel regime. Several groups are trying to 
make these devices efficient at shorter and shorter wavelengths. For most applica- 
tions, if a highly efficient CCD could be used at the desired wavelengths, it would be 
the detector of choice. However, the capabilities of current technology dictate that 
most FUV systems utilize alternate detector systems. The most commonly employed 
detector below 2000 Å is the microchannel plate (MCP) detector (Fig. 5). The basic
operation of the detector is outlined below. 

An incoming photon is absorbed by the photocathode and results in the production
of a free electron. This photocathode might be applied on the surface
of the MCP (referred to as an opaque photocathode) or it may be applied to
the back side of a window in close proximity to the MCP (referred to
as a semi-transparent photocathode). Sealed tubes are typically employed when
using photocathodes that cannot tolerate even instantaneous exposure to nonvacuum
environments, such as Cesium Telluride. With an opaque system, a photoelectron
might be emitted within one of the MCP channels, or might be emitted
from the top surface (referred to as the “web”). An applied electric field
directs the electron downward, and into the MCP material. The MCP material
is an electron multiplier, meaning that the impact of the electron results in the
emission of multiple electrons. Current research involves examining the properties
of alternate MCP materials that should produce lower internal background
and allow larger format detectors due to their increased material strength.'® This
process of electron multiplication is repeated as the electrons work their way
downward through the MCP (or stack of MCPs). When the electron cloud leaves the
MCP structure, there are ~10⁶–10⁷ electrons resulting from the initial photon
impact.

Fig. 5. Schematic of an open face MCP detector. The photon strikes the photocathode applied to
the top MCP and an electron is emitted, which is driven into the MCP pores by a strong electric
field. In a sealed tube detector, the bias grid is replaced with a sealed window, and the photocathode
is applied to the inside surface of the window. From Ref. 17 (figure used with permission).

The number of electrons produced per successfully detected photon is the 
gain. This should not be confused with the detection quantum efficiency (DQE) 
of the detector, which is the probability that any incident photon will be successfully
detected. The DQE is often quoted interchangeably with the quantum efficiency 
(QE), but QE in the optical/UV is strictly defined as the number of electrons 
produced per incident photon, and in principle could be higher than unity as some 
photon impacts can produce more than one photoelectron. If photocathode quan- 
tum efficiencies are measured with a current sensing device, such as a diode, then 
the distinction between QE and DQE becomes important. DQE is the number that 
must be used in calculating instrument sensitivities, not QE. 

There are many different ways of measuring the position of the resulting 
charge cloud, including multi-anode (MAMA),!° time-delay,?9?! and cross strip 
readouts.?? 24 While some of these techniques involve pixelization at the scale of 
the physical readout (e.g. MAMA systems), most involve the digitization of an 
analog signal, e.g. a measurement of charge, a charge ratio, or a voltage that is 
related to the time delay between two propagating signals. This means that the 
pixelization of such devices is not directly tied to the physical structure of the 
readout scheme. In such cases, the resolution of the detector (the precision with 
which the detector can determine the physical location of the photon’s impact 
on the surface of the detector) can be very different than the pixel size of the 
readout. In principle, the analog-to-digital conversion of the readout system could 
be carried out to an arbitrarily large number of bits, and a device could have 
arbitrarily small pixels. However, most of these bits would carry no information. 
For example, while the (rectangular) pixels on the COS instrument are 6 microns 
in width, the resolution of the detector is 25 microns FWHM in that direction. 
This means that the line spread function of the detector is oversampled by the 
pixelization, and the resolution of the device is not simply twice the pixel size, a 
shortcut often used by astronomers. To further complicate matters, since the ADC 
is typically not perfectly linear, and the performance of the amplifiers measuring 
the analog signal may vary with temperature or other qualities (including inci- 
dent count rate and incoming pulse shape), the physical location (drift) and size 
of a pixel (stretch) may vary with time and/or position or even with the individ- 
ual gain of each detected photon (often referred to as “walk”). This separation of 
pixelization from resolution and pixel variability can be a source of confusion for 
astronomers more familiar with physically pixelated devices such as CCDs or IR 
arrays, but cannot be ignored if the highest quality results are to be extracted from
MCP data.
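
The COS numbers quoted above make the point with simple arithmetic. In this sketch the 6 micron pixel width and 25 micron FWHM come from the text; the function names are hypothetical, and the "two-pixel" rule is included only to show how far the shortcut can be from the measured resolution.

```python
def samples_per_fwhm(resolution_fwhm_um, pixel_um):
    """Number of pixels spanning one resolution element (FWHM)."""
    return resolution_fwhm_um / pixel_um

def two_pixel_rule_um(pixel_um):
    """Resolution implied by the common 'twice the pixel size' shortcut."""
    return 2.0 * pixel_um

pixel_um = 6.0           # COS pixel width quoted in the text
resolution_um = 25.0     # COS detector resolution (FWHM) quoted in the text

print("pixels per FWHM       :", samples_per_fwhm(resolution_um, pixel_um))
print("two-pixel shortcut, um:", two_pixel_rule_um(pixel_um))
print("measured FWHM, um     :", resolution_um)
```

With a little more than four pixels per resolution element, the shortcut would overstate the resolving power of the detector by about a factor of two.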

However, microchannel plate detectors do have some advantages, most promi- 
nently that they can be operated in photon-counting, time-tagging mode, recording 
the location and arrival time of each detected photon rather than building up an 
image as is done with a CCD. In such a mode, the read noise is zero (although dark
counts remain) and the data can be arbitrarily time binned during analysis. Pointing
drift or instrumental variations can be corrected in analysis, and times of bad data 
can be easily excluded. In addition, MCPs can be employed with curved surfaces, 
allowing a match to the locus of best focus. 
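
The flexibility of time-tagged data can be illustrated with a short sketch. The event list, the time range and the function name `light_curve` are invented for the example; a real photon list would also carry positions and pulse heights, and the rejection of bad intervals here is the simplest possible version.

```python
import numpy as np

def light_curve(times_s, t_start, t_stop, bin_width_s, bad_intervals=()):
    """Bin photon arrival times into counts per bin, skipping flagged intervals."""
    times = np.asarray(times_s, dtype=float)
    keep = (times >= t_start) & (times < t_stop)
    for bad_start, bad_stop in bad_intervals:
        keep &= ~((times >= bad_start) & (times < bad_stop))
    n_bins = int(round((t_stop - t_start) / bin_width_s))
    edges = t_start + bin_width_s * np.arange(n_bins + 1)
    counts, _ = np.histogram(times[keep], bins=edges)
    return edges, counts

# Hypothetical arrival times in seconds
events = [0.1, 0.4, 1.2, 1.3, 1.9, 2.5, 3.7]

_, fine = light_curve(events, 0.0, 4.0, 1.0)
_, coarse = light_curve(events, 0.0, 4.0, 2.0)
_, cleaned = light_curve(events, 0.0, 4.0, 1.0, bad_intervals=[(1.0, 2.0)])
print(fine, coarse, cleaned)
```

The same event list yields all three light curves; nothing has to be decided about the binning or the bad intervals at the time of the observation.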


6. Dispersion 


While low dispersion prisms and grisms can be designed for UV applications using
the transmitting materials discussed in Sec. 4 (e.g. the grisms on GALEX and Swift),
most spectroscopic applications in the UV utilize reflection gratings. The shorter
wavelengths of the UV necessitate that these gratings have proportionally higher line
densities than optical gratings to achieve the same λ/d, where d is the grating line
spacing. For example, the line densities on FUSE SiC and COS G130M were 5767
and 3800 lines/mm, respectively. In addition, the shorter wavelengths of the ultraviolet
mean that UV gratings will typically exhibit a higher level of scattered light than
their optical counterparts. For these reasons, holographic gratings are preferred in
the ultraviolet when practical. Holographic gratings exhibit extremely low scatter
in the UV, as low as 10⁻⁷/Å at 10 Å away from line center. The low reflectivity of
UV mirrors also drives UV designs towards the minimum number of reflections, so
that aberration correction with the dispersing element (either through aberration
correcting holography or utilizing an aspherical optical figure, such as a torus) is
common in UV designs. The classic technique of collimation/dispersion/reimaging
as employed in most visible wavelength systems has too many reflections to allow
a high efficiency UV spectrograph.
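
The scaling of line density with wavelength follows directly from holding λ/d fixed. The sketch below assumes a purely hypothetical optical reference grating (600 lines/mm used at 5000 Å); the function name `matched_line_density` is likewise invented for the example and the result is not a design value for any instrument.

```python
def matched_line_density(ref_lines_per_mm, ref_wavelength, target_wavelength):
    """Line density that keeps lambda/d equal to that of a reference grating.

    Both wavelengths must be in the same unit; since d = 1 / density,
    constant lambda/d means the density scales as 1 / lambda.
    """
    return ref_lines_per_mm * ref_wavelength / target_wavelength

# Hypothetical reference: 600 lines/mm at 5000 Angstroms, moved to 1000 Angstroms
print(matched_line_density(600.0, 5000.0, 1000.0), "lines/mm")
```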


7. Contamination 


Contamination is a primary concern in the ultraviolet, because even small amounts 
of contamination can result in significant efficiency losses in an ultraviolet instru- 
ment. In general, the problem becomes more severe at shorter wavelengths. The 
primary contaminants of concern are nonvolatile hydrocarbons on the reflective 
surfaces. Layers as thin as 5 Å can meaningfully reduce the reflectivity of a surface
in the FUV. To achieve this level of cleanliness, all handling, storage and test pro- 
cedures must consider their contamination impact, and essentially, all operations 
involving exposed optical surfaces must occur in a clean room. 

Clean room specifications (class 10,000, class 1000, etc.) are a measure of par- 
ticulates in a clean room, not hydrocarbon levels. Therefore, the standard quoted 
cleanliness level of a particular environment does not necessarily translate into con- 
tamination risk for ultraviolet systems. Constant monitoring of witness samples that 
travel with the flight optics in all phases is required to ensure that procedures are 
adequate to meet the specified levels of cleanliness. 


8. Diffraction and Scattering


The shorter wavelengths of the ultraviolet provide one advantage in system design:
diffraction is rarely a concern. A 2.4-meter diameter telescope operating at 1000
Å has a 1.22 λ/D of 5 × 10⁻⁸ radians, or 0.01 arcseconds. This will not typically be the
limiting factor in the instrument performance. However, the shorter wavelengths
exhibit more scatter, mandating smoother optical surfaces. The fractional total
integrated scatter (TIS) from a reflective surface is normally quantified as:


$$\mathrm{TIS} = 1 - \exp\left[-\left(\frac{4\pi\sigma}{\lambda}\right)^2\right], \qquad (1)$$


where $\sigma$ is the rms surface roughness. As long as $\lambda \gg 4\pi\sigma$, the scatter is small. In
the optical, with $\lambda \sim 5000$ Å, a mirror with $\sigma = 20$ Å surface roughness will scatter
about 0.25% of the incoming light. At 1000 Å, however, the same surface will scatter
~6% of the incoming light. This drives UV systems to have lower rms roughness
than is normally required in optical systems. 
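
The numbers quoted in this section can be checked with a few lines of Python. This is only a sketch: it assumes the scatter relation has the form TIS = 1 - exp[-(4 pi sigma / lambda)^2] and uses the Rayleigh criterion 1.22 lambda / D for the diffraction limit. The function names are illustrative and do not come from the text.

```python
import math

def total_integrated_scatter(sigma_angstrom, wavelength_angstrom):
    """Fractional TIS, assuming TIS = 1 - exp[-(4*pi*sigma/lambda)^2]."""
    phase = 4.0 * math.pi * sigma_angstrom / wavelength_angstrom
    return 1.0 - math.exp(-phase ** 2)

def diffraction_limit_arcsec(wavelength_angstrom, diameter_m):
    """Rayleigh criterion 1.22*lambda/D, converted from radians to arcseconds."""
    wavelength_m = wavelength_angstrom * 1e-10
    theta_rad = 1.22 * wavelength_m / diameter_m
    return math.degrees(theta_rad) * 3600.0

# Same 20 Angstrom rms surface at an optical and at a far-UV wavelength.
for wavelength in (5000.0, 1000.0):
    tis = total_integrated_scatter(20.0, wavelength)
    print(f"lambda = {wavelength:6.0f} A  ->  TIS = {100.0 * tis:.2f}%")

# 2.4 m aperture observing at 1000 Angstrom.
print(f"diffraction limit: {diffraction_limit_arcsec(1000.0, 2.4):.4f} arcsec")
```

Because the scatter grows roughly as $(\sigma/\lambda)^2$ while it is small, halving the rms roughness recovers about a factor of four in scattered light at any given wavelength.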


9. Design Considerations 


When designing an ultraviolet system, these technical issues all factor into the design 
optimization. For imaging systems, the angular resolution achieved is normally lim- 
ited by the detector spatial resolution. (Note that the detector spatial resolution is 
not the same as the pixel size.) Therefore, the angular resolution is typically the 
detector spatial resolution divided by the focal length. However, long focal lengths 
result in smaller fields of view. Many of the current development efforts in MCP 
detectors are aimed at providing better resolution and larger formats to address 
this issue. The most recent UV imaging survey instrument (GALEX) had a field 
of view ~1 degree in diameter and an angular resolution of ~5 arcseconds. Photon 
counting MCP detectors have global count rate limits, which constrain the dynamic 
range and the observable fields if accurate photometry is desired. 
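
The trade between angular resolution and field of view described above can be sketched numerically. The detector resolution, detector size and focal length below are hypothetical values chosen only for illustration, not the parameters of any instrument named in the text, and the small-angle approximation is assumed throughout.

```python
import math

ARCSEC_PER_RAD = math.degrees(1.0) * 3600.0

def angular_resolution_arcsec(detector_resolution_um, focal_length_m):
    """Detector spatial resolution divided by focal length (small-angle limit)."""
    return detector_resolution_um * 1e-6 / focal_length_m * ARCSEC_PER_RAD

def field_of_view_deg(detector_size_mm, focal_length_m):
    """Angular extent covered by a detector of the given linear size."""
    return math.degrees(detector_size_mm * 1e-3 / focal_length_m)

# Hypothetical detector: 25 micron spatial resolution, 65 mm active diameter.
for focal_length in (3.0, 6.0):
    res = angular_resolution_arcsec(25.0, focal_length)
    fov = field_of_view_deg(65.0, focal_length)
    print(f"f = {focal_length} m: resolution = {res:.2f} arcsec, field = {fov:.2f} deg")
```

Doubling the focal length halves both numbers, which is why better detector resolution and larger detector formats are the route to improving one without giving up the other.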

The most limiting aspect in UV imaging is the lack of filters that operate 
similarly to optical filters. There are an extremely limited number of acceptable 
materials to choose from, and none of them has a long wavelength cutoff in the 
FUV. Additionally, no filter exists that excludes H I Lyman α airglow while passing
the LUV. (Note that the geocoronal Ly-α intensity at night is 3500 Rayleighs,
or ~6800 photons/s/cm² per square degree.) Any reasonable survey instrument
will have many square centimeters of collecting area, so that Ly-α rejection is a fundamental issue
that requires extreme rejection capabilities. These factors are the primary reason
that UV imaging has been limited to very broad bands in the FUV (excluding
Ly-α), and broad bands (with narrower color-difference bands) in the NUV
(e.g. Ref. 25). LUV imaging requires either dispersive rejection of Ly-α²⁶ or Ly-α
absorption cells.²⁷

Spectrographic instruments are not constrained by filter materials, and have 
operated across the UV spectrum from EUV through the NUV. The limiting factor 
in these designs is typically the difficulty in controlling aberrations for better spectral
resolution and cross-dispersion imaging while maintaining a high throughput.
Recent systems have used a minimal number of optics (COS has three optics in 
its primary spectrographic channel [HST telescope + 1 grating] while FUSE has 
only two [prime focus telescope + grating]). The limited design space allowed in 
these two designs meant that both allowed astigmatism to remain in the system 
(as it does not affect spectral resolution), which limited them to spectroscopy of 
well-separated point sources. Unless a higher throughput coating is developed, this 
fundamental design trade will continue to dominate UV spectrographic designs. 

The essential design goal is to increase the number of free parameters available 
to correct aberrations, while maintaining a limited number of reflective surfaces. 
This can be achieved through more complex holographic systems (utilizing aberrated
wavefront beams in the holography) or by introducing more complex
(nonconic) figures onto the limited optics employed.


10. Conclusions 


Optical telescope systems operate at near optimal conditions. The telescopes operate 
at or near the diffraction limit, and for ground-based systems, the limiting factor is 
most commonly the seeing. Detector quantum efficiencies are near unity, and reflec- 
tivities are high. A wide range of transmitting materials is available for the optical 
designer. Much of this maturity is due to the broad applicability of optical technology
for civilian and military applications, allowing astronomical systems to leverage
the technological investments made outside of astronomy. The primary means of
improving the performance of future systems is to: 


(1) Make the telescope larger (for example, ELT, GMT or TMT) to increase the
collecting area and decrease the diffraction limit. 

(2) Correct for the limited seeing of the atmosphere. This currently works over
limited fields of view, but substantial efforts are under way to make the technique
applicable over larger fields of view and a wider range of wavelengths.

(3) Increase the number of resolution elements available for simultaneous observa- 
tion. Currently this is accomplished by adding more and more detectors to the 
focal plane of the system, or building multi-object spectrographs. 


This is the hallmark of a mature field: that improvements are based primarily on 
increasing the size or number of existing technologies. Since reflectivities or detection 
efficiencies greater than unity are impossible, only a true revolution in detector 
technology (for example, measuring the photon energy with precision comparable to 
high resolution spectroscopy while simultaneously determining the sky coordinates 
of the photon) will allow breakthrough science without requiring increases in size 
or number. 

Ultraviolet systems, by comparison, are much less mature, and significant
advances in instrument performance are possible without larger apertures. DQEs
in the NUV remain at 10-15%, and in the FUV peak at ~50%. Reflectivities in 
the LUV are as low as 30% (SiC) and no better than 80%. If reflective coatings 
with 95% reflectivity could be identified, the allowable UV optical design space 
would be increased enormously. This immaturity of UV technology reflects its lack 
of utility to nonastronomical fields. It has no commercial application outside of 
vacuum chambers, and is rarely the choice for space-based military detection sys- 
tems. Therefore, the investment in ultraviolet technology comes almost entirely from 
the astronomical community. However, this does mean that significantly increased 
observational capabilities over the state of the art are possible through application 
of current technology or potential improvements in the near future. The scope of
possible improvements over the state of the art is especially large in the arena
of UV imaging, where GALEX, at 5 arcsecond imaging, is the state of the art
in the FUV and NUV, while no imaging exists at all in the LUV. Spectroscopically,
a UV spectrograph with resolving power of 100,000 and the sensitivity to
observe V = 29 QSOs efficiently would revolutionize our studies of the intergalactic
medium (IGM). The sky background at low earth orbit is minimized in the
NUV, but at L2 or beyond (outside the geocoronal emission) the sky background
in the FUV may well provide the highest contrast (S/N) waveband for observ- 
ing the faintest diffuse objects in the universe, including the elusive emission from 
the IGM. 


X-ray Telescopes Based
on Wolter-I Optics 


Giovanni Pareschi*, Daniele Spiga and Carlo Pelliciari 
Via E. Bianchi 46, Merate (Lc), Italy


*giovanni.pareschi@brera.inaf.it


In this chapter, the main configurations of X-ray astronomical telescopes for space 
astronomy are introduced. To this end some historical notes on the development 
of X-ray optics are given and some perspectives on future applications are dis- 
cussed. X-ray telescopes were introduced at the beginning of the history of X-ray 
astronomy in order to improve the imaging and flux sensitivity achievable with 
direct view detectors. Since refractive lenses cannot be used with X-rays (due to 
the very unfavorable optical constants in this energy band), X-ray telescopes are 
generally based on grazing-incidence mirrors, working at glancing angles (typ- 
ically of the order of 1 degree). The classical geometries utilized for making 
X-ray astronomical mirrors are derived from H. Wolter’s work, initially devel- 
oped for use in X-ray microscopy applications. These two-reflection mirrors are 
pseudo-cylindrical shells whose profiles follow conical curves. Other configurations 
that evolved from Wolter’s design have also become interesting for astronomical 
applications. Kirkpatrick-Baez and lobster eye optics represent other, completely
different, geometries that can be used for X-ray telescopes, but Wolter’s optics 
configurations remain the most widely used system adopted for X-ray telescopes. 


1. Introduction 


X-rays were discovered by Röntgen¹ in 1895, who immediately understood that
“...it is obvious that the X-rays can’t be concentrated by lenses; neither a large
lens of hard rubber or metal lens having any influence on them...”. This behavior
is due to the very unfavorable and atypical optical constants in the
X-ray band, in which the real part of the refraction index is slightly lower than 1.
The different events related to the development of X-ray optics for astronomy are
reported in Table 1. In 1922, the “young student” Enrico Fermi, in his Master’s
Degree Thesis at the University of Pisa, experimentally obtained the first X-ray
converging beams, using focusing techniques based on grazing-incidence reflection
and Bragg diffraction from mica crystals that were suitably bent in order to
follow an elliptical profile (Fig. 1; see Refs. 2 and 3). In doing so, Fermi followed
the method theoretically suggested by Gouy⁴ a few years before (this configuration
was independently tested in the same year by Dardord⁵ and subsequently widely
exploited for making X-ray focusing spectrometers, following the studies by von
Hamos and collaborators; see Ref. 6).


Table 1. Main events related to the development of X-ray grazing-incidence optics.

| Year | Event | Note |
|------|-------|------|
| 1632 | B. Cavalieri proposes the design of a parabolic solar concentrator working at grazing incidence | B. Cavalieri suggested that a similar kind of mirror was used in ancient Rome by the Vestals to light the holy fire |
| 1895 | W.C. Röntgen discovers “X-rays” | Röntgen immediately understood the difficulty of making optics for X-rays because of the very weak interaction with matter |
| 1922 | E. Fermi performs the first successful experiment of X-ray reflection and concentration, developing optics based on crystal diffraction | Optics based on pseudo-cylindrical mica optics in the von Hamos configuration |
| 1922 | A. Compton proves the total reflection at grazing incidence for X-rays | |
| 1948 | P. Kirkpatrick and A. Baez successfully use two-reflection grazing-incidence optics for obtaining an X-ray image | Kirkpatrick and Baez optics still very much used in X-ray microscopy applications |
| 1949 | H. Friedman’s team observes the Sun in X-rays | Rocket flight, direct view detectors |
| 1952 | H. Wolter proposes the use of two-reflection grazing-incidence optics based on conic sections for X-ray microscopy applications | In the next decades, the Wolter-I optics (paraboloid + hyperboloid) will be the most used configuration in X-ray astronomy |
| 1960 | R. Giacconi and B. Rossi propose the use of grazing-incidence concentrators based on parabolic mirrors for improving the flux sensitivity in X-ray astronomy | The single reflection parabolic configuration does not allow true imaging because it is strongly affected by coma aberration |
| 1962 | R. Giacconi et al. discover Sco X-1, the first extra-solar X-ray astronomical source, using collimated detectors | Milestone in X-ray astronomy |
| 1963 | Giacconi and Rossi fly the first (small) Wolter-I optics to take images of the Sun in X-rays | |
| 1965 | Second flight of a Wolter-I focusing optics (Giacconi + Lindslay) | Development of X-ray optics for X-ray solar observations |
| 1973 | Skylab carries on-board two small X-ray optics for the study of the Sun | |
| 1970 | Launch of Uhuru, the first satellite entirely devoted to X-ray astronomy | No focusing optics! |
| 1978 | Einstein (a.k.a. HEAO-2), the first satellite with X-ray optics entirely dedicated to X-rays | Enormous improvement in terms of flux sensitivity and imaging capabilities, thanks to the focused telescope. Optics based on grazing-incidence directly-polished mirror shells made of thick glass |
| 1983 | EXOSAT (first European mission with X-ray optics aboard) | X-ray optics made of Beryllium produced via epoxy replication, starting from a superpolished mandrel |
| 1990 | ROSAT, first All Sky Survey in X-rays by means of a focusing telescope with high imaging capabilities | Optics based on grazing-incidence directly-polished mirror shells made of thick glass-ceramic |
| 1993 | ASCA, a multimodular focusing telescope with enhanced effective area for spectroscopic purposes | Lightweight segmented conical optics based on thin Al foils. The high throughput compensates for the modest imaging capabilities |
| 1996 | Beppo-SAX, a broadband satellite with Ni electroformed optics | Final set-up of the Ni electroforming method is based on the replication starting from superpolished mandrels, an approach initially explored by Giacconi et al. The method will be used for several missions |
| 1999 | Launch of Chandra, the X-ray telescope with best angular resolution (0.5 arcsec), and XMM-Newton, the X-ray telescope with largest effective area (4500 cm²) | Two of the most important scientific missions of NASA and ESA |
| 2004 | Launch of the Swift satellite, devoted to investigating GRBs (with the X-ray telescope based on electroformed optics) | |
| 2005 | Launch of Suzaku, with high throughput optics for enhanced spectroscopy studies with bolometers | |
| 2012 | Launch of NuSTAR, based on optics able to focus X-rays in the hard X-ray domain (up to 80 keV) | Use of multilayer coatings applied on segmented thermally-formed thin glass substrates |


In 1922, Arthur H. Compton experimented with the reflection of X-rays from
polished surfaces at “grazing” or “glancing” angles of one degree or less.⁷,⁸ Two
decades later, in 1948, Kirkpatrick and Baez began to explore experimentally
the possibility of using Compton’s glancing-angle reflection technique to
focus X-rays for X-ray microscopy applications. They then obtained the first image
with grazing-incidence mirrors.⁹,¹⁰ The optic was a double-reflection system based
on pairs of curved mirrors with approximately paraboloidal shape in sequence.

[Figure 1: (a) Picture; (b) Setup, showing the X-ray source and the mirror.]

Fig. 1. Scheme of the experiment performed by E. Fermi, reported in his Master’s Degree Thesis.
The X-ray shadow picture of crossed metal wires was taken using a Bragg diffraction
grazing-incidence optics made of curved mica crystals. For the first time, X-ray beams were concentrated,
with a resulting gain in intensity and contrast.




Fig. 2. (a) The first concept of parabolic X-ray grazing-incidence mirror proposed by Giacconi 
and Rossi in 1960. (b) The grazing-incidence parabolic mirror described by Bonaventura Cavalieri 
in his book of 1632 (picture taken from the original copy of the Brera Astronomical Observatory- 
INAF library, credits: A. Mandrino, M. Carpino). 


About 10 years later (1960), Giacconi and Rossi, inspired by the Kirkpatrick
and Baez work, proposed the first concept of focusing optics for X-ray astronomical
applications¹¹ (interestingly, this paper was published two years before the 1962
discovery, by means of collimated X-ray detectors operating in Geiger mode, of
the first extra-solar X-ray source Sco X-1,¹² which opened the new frontier of X-ray
astronomy). The optics design proposed by Giacconi and Rossi (Fig. 2(a)) was
very simple and not really able to deliver actual images; it just provided a
concentration of the reflected photons in order to increase the signal-to-noise ratio
for low flux sources. Indeed, the optics configuration was based on a parabolic geometry
that is, for grazing-incidence reflection, strongly affected by coma aberration.
As Giacconi afterward observed,¹³ this first idea of X-ray astronomical optics was
somewhat similar to the grazing-incidence mirror configuration described by the
Italian scientist Bonaventura Cavalieri in the 17th century in The Burning Mirror,
or a Treatise on Conic Sections¹⁴ (Fig. 2(b)).

In any case, this first proposal by Giacconi and Rossi was extremely impor- 
tant, since the advantages of focusing to improve the angular resolution and flux 
sensitivity for future astronomical observations in X-rays were clearly recognized. 
Moreover, Giacconi and Rossi in this paper also suggested the idea of nesting several
confocal mirrors in order to increase the collecting area of X-ray telescopes, and
considered the use of multiple reflections from mirrors with conic-section profiles to
construct “image-forming X-ray telescopes”, as had been previously suggested by
Hans Wolter for X-ray microscopy applications.¹⁵ Thanks to this suggestion, in the
following decades nested mirror shells in Wolter-like configurations were extensively 
used for the implementation of X-ray telescopes. 

The imaging capabilities of X-ray mirrors then advanced very fast during the 
first 50 years of X-ray astronomy. In the 1960s and 1970s, an intensive research and 
development program was carried out by Giacconi and collaborators, aiming at the 
realization of imaging X-ray astronomical optics.!? In particular, in 1965 the first 
image of our Sun in X-rays was obtained by means of a payload based on X-ray 
optics that was launched aboard a rocket.¹⁶ Afterwards the structure and details in
X-rays of the Sun’s corona were observed with focusing telescopes with an angular
resolution of a few arcseconds in several rocket flights. This work paved the way for
a more ambitious experiment on the Skylab space station. The first telescope based
on two nested shells was indeed implemented, able to obtain a veritable motion
picture of the formation, evolution and dynamics of the plasma features on the
Sun over several solar rotations.¹⁷ The effort that was carried out for the solar
telescopes in the first 20 years of X-ray astronomy was very useful also in order 
to successfully develop the design and the technologies needed for large size high 
precision grazing-incidence telescopes devoted to extra-solar X-ray astronomy. 

In this way, it was possible to implement the Einstein X-ray Observatory,¹⁸,¹⁹
another milestone for the exploration of the sky in the X-ray band that allowed us
to investigate the sky with an angular resolution of a few arcseconds. Moreover, thanks
to a remarkable effective area of about 400 cm² at 1 keV, a flux sensitivity $10^4$
times better than the first experiment that led to the discovery of Sco X-1 was
achieved. The Einstein mission was followed by other imaging telescopes with greatly
enhanced angular resolution, like ROSAT²⁰,²¹ (1990), which performed the first
focused all-sky survey in X-rays, and Chandra²²,²³ (1999), with a further gain in
angular resolution of a factor 10* with respect to the first experiment. Indeed, the 
angular resolution of Chandra’s optics is a fraction of an arcsecond. We can therefore 
say that in just four decades X-ray astronomy evolved from the sub-“Naked Eye” 
resolution of the first experiment that led to the discovery of Sco X-1 to almost 
the level of the ‘Hubble’ angular resolution (for comparison, starting from the first
Galileo observations of 1609,²⁴ almost 400 years were needed to achieve similar
results for telescopes working in the visible band; see Fig. 3).

Fig. 3. Angular resolution improvements over time for optical and X-ray telescopes. On
the abscissa, the year of the telescope implementation is reported, while on the ordinate axis 
the achieved angular resolution in arcseconds is shown. For the visible band, the initial value 
corresponds to the angular resolution of the naked eye. 


At the end of the past century other missions were implemented based on
X-ray optics of Wolter’s type, with enhanced throughput and more devoted to
spectroscopy applications than imaging, like EXOSAT,²⁵ ASCA²⁶ and Beppo-SAX.²⁷
From the turn of the millennium onwards, other high-throughput telescopes were launched, like
ESA’s XMM-Newton Observatory-Class mission,²⁸,²⁹ Suzaku,³⁰
the X-ray telescope aboard Swift³¹,³² (which finds precise positions for the X-ray
afterglows of gamma-ray bursts), and NuSTAR.³³ The ASTRO-H³⁴ mission (a.k.a.
Hitomi) was developed and successfully launched in 2016; however, due to an unfortunate
accident, after having obtained very effective images at high spectral resolution
with its calorimetric camera, it was lost after just a few days of operations.
A reflight of some of the Hitomi instruments is currently being planned.

The German eROSITA³⁵ telescope onboard the Russian Spectrum X-Gamma
satellite (devoted to performing a new X-ray sky survey) was successfully launched
in July 2019. It should be noted that with NuSTAR³⁶ and Astro-H,³⁷ multilayer-coated
mirrorsᵃ working at very small angles of incidence have been used, extending
in that way the use of focusing techniques into the so-called hard X-ray band (10-80
keV). This achievement was possible following many years of technology developments.³⁸⁻⁴⁰
Last but not least, it should be noted that a very small satellite called
μROSI,⁴¹ based on very small focusing X-ray optics, has been realized for the
first time with the support of high-school students; it will perform observations
specifically devoted to amateur astronomy for outreach purposes.

ᵃ See Chapter 7, “X-ray Multilayer Coatings”, of this volume.

For more than 50 years, X-ray telescopes for solar⁴² and extra-solar astronomy
have been successfully implemented, with continuous improvement of the performance
in terms of both angular resolution and throughput. The technologies for
mirror production need new developments and advancements in order to make
possible the challenging optics of the next large X-ray missions. In particular, the Athena
mission, recently approved by the European Space Agency (ESA) as the L2 mission
of the Cosmic Vision program, with launch foreseen in 2028, will have an X-ray
telescope with a diameter of nearly 3 m, an effective area of 1.4 m² and an
angular resolution of 5 arcseconds half-energy width (HEW)ᵇ across a field of view
of 40 arcmin in diameter.⁴³ The Lynx (a.k.a. X-ray Surveyor) mission,⁴⁴ proposed
for a launch after 2030 and under study by NASA, will be characterized by an
effective area similar to that of Athena but with a much more challenging requirement
for the angular resolution, just 0.5 arcsec HEW (i.e. like Chandra, but with
throughput > 30 times larger).

In this chapter of the Handbook, we will review some basic aspects related
to the design of grazing-incidence X-ray astronomical optics, with particular reference
to the Wolter-I configuration. The present work has been inspired by other
excellent reviews⁴⁵⁻⁵³ on Reflective X-ray Optics and Telescopes that should
also be consulted for additional information. In the next chapters of this Handbook,
different techniques for the implementation of X-ray telescopes will be discussed.


2. Grazing-Incidence Reflection: An Optical Constants Quest 


Since the discovery of X-rays, the difficulty of refracting or reflecting X-rays was 
apparent. The explanation is that the refractive index of all materials in X-rays 
is very close to 1 and is, in fact, slightly less than 1. The situation is therefore 
completely different from the usual optics in visible light, where the refractive index 
is larger than 1. In fact, the X-ray energies are above the characteristic energies of 
the valence electrons in the atoms. The materials appear to X-rays like an almost- 
free electron gas, with a characteristic plasma frequency (in general lower than the 
frequency of the incident radiation;⁵⁴ see Ref. 55 for a detailed treatment). The
X-ray refraction index may then be written in the form: 


$$n = 1 - \delta + i\beta, \qquad (1)$$


where the real part (with 6 ~ 10~*-10~°) accounts for the refraction effect and the 
imaginary part (with 3 ~ 10~°-10~-°) is related to the X-ray beam photoelectric 
absorption. The $\delta$ and $\beta$ parameters therefore represent the optical constants of the
material. The X-ray refractive index depends on the atomic properties through the
equation:


$$n = 1 - \frac{N_A r_e \lambda^2 \rho}{2\pi A}\,(f_1 - i f_2), \qquad (2)$$


with $N_A$ being the Avogadro number, $r_e$ the classical electron radius, $A$ the atomic
weight, $\lambda$ the wavelength, $\rho$ the mass density, and $f_1$ the first atomic scattering
coefficient (corresponding to the number of scattering electrons per atom). At very
high energies (i.e. much larger than the K-shell binding energy), $f_1$ is almost equal
to the atomic number $Z$. The second scattering coefficient $f_2$ is included to take
into account the photo-absorption, and it is related to the atomic photoelectric
cross-section $\sigma_{ph}$ by the following formula:


$$f_2 = \frac{\sigma_{ph}}{2 r_e \lambda}. \qquad (3)$$


ᵇ Half-energy width is the diameter within which half of the X-ray energy in the image is focused.


The most prominent variations of the optical constants are close to the K, L, and
M atomic levels (or shells), in correspondence of which the $f_2$ coefficient presents
typical edges and the material can show anomalous dispersion effects. The main
contribution to $\beta$ is then given by the photoelectric effect. It becomes especially
important in correspondence to the electronic energy levels, in particular for the
most tightly bound electrons (K-shell), whose energies are given by the well-known
Moseley law. Above the energy of the K edge, $E_K$, the photoelectric cross-section
decays approximately as $E^{-8/3}$ and rapidly increases with $Z$. Therefore, low-Z
materials (e.g. carbon or silicon) have lower K-edge energies, and for a fixed
energy they are less sensitive to the photoelectric effect.

The extremely small deviation of the direction of the incident X-rays caused
by the real part of the refraction index implies that refractive lens optics for
X-rays would have a focal length too long to be implemented on a single spacecraft
(more than several tens of meters). On the other hand, the use of thick lenses for X-rays
is ruled out by the excessive absorption coefficient (actually, similar refractive
systems have been proposed for gamma-ray astronomy in the MeV spectral region,
with optics and focal planes hosted in two different satellites placed at a huge
separation; see Refs. 56 and 57). As a consequence, practical X-ray optics have to
be reflective and in a grazing-incidence configuration. Indeed, because of the
extreme smallness of $\delta$, the reflectivity from a mirror surface is always small
at ordinary incidence angles. This
is evident from the expression of the Fresnel equations for the two polarizations,
written in the usual form valid for the ordinary reflection and refraction:

$$r_s = \frac{n_1 \sin\theta_1 - n_2 \sin\theta_2}{n_1 \sin\theta_1 + n_2 \sin\theta_2}, \qquad (4a)$$

$$r_p = \frac{n_1 \sin\theta_2 - n_2 \sin\theta_1}{n_1 \sin\theta_2 + n_2 \sin\theta_1}, \qquad (4b)$$


where $n_1$ and $n_2$ denote the refraction indices of the two media, and $\theta_1$ and $\theta_2$
the incidence and the refraction angles, measured from the surface (Fig. 4). The
subscript “s” denotes the polarization orthogonal to the incidence plane and “p” the
one lying in the plane of incidence. In the limit of small $\theta_1$ and $\theta_2$, both reflectivities
approach the asymptotic behavior $1/\sin^4\theta_1$, denoting an abrupt decrease of the
reflectivity for increasing incidence angle.

Fig. 4. Reflection and transmission geometry for a smooth surface between optical media with
refraction indices $n_1$ and $n_2$ (for X-rays, if the first medium is vacuum, $n_1 = 1$, while $n_2 < 1$).
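
The behavior of the Fresnel reflectivities at grazing incidence can be explored numerically. The sketch below assumes vacuum as the first medium, a second medium with $n = 1 - \delta + i\beta$, and grazing angles measured from the surface; it obtains $n_2 \sin\theta_2$ from Snell's law through a complex square root, which is a common way of carrying the same expressions through the total-reflection regime. The optical constants used are illustrative placeholders, not tabulated values for any material.

```python
import cmath
import math

def grazing_reflectivity(theta_deg, delta, beta=0.0):
    """Intensity reflectivities (R_s, R_p) for vacuum -> n = 1 - delta + i*beta.

    theta_deg is the grazing angle measured from the surface.
    """
    n2 = complex(1.0 - delta, beta)
    theta1 = math.radians(theta_deg)
    sin1 = math.sin(theta1)
    cos1 = math.cos(theta1)
    # Snell's law with grazing angles: cos(theta1) = n2 * cos(theta2),
    # hence n2 * sin(theta2) = sqrt(n2^2 - cos^2(theta1)) (complex in general).
    n2_sin2 = cmath.sqrt(n2 * n2 - cos1 * cos1)
    r_s = (sin1 - n2_sin2) / (sin1 + n2_sin2)
    sin2 = n2_sin2 / n2
    r_p = (sin2 - n2 * sin1) / (sin2 + n2 * sin1)
    return abs(r_s) ** 2, abs(r_p) ** 2

# Illustrative optical constants (assumed): delta = 1e-4, beta = 1e-5.
for angle in (0.2, 0.4, 0.8, 1.6, 5.0):
    r_s, r_p = grazing_reflectivity(angle, 1e-4, 1e-5)
    print(f"theta = {angle:4.1f} deg  R_s = {r_s:.3e}  R_p = {r_p:.3e}")
```

With $\beta = 0$ the computed reflectivity is exactly 1 below the critical angle, and well above it the ratio of reflectivities at two angles approaches the inverse fourth power of the ratio of their sines.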

On the other hand, as the angles tend to zero the reflectivity cannot diverge
to infinity. In fact, at sufficiently small angles, total external reflection occurs and
Eqs. (4a) and (4b) are no longer valid. In practice, the real part of $n$ is smaller than
1, and Snell’s law cannot be satisfied for incidence angles smaller than the critical
angle $\theta_c$ for total reflection (i.e. when $\cos\theta_c = 1 - \delta$). Owing to the small value of
$\theta_c$, $\cos\theta_c \approx 1 - \theta_c^2/2$, and one easily derives an approximate expression for $\theta_c$:


$$\theta_c \approx \sqrt{2\delta}. \qquad (5)$$


For $\theta_1 < \theta_c$, the incident ray is totally reflected, excluding a fraction absorbed via
the photoelectric effect. The reflection angle is always very shallow: for soft X-rays, in
the 0.1-10 keV region, it is in general < 1 degree, as A.H. Compton experimentally
discovered in 1923.⁷,⁸ For X-ray optics, total reflection is the most widespread
approach to grazing-incidence optics, and large effective areas and sensitivities can
be reached. However, the critical angle increases for larger densities $\rho$ of the reflecting
layer but decreases in inverse proportion to the X-ray energy, $E$:


$$\theta_c \propto \frac{\sqrt{\rho}}{E}. \qquad (6)$$

This implies in turn that, at a fixed incidence angle, only X-ray energies below a
cut-off value $E_c$ can be totally reflected, and this is the main reason why single-layer
X-ray optics are difficult to use in the hard X-ray band (i.e. for energies >10 keV).
Indeed, above 10 keV even the critical angles of the densest coatings become too small
and the mirror cross-section offered to the incident flux becomes too low to return
a sufficient effective area, unless very long focal lengths (tens of meters) are used.
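
The scaling of the critical angle with density and energy can be made concrete with a rough estimate. The sketch below assumes $\delta = N_A r_e \lambda^2 \rho f_1 / (2\pi A)$ and $\theta_c \approx \sqrt{2\delta}$, and it takes $f_1 \approx Z$, which is only a crude approximation below the K-shell binding energy of a heavy element. The platinum values (Z = 78, A = 195.08, density 21.45 g/cm³) are standard handbook numbers and the function names are illustrative.

```python
import math

N_A = 6.02214076e23        # Avogadro number [1/mol]
R_E_CM = 2.8179403e-13     # classical electron radius [cm]
HC_KEV_ANGSTROM = 12.398   # photon energy times wavelength [keV * Angstrom]

def delta_from_f1(energy_kev, density_g_cm3, atomic_weight, f1):
    """Refractive decrement delta = N_A r_e lambda^2 rho f1 / (2 pi A)."""
    wavelength_cm = HC_KEV_ANGSTROM / energy_kev * 1e-8
    return (N_A * R_E_CM * wavelength_cm ** 2 * density_g_cm3 * f1
            / (2.0 * math.pi * atomic_weight))

def critical_angle_rad(delta):
    """Small-angle critical angle, theta_c ~ sqrt(2 * delta)."""
    return math.sqrt(2.0 * delta)

# Platinum, with the assumption f1 ~ Z (a rough upper estimate at these energies).
for energy in (1.0, 5.0, 10.0, 30.0):
    delta = delta_from_f1(energy, 21.45, 195.08, 78.0)
    theta_c = math.degrees(critical_angle_rad(delta))
    print(f"E = {energy:5.1f} keV  delta = {delta:.3e}  theta_c = {theta_c:.3f} deg")
```

Since $\delta$ scales as $\lambda^2 \rho$, the estimate reproduces the proportionality of the critical angle to $\sqrt{\rho}/E$: doubling the energy halves the critical angle, while doubling the density increases it by $\sqrt{2}$.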


2.1. Thin Film Coatings for X-ray Reflection at Grazing Incidence


As an example of a reflective coating, the X-ray reflectivity of Platinum as a function
of energy for different grazing-incidence angles and assuming an ideally smooth
surface is shown in Fig. 5. The reflectivity is close to 1 (total reflection regime), 
either for very small incidence angles or for very low X-ray energies. In the total 
reflection regime, the reflection takes place in a shallow depth, and the photoelectric 
absorption is usually limited. 

When increasing the incidence angle, however, the required coating thickness
also increases:


$$d_p(\theta) \approx \frac{\lambda}{2\pi}\,\frac{1}{\sqrt{\theta_c^2 - \theta^2}}, \qquad (7)$$

and the beam is absorbed to a larger extent. If the attenuation is low, the penetration
of the beam decays exponentially in the reflective coating. The reflectivity decreases
slowly up to the critical angle, where the penetration becomes infinite (and the
refracted ray appears). Beyond the critical angle the reflectivity decays as $\sin^{-4}\theta$.
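
To get a feeling for the coating thicknesses involved, the penetration depth can be evaluated for a few angles below the critical one. This sketch assumes the depth has the form $\lambda / (2\pi\sqrt{\theta_c^2 - \theta^2})$ and neglects absorption; the wavelength and critical angle used are illustrative round numbers (about 10 keV and about half a degree), not values taken from the text.

```python
import math

def penetration_depth_angstrom(theta_rad, theta_c_rad, wavelength_angstrom):
    """Penetration depth below the critical angle, neglecting absorption."""
    if theta_rad >= theta_c_rad:
        raise ValueError("formula only applies for theta < theta_c")
    return wavelength_angstrom / (
        2.0 * math.pi * math.sqrt(theta_c_rad ** 2 - theta_rad ** 2))

WAVELENGTH = 1.24   # Angstrom, roughly a 10 keV photon (assumed example)
THETA_C = 8.4e-3    # rad, illustrative critical angle (assumed example)

for fraction in (0.0, 0.5, 0.9, 0.99):
    depth = penetration_depth_angstrom(fraction * THETA_C, THETA_C, WAVELENGTH)
    print(f"theta = {fraction:4.2f} theta_c  ->  depth = {depth:7.1f} Angstrom")
```

The depth stays at the level of a few tens of angstroms over most of the total-reflection range and grows rapidly only close to the critical angle, which is why thin reflective layers are sufficient at small grazing angles.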

In principle, materials with low-Z elements (such as Carbon) would be excellent 
X-ray reflectors, as their absorption is very low. Unfortunately, at grazing-incidence 
angles large enough to return a significant mirror effective area, only the softest 
X-rays would be reflected (Fig. 6). 

High-Z materials (Au, Pt, Ir, ...) exhibit much larger cut-off angles and they
keep a high reflectivity up to 10 keV at viable grazing-incidence angles (~500-1000
arcsec), but are more prone to photo-absorption in that energy range. Finally, the
transition from the total reflection regime to the ordinary, low reflectivity regime is
more gradual. Overcoatings based on low-Z materials like C or B₄C can be applied
to a high-Z material to enhance the soft X-ray reflectivity while still maintaining a
large cut-off energy (see Fig. 6 and Refs. 58-60); this solution is being considered,
e.g. for the implementation of the Athena mission.

[Figure 5: reflectivity of platinum versus photon energy (0-80 keV).]

Fig. 5. X-ray reflectivity curves in Platinum as function of energy for fixed incidence angles
(decreasing left-to-right). The reflectivity is very good in grazing incidence up to the critical angle
(function of the photon energy), where the reflectivity suddenly drops. The cut-off energy
decreases with increasing incidence angle (and vice versa). See electronic edition for a color version
of this figure.

Fig. 6. Calculated reflectivity curves for a single platinum layer and a single carbon layer.
Although the platinum reflectivity is more extended in energy than carbon, the photoelectric
effect is more intense at energies just beyond the M absorption edges (2-3 keV). In contrast, the
photoabsorption of carbon is much lower in the 1-15 keV band, but the reflection cutoff is located
at much lower energies. However, a capping layer of carbon over the platinum moderates the
photoelectric effect, enhancing the reflectivity of the platinum.


2.2. Reflectivity Reduction Due to Diffuse Scattering from 
Micro-roughness 


The high reflectivity required for X-ray astronomical optics, in particular for the
two-reflection systems needed for imaging, can be seriously degraded by the
micro-roughness of the reflecting surface. The mirror surface has to be very smooth (with
roughness below a few angstroms) in order to return an X-ray reflectivity close to
the value predicted by the Fresnel laws.

A statistically rough surface can be described by an appropriate function $z(x, y)$,
which returns the height of the surface at each point $(x, y)$. An ideal surface parallel
to the $(x, y)$ plane would simply have $z(x, y) = z_0$, where $z_0$ is a constant, but
real surfaces are never ideally smooth. In the following we assume that the surface is
isotropic, so that its properties are the same along any direction. The rms of the
microscopic profile along a given direction, which we assume to be the x-axis (see
Fig. 7), is given by:


$$\sigma = \sqrt{\frac{1}{L}\int_0^L \left[z(x) - z_0\right]^2 dx} \qquad (8)$$


where $L$ is the length of the profile and $z_0 = \langle z(x) \rangle$ (for the sake of
simplicity, we assume $z_0 = 0$). A ray with wavelength $\lambda$ is incident on the
surface at a grazing-incidence angle $\theta_i$ from an ambient medium with refractive
index $n$ and is reflected (in the incidence plane) at the angle $\theta_r = \theta_i$.
We suppose that the smooth-surface condition $2\pi\sigma \sin\theta_i < \lambda$ is
fulfilled, as usually occurs with polished surfaces. If the surface were perfectly
smooth, two adjacent parts of the wavefront would be reflected and would arrive at a
distant observer with the same phase shift they had before reflection.


Fig. 7. Reflection from a surface affected by micro-roughness. The reflectivity reduction
is caused by the loss of spatial coherence in the incident wavefront.

In contrast, owing to the height distribution, the wavefront reflected at the
height $z(x)$ by an element of surface $dx$ has a phase shift (see Fig. 7):


$$\Delta\varphi = \frac{4\pi}{\lambda}\, z(x)\, n \sin\theta_i \qquad (9)$$


The reflected electric field is the superposition of the contributions of all of the
elements of the profile with amplitude $rE_0$ (where $E_0$ is the incident electric field
amplitude at the surface), each with its own phase shift, weighted by the likelihood
$p(x)\,dx = dx/L$ of striking the surface element $dx$:


$$E_r = rE_0 \int_0^L \exp\left(-i\,\frac{4\pi}{\lambda}\, z(x)\, n \sin\theta_i\right) p(x)\,dx. \qquad (10)$$


This derivation was possible due to the smooth-surface approximation, which guarantees
the reflection of the beam in the specular direction. We now suppose that
the distribution $p(z)$ of heights is a Gaussian:


$$p(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{z^2}{2\sigma^2}\right), \qquad (11)$$


where $\sigma$ is the rms of the surface. Equation (10) becomes:


$$E_r = \frac{rE_0}{\sigma\sqrt{2\pi}} \int_{-\infty}^{+\infty} \exp\left(-i\,\frac{4\pi}{\lambda}\, z\, n \sin\theta_i - \frac{z^2}{2\sigma^2}\right) dz. \qquad (12)$$


Completing the square in the exponent and solving the integral in Eq. (12), the
reflectivity of the rough surface, $R_\sigma = |E_r/E_0|^2$, can be written as


$$R_\sigma = r^2 \exp\left[-\left(\frac{4\pi\sigma n \sin\theta_i}{\lambda}\right)^2\right]. \qquad (13)$$


This basic formula (known as the Debye-Waller formula) shows that:


• the reflectivity decreases with the exponential of the square of the rms roughness
(see Fig. 8);

• the reflection at larger angles is more sensitive to the roughness effect, because
the phase dispersion in the reflected wavefront depends solely on the projected
height in the direction of incidence.
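
The step from Eq. (12) to Eq. (13) can be checked numerically. The sketch below compares
the closed-form Debye-Waller factor R_sigma / r^2 with a direct trapezoidal integration of
the Gaussian-weighted phase factor in Eq. (12). The function names, the integration
settings and the sample values (rms roughness 2.6 Angstrom, grazing angle 1600 arcsec,
wavelength 1.54 Angstrom, n = 1) are assumptions chosen for illustration only.

```python
import cmath
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def debye_waller_factor(sigma, theta_i, wavelength, n=1.0):
    """Closed-form ratio R_sigma / r^2 from Eq. (13)."""
    return math.exp(-(4.0 * math.pi * sigma * n * math.sin(theta_i) / wavelength) ** 2)


def numeric_factor(sigma, theta_i, wavelength, n=1.0, steps=20001, span=10.0):
    """|E_r / (r E_0)|^2 from a trapezoidal integration of Eq. (12).

    The integral is truncated at +/- span * sigma, where the Gaussian is negligible.
    """
    k = 4.0 * math.pi * n * math.sin(theta_i) / wavelength
    dz = 2.0 * span * sigma / (steps - 1)
    total = 0j
    for j in range(steps):
        z = -span * sigma + j * dz
        weight = 0.5 if j in (0, steps - 1) else 1.0  # trapezoid end points
        total += weight * cmath.exp(-1j * k * z - z * z / (2.0 * sigma**2)) * dz
    amplitude = total / (sigma * math.sqrt(2.0 * math.pi))
    return abs(amplitude) ** 2


sigma, theta_i, wavelength = 2.6, 1600.0 * ARCSEC, 1.54
print("closed form:", debye_waller_factor(sigma, theta_i, wavelength))
print("numerical  :", numeric_factor(sigma, theta_i, wavelength))
```

Both numbers should agree to within the integration error, and repeating the call with a
larger roughness or a larger grazing angle shows the rapid loss of reflectivity stated in
the two points above.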


[Figure 8: reflectivity at 8.05 keV versus grazing incidence angle (arcsec), with curves
for σ(rms) = 15.6 Å, 2.6 Å and 1.3 Å.]


Fig. 8. Reflectivity scans of an electrochemical nickel sample, at different polishing
steps (where the top curve represents the smoothest surface). The reflectivity improves as
the sample is superpolished to higher accuracy (decreasing values of σ), in agreement with
the Debye-Waller formula. See electronic edition for a color version of this figure.


3. X-ray Telescope Configurations: Wolter’s Optics Family
and Relatives


For astronomical applications, there are three fundamentally different configurations
for X-ray optics working at grazing incidence: (i) the Wolter-like systems, (ii) the
Kirkpatrick-Baez (KB) type systems and (iii) the focusing collimator or “lobster eye”
systems. They are all based on two reflections, which are necessary for obtaining true
images in grazing-incidence configuration, in particular to reduce the coma aberration.
In this section, after a discussion of the parabolic single-reflection system (which is
not able to produce real images because it is dominated by coma for off-axis rays), the
Wolter family of configurations is treated extensively. Finally, we give some remarks on
the KB and lobster eye configurations.


3.1. Optics with Parabolic Profile 


The idea of using grazing-incidence reflection to produce X-ray optics was
born in 1960, when Giacconi and Rossi¹¹ considered the possibility of making a
grazing-incidence X-ray focusing mirror based on a truncated paraboloid shape (see
Figs. 2(a) and 9), with its profile simply described by the equation:


$$Y^2 = 4pX \qquad (14)$$


with $p$ representing the focus-to-vertex distance VF. The parabolic shape was of
course suggested by its property of concentrating a paraxial beam in the focus
without suffering spherical aberration. On the other hand, the parabolic shape cannot
be used to make grazing-incidence telescopes that produce real images, because they
would be affected by a strong coma aberration, that is, a dependence of the focal
length on the reflection position when an off-axis beam strikes the mirror. Coma is a
typical off-axis aberration, caused by the different magnifications of the reflected
rays, depending on their positions at the mirror surface. As a result, the useful field
of view of the optics would be too small to produce any image in the focal plane.


Fig. 9. Scheme of a parabolic reflecting concentrator in grazing-incidence configuration.
It represents the first geometry for an X-ray telescope proposed by Giacconi and Rossi in
1960.

A coma-free optic can be obtained if the Abbe sine condition is satisfied
at all points of the reflecting surface:


$$\frac{h_1}{\sin\theta_1} = \frac{h_2}{\sin\theta_2} = \mathrm{constant}, \qquad (15)$$


where $h_1$ and $h_2$ denote two arbitrary distances of the object and of the image point
from the optical axis, and $\theta_1$ and $\theta_2$ the angles between the optical axis
and the ray before and after the reflection or refraction process.

For astronomical objects, all rays may be considered parallel, and so the 
Abbe condition is satisfied if the incident rays intersect the reflected ray direc- 
tions in a spherical surface (called “principal surface”) centered on the focus 
(Fig. 10). 

The Abbe sine condition thus rules out a parabolic shape. Indeed, in this case
the Abbe surface is simply the paraboloid itself, which approximates a sphere only
near the vertex (in almost-normal incidence, Fig. 11(a)). While this configuration is
common and acceptable for optical telescopes, in X-rays it would return an almost-zero
reflectivity. However, it should be noted that the parabolic configuration has
been proposed (and also developed to the level of prototypes) for the realization of
large non-imaging concentrators for hard (>10 keV) X-ray astronomy based on natural
crystals (e.g. Highly Oriented Pyrolytic Graphite, or lithium fluoride) reflecting
via Bragg diffraction, with an internal mosaic structure that increases the reflecting
band under the Bragg peak (see Refs. 61-63).ᶜ
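
The failure of the parabola to satisfy the sine condition can be made concrete with a
short calculation. For an on-axis beam, a ray at height Y hits the parabola of Eq. (14) at
X = Y^2 / (4p) and is sent to the focus at (p, 0), so the ratio h / sin(theta) of Eq. (15)
equals the distance from the reflection point to the focus. The sketch below, whose
function name and sample heights are illustrative assumptions, shows that this ratio is
nearly constant only close to the vertex.

```python
import math


def abbe_ratio_parabola(y, p):
    """h / sin(theta) for an on-axis ray at height y on the parabola Y^2 = 4 p X."""
    x = y**2 / (4.0 * p)                   # reflection point on the parabola
    dist_to_focus = math.hypot(x - p, y)   # the focus is at (p, 0)
    sin_theta = y / dist_to_focus          # angle between reflected ray and axis
    return y / sin_theta


P = 1.0  # focus-to-vertex distance, arbitrary units (assumed value)
for y in (0.01, 0.1, 1.0, 20.0):
    print(f"Y = {y:6.2f}  ->  h/sin(theta) = {abbe_ratio_parabola(y, P):8.4f}")
```

Near the vertex the ratio stays close to p, whereas in the grazing-incidence zone (far
from the vertex) it changes strongly with the ray height, so the principal surface is far
from a sphere centered on the focus.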


Fig. 10. Principal surface (spherical) for a coma-free optical system, able to fulfill the
Abbe sine condition. The principal surface is determined by the intersection of the
directions of the incident rays with the directions of the reflected or refracted rays.


ᶜSee also Volume 5, Chapter 1 “Laue Lenses in Hard X-ray Astronomy”.


Fig. 11. Parabolic configuration. (a) The Abbe sine condition is (approximately) fulfilled
just in the region near the vertex, where the reflection occurs in normal incidence and
the reflectivity is almost 0. (b) Scheme of a parabolic concentrator based on crystal
diffraction from mosaic crystals, a system proposed for hard X-ray astronomical
applications.ᵈ See electronic edition for a color version of this figure.


Fig. 12. Principal surface for an X-ray Wolter optic, defined by the intersection of the
prolongation of the rays reflected by the first surface with the prolongation of the rays
reflected by the second surface. See electronic edition for a color version of this figure.


3.2. Wolter’s Configurations 


The solution to this problem is found by applying the configurations suggested in 
1952 by Hans Wolter for X-ray microscopy applications, based on grazing-incidence 
mirror systems,!° to astronomical X-ray optics. The basis of his considerations was 
that the quality of an imaging device would be better the more closely the Abbe sine 
rule is fulfilled. Since grazing incidence is a requirement for X-rays, reflection by a 
single mirror can never produce an image, whatever shape it has (see the extended 
discussion in Ref. 46). The large aberration for off-axis angles due to coma, as Wolter 
has shown, can be overcome only by utilizing a second reflection on a second mirror 
(Fig. 12). Figure 13 schematically represents the three configurations that Wolter
studied in detail, which are known as the Wolter Type I, Type II and Type III systems.
For each of the three configurations, the two mirrors are arranged coaxially and
share a common focus, which makes the system able to focus and to produce real images.
The Wolter Type I and Wolter Type II configurations both utilize a paraboloid and a
hyperboloid.


Fig. 13. Scheme of the Wolter-I (top), Wolter-II (middle) and Wolter-III (bottom) X-ray
optics. See electronic edition for a color version of this figure.


ᵈSee also Volume 5, Chapter 1 “Laue Lenses in Hard X-ray Astronomy”.

In the Type I system, reflection occurs on the internal surfaces of each mirror;
in the Type II system, the second reflection is off the external surface of the
hyperboloid. In fact, the Type II system is the grazing-incidence analogue of the
Cassegrain telescope. In the Type III system, the incident rays are first reflected from
the external surface of a paraboloid and then focused by the internal surface of an
ellipsoid. The main difference among the three systems is the ratio of focal length to
total system length. The focal length of the Type I system is given by the distance
from the paraboloid/hyperboloid intersection plane to the focus. Therefore, the
system length is larger than the focal length by the length of the paraboloid. The
Type II configuration has a larger focal length, which can exceed the system
length substantially. The Type III system has the shortest focal length of all three
configurations, but it is difficult to nest several confocal shells together in order to
increase the collecting area. All three systems are equivalent in optical performance
with respect to the Abbe sine condition.

It should be noted that the principal surface is not a sphere, as required to 
perfectly fulfill the Abbe condition, but a paraboloid that is well approximated by 
a sphere in the angular region close to the center of the field of view (see Fig. 14). 
Wolter showed that the sine condition could be approximately fulfilled also for larger 
apertures by introducing hyperboloids, but only for an even number of mirrors. 


Moreover, in a second paper Wolter (see Refs. 64 and 65) took his analysis one step
further and presented telescopes which exactly obey the sine condition, so that these
systems are completely free of spherical aberration and coma. This is achieved by
a very small deviation of the mirror surfaces from a second-order polynomial. The
exact surface figure was derived by Wolter by extending to grazing incidence
the result that Schwarzschild obtained for normal incidence in 1905 (this is known
as a Wolter-Schwarzschild design).


Fig. 14. Representation of the principal Abbe surface for a Wolter-I optic. It should be
noted that the principal surface in this case is not really a sphere but instead is a
parabolic surface in proximity of its vertex (where it is almost spherical). See
electronic edition for a color version of this figure.


3.3. Some Remarks on the Wolter-I Configuration 


The Wolter-I configuration, based on a paraboloid grazing-incidence mirror followed 
by a hyperboloid grazing-incidence mirror, has been the most commonly used so 
far in X-ray astronomy. The focus of the parabolic mirror corresponds to the focus 
of the imaginary second arm of the hyperbolic mirror (see Fig. 13). With respect 
to the other two configurations proposed by Wolter, the Wolter-I offers a number of
advantages for astronomical applications, such as a shorter focal length and a
consequently more favorable f-number (very important parameters for simplifying the
implementation of an X-ray telescope in a space mission). Moreover,
there is the possibility to nest together many confocal shells in order to increase 
the effective area and to extend the operational energy band (since each shell is 
characterized by a different incidence angle), as well as the possibility to realize 
monolithic parabolic + hyperbolic shells, with an increase of the mirror stiffness 
and a simplification of the assembly procedures. Finally, Wolter-I systems behave 
like “thin lens” systems, in the sense that the imaging quality is not too sensitive to 
possible tilts between the optics and the nominal optical axis, and this makes the 
implementation of a space mission easier because it allows looser tolerances®’ (this 
property is actually common also to the other kinds of Wolter mirrors, while the 
Kirkpatrick and Baez optics described in the following section does not have it). 

It is easy to show that, for a Wolter-I mirror shell, the on-axis effective area
$A_{\mathrm{eff}}$ depends on the energy and is expressed by the formula:


$$A_{\mathrm{eff}}(E) \approx 8\pi f L \theta^2 R^2(E), \qquad (16)$$


where $f$ is the focal length (assumed as the distance between the focus of the
hyperbola and the parabola/hyperbola intersection plane), $L$ is the mirror length of
the parabola, $\theta$ is the incidence angle on the mirror for paraxial X-rays
of energy $E$, and $R(E)$ is the reflectivity of the mirror as a function of the energy of
the incoming X-rays. In particular, the area depends on the squared reflectivity because
the rays are reflected twice at almost the same angle. The radius $r$ of the mirror
at the intersection plane between parabola and hyperbola is determined by the focal
length and the incidence angle:®° 


$$r = f \tan 4\theta, \qquad (17)$$


which can be derived from simple geometrical considerations. The expression for the
off-axis area of a Wolter-I shell is more complex than Eq. (16). A detailed treatment
is reported in Refs. 69 and 70.


Fig. 15. (a) Comparison of the collecting area achievable with a normal-incidence mirror,
with a Wolter-I mirror of the same external diameter, and with a Wolter-I mirror system of
the same external diameter but based on many nested confocal shells. The latter systems
utilize only the annulus between the indicated inner and outer radii, which is further
obstructed by the mirror shell thickness in the lower example. (b) The X-ray telescope
mirrors of the Swift mission during calibration; the optics utilize 12 nested mirror
shells in order to increase the collecting area.
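
Equations (16) and (17) are easy to evaluate numerically. The sketch below computes the
shell radius and the on-axis effective area, and compares the latter (for unit
reflectivity) with the geometric area of a thin entrance annulus of radius r and radial
width L tan(theta), which is one way of picturing the "simple geometrical considerations"
mentioned above. The function names and the sample values (f = 10 m, L = 0.3 m,
theta = 0.01 rad) are hypothetical and serve only as an illustration.

```python
import math


def shell_radius(f, theta):
    """Radius at the intersection plane, Eq. (17): r = f tan(4 theta)."""
    return f * math.tan(4.0 * theta)


def effective_area(f, length, theta, reflectivity=1.0):
    """On-axis effective area of a single Wolter-I shell, Eq. (16)."""
    return 8.0 * math.pi * f * length * theta**2 * reflectivity**2


def thin_annulus_area(f, length, theta):
    """Geometric area of a thin annulus of radius r and width L tan(theta)."""
    return 2.0 * math.pi * shell_radius(f, theta) * length * math.tan(theta)


# Assumed example shell: lengths in meters, angle in radians.
f, length, theta = 10.0, 0.3, 0.01
print("radius r          [m]  :", shell_radius(f, theta))
print("Eq. (16), R = 1   [m^2]:", effective_area(f, length, theta))
print("thin annulus      [m^2]:", thin_annulus_area(f, length, theta))
print("Eq. (16), R = 0.8 [m^2]:", effective_area(f, length, theta, 0.8))
```

For small angles the two area estimates agree closely, and the last line shows how the
double reflection penalizes the area through the square of the reflectivity.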

It should be noted that the effective area of a single Wolter-I shell calculated
with Eq. (16) is very small, in particular if compared with that of a normal-incidence
mirror (working in the visible band) of the same diameter (see Fig. 15(a)). Apart from
the reflectivity term, the collecting area of the normal-incidence mirror is
proportional to the square of the diameter $\phi_2$: $A_{\mathrm{coll}} = (\pi/4)\phi_2^2$,
while the collecting area of a Wolter-I telescope is the area of the thin entrance
annulus, which is proportional to the difference of the squares of the diameters of the
parabola at the top and at the intersection plane:
$A_{\mathrm{coll}} = (\pi/4)(\phi_2^2 - \phi_1^2)$, where $\phi_2$ is only slightly
larger than $\phi_1$ for a grazing-incidence telescope. The only way to increase the
collecting area of a Wolter-I telescope is to nest together several confocal shells (see,
e.g. Refs. 68 and 70), as shown in Fig. 15(b), where the Swift X-ray telescope mirror
formed by 12 confocal mirrors is shown. In this case, the total effective area corresponds
to the sum of the effective areas of each mirror shell, and the collecting area is
proportional to the difference of the squares of the maximum and minimum diameters.
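
The comparison between a filled aperture and a thin entrance annulus can be illustrated
with a few lines of arithmetic. In the sketch below the diameters are invented numbers,
the helper names are hypothetical, and shell thickness and reflectivity are ignored; it
only shows how small the geometric area of one shell is and how nesting adds the annuli.

```python
import math


def disc_area(d_out):
    """Collecting area of a filled (normal-incidence) aperture of diameter d_out."""
    return math.pi / 4.0 * d_out**2


def annulus_area(d_out, d_in):
    """Geometric area of the entrance annulus between diameters d_in and d_out."""
    return math.pi / 4.0 * (d_out**2 - d_in**2)


def nested_area(shells):
    """Sum of the annuli of several nested shells, given as (d_out, d_in) pairs."""
    return sum(annulus_area(d_out, d_in) for d_out, d_in in shells)


# One shell whose entrance diameter exceeds the intersection-plane diameter by 1 %.
single = annulus_area(1.0, 0.99)
print("single shell / filled aperture:", single / disc_area(1.0))

# Twelve nested shells with assumed diameters, each using a 1 % wide annulus.
shells = [(d, 0.99 * d) for d in (1.0 - 0.05 * i for i in range(12))]
print("12 nested shells / filled aperture:", nested_area(shells) / disc_area(1.0))
```

Even with twelve shells the geometric area remains a modest fraction of the filled
aperture in this toy example, which is why dense nesting matters for grazing-incidence
telescopes.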

A systematic assessment of Wolter-I optics for astronomical applications is reported
in detail in a fundamental paper by Van Speybroeck and Chase.⁷¹ In this work the
authors also studied, for the first time, the on-axis and off-axis angular resolution
and effective area by means of ray tracing. The maximum field of view of Wolter-I
mirrors is of the order of ± the incidence angle. One of the most important results of
the study reported in Ref. 70 regards the blur circle radius, $s_D$, of the Point Spread
Function, which can be described by a simple empirical relation obtained from the ray
tracing of different telescope configurations:


$$s_D = 0.2\,\frac{\tan^2\theta}{\tan\alpha}\left(\frac{L}{f}\right) + 4\tan\theta\,\tan^2\alpha, \qquad (18)$$


where, in this expression, $\theta$ denotes the off-axis angle of the incoming X-rays
with respect to the axial direction and $\alpha$ the grazing-incidence angle on the
mirrors, while $L$ is the mirror length and $f$ the focal length. We note that the second
term of Eq. (18) is due to coma, since the Wolter-I geometry does not exactly satisfy the
Abbe sine condition.ᵉ The first term of Eq. (18), on the other hand, accounts for the
remaining aberration effects, mainly due to spherical aberration and field curvature.
This term is also present for Wolter-Schwarzschild systems and depends strongly on the
ratio between the length of the mirror and the focal length and on the off-axis
incidence angle.


3.4. A More General Representation of the Mirror Configurations 
Deriving from Wolter-I 


Apart from the Wolter-Schwarzschild configuration, a pair of conical mirrors⁷²
represents another well-known configuration derived from a simplification of the
Wolter-I configuration. This design is important because it may simplify the
fabrication of the mirrors. The price paid for this simplicity is the existence of an
intrinsic blur, in agreement with the equation:


$$H = \frac{L\,r}{8 f^2} \qquad (19)$$


where $H$ is the half-energy width, $L$ is the length of the first cone, $r$ is the radius
and $f$ is the focal length.

More general mirror designs than Wolter’s exist in which the primary and secondary
mirror surfaces are expanded as a power series. These polynomial solutions are well
suited for optimization purposes, and may be used to increase the angular resolution at
large off-axis positions, at the cost of degrading the on-axis performance. The idea is
to transfer the principle of the Ritchey-Chrétien Cassegrain telescope, widely used in
optical astronomy, to grazing-incidence optics. By deliberately compromising the on-axis
performance, one can introduce aberrations (mainly spherical) that tend to cancel or
reduce the off-axis aberrations.


ᵉIt can be demonstrated that this term disappears for the Wolter-Schwarzschild configuration.


Reference 73 discussed polynomial optics solutions previously proposed by
Ref. 74 and described in detail a wide-field X-ray telescope design with good angular
resolution up to ~30 arcmin. References 75 and 76 sketched a procedure aimed
at optimizing the entire mirror assembly based on many different nested confocal
shells.

The X-ray telescopes derived from the Wolter-I configuration that can be
designed in terms of polynomials are described by a set of parameters that characterize
the mirror surfaces (as for the Wolter-I). In particular, referring to Fig. 16, the main
parameters are the radius of the mirror shell at the intersection plane, $r_0$, and the
telescope focal length, $f$ (which is assumed to be the distance between the intersection
plane and the focus). The angles $\theta$ and $\beta$ are the angles between the optical
axis and the tangents to the primary and secondary mirrors, respectively, at the
intersection plane, while the lengths of the primary and secondary mirrors are $X_1$ and
$X_2$. A shell scheme is depicted in Fig. 16, with the origin of the Cartesian system at
the intersection plane and the X-axis along the optical axis of the telescope, pointing
to the source. In this reference system, the squared profiles of the primary and
secondary mirrors, $r_1^2(x_1)$ and $r_2^2(x_2)$, respectively, can then be expanded as
power series with coefficients $a_i$ and $b_i$ (Table 2), where $r$ and $x$ are the radial
and axial coordinates of the primary and secondary surfaces.


Fig. 16. Geometry for the optimization of two-reflection polynomial mirrors derived from
the Wolter-I configuration. It should be noted that the origin of the coordinates is given
by the intersection of the optical axis with the intersection plane between parabola and
hyperbola. See electronic edition for a color version of this figure.


Table 2. Coefficients of the polynomials for X-ray grazing-incidence mirrors.

Telescope Configuration   a0   b0   a1          b1          a2        b2            a3   b3   a4   b4
Parabolic                 1    1    -2 tan θ    -2 tan β    0         0             0    0    0    0
Double-cone               1    1    -2 tan θ    -2 tan β    tan² θ    tan² β        0    0    0    0
Wolter-I                  1    1    -2 tan θ    -2 tan β    0         (illegible)   0    0    0    0


The highest reflection efficiency is achieved for $\beta = 3\theta$ and a focal length
$f = r_0/\tan 4\theta$.°" By definition, $a_0 = b_0 = 1$, while $a_1 = -2\tan\theta$ and
$b_1 = -2\tan\beta$ are twice the slopes of the primary and the secondary surfaces at the
intersection plane. By selecting different values for the other coefficients, one obtains
the different optical designs considered so far (Table 2).
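
The role of the coefficients in Table 2 can be illustrated with a small sketch. It
assumes that the power series is written in the normalized form
(r / r0)^2 = sum_i c_i (x / r0)^i, with c_i standing for the a_i or b_i; this form is an
assumption inferred from the statement that a_0 = b_0 = 1 and that a_1 and b_1 are twice
the slopes at the intersection plane, and it is not quoted from the text. The function
names and the sample values are hypothetical as well.

```python
import math


def profile_radius(x, r0, coeffs):
    """Radius r(x) from the assumed expansion (r/r0)^2 = sum_i c_i (x/r0)^i."""
    u = x / r0
    return r0 * math.sqrt(sum(c * u**i for i, c in enumerate(coeffs)))


def double_cone_coeffs(angle):
    """Double-cone row of Table 2 for one surface (angle = theta or beta)."""
    t = math.tan(angle)
    return [1.0, -2.0 * t, t**2, 0.0, 0.0]


def parabolic_coeffs(angle):
    """Parabolic row of Table 2 for one surface."""
    return [1.0, -2.0 * math.tan(angle), 0.0, 0.0, 0.0]


# Assumed example: r0 = 0.3 m, slope angle 0.01 rad, axial positions in meters.
r0, angle = 0.3, 0.01
for x in (0.0, 0.05, 0.10):
    cone = profile_radius(x, r0, double_cone_coeffs(angle))
    parabola = profile_radius(x, r0, parabolic_coeffs(angle))
    print(f"x = {x:.2f}  cone r = {cone:.6f}  parabola r = {parabola:.6f}")
```

With the double-cone coefficients the series is a perfect square, so the profile reduces
to the straight line r = r0 - x tan(angle); dropping the quadratic term gives the
parabolic profile, which shares the same radius and slope at the intersection plane.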

We note that working directly on the mirror properties provides a more powerful
tool to improve the image quality over the field of view. Polynomial surfaces are
particularly well suited for optimization purposes, since computing procedures can
operate iteratively on the coefficients of the power series expansion. This can be
done by defining a merit function and by finding its minimum in the coefficient
parameter space, after specifying a minimization goal. These criteria can either
provide the best image on-axis with only a modest improvement off-axis (i.e.
compromising the on-axis performance by a given fraction), or the flattest response
over the entire field of view. As an example of this optimization we show in Fig. 17
a design recently achieved at the Brera Astronomical Observatory (INAF) in the
context of the Wide-Field X-ray Telescope project, which aims at the implementation
of a wide-field mission with a corrected field of 1° in diameter, specifically designed
for survey observations.⁷⁷ This figure shows the expected angular resolution, in terms
of half-energy width (HEW), as a function of the off-axis angle for a telescope with a
focal length of 5.5 m and mirrors with maximum diameter and total length of 1.2 m and
0.4 m, respectively, assuming an optimized polynomial profile and a Wolter-I profile.
For comparison, the behavior of the HEW for the Chandra mirrors (Wolter-I profile, focal
length 10 m, maximum diameter 1.2 m, total mirror length 83 cm) is also shown. The
polynomial mirrors exhibit the best performance across the field, thanks to the
optimized parameters of the reflecting surfaces and the optimized lengths of the shells.


Fig. 17. Comparison, in terms of angular resolution (HEW) across the field of view, among
telescopes with different designs (see the text for details). See electronic edition for a
color version of this figure (figure credits: Marta Civitani, INAF-OAB).


3.5. Remarks on Kirkpatrick-Baez Optics


The Kirkpatrick-Baez (KB) system was the first kind of optics used to obtain
two-dimensional images by reflection at glancing angles (see Refs. 9 and 10); earlier
imaging experiments in X-rays had been carried out (see, e.g. Ref. 2), but
using diffraction from natural crystals instead of total external reflection mirrors.
The simplest configuration ideally makes use of two crossed cylindrical mirrors
with identical radii of curvature (Fig. 18(a)), but several pairs of parabolic mirrors
curved in just one direction, with the meridian planes coincident, may also be nested
together to increase the collecting area (Fig. 18(b)). After the first reflection, the
incident rays are concentrated in linear focal images. When the X-rays are reflected
a second time from the next one-dimensional parabolic mirror (oriented at a right
angle to the first one), a point-like image in the focus is achieved. It is therefore
possible to obtain images of extended objects without being affected by astigmatism.
Since the two mirrors of a pair are not coincident, the object distance for the sagittal
reflection in the second mirror is larger than that for the meridian reflection in the
first.

Thus, the magnification in the two directions is different, i.e. an optical anamor- 
phism takes place.*” It is possible to design systems in which both mirrors are at 
the same distance from the object. The exact solution for the intersection point 
with the focal plane of an arbitrary incident ray is given in Ref. 78. It should be
noted that KB systems based on just two reflections are significantly affected
by coma, since the Abbe sine condition is satisfied in just one direction (coma would
be avoided only in the case of four reflections).

Fig. 18. Schematic of the KB telescope. Image (a) shows a two-mirror combination, and
(b) displays a stack of several mirrors to increase the collecting area (figure credits:
Stefano Basso, INAF-OAB).


A detailed configuration and analysis of the multi-plate KB system for astronomy
is given in Ref. 79. We note that the system is still widely used for X-ray
microscopy applications (see, e.g. Ref. 80). In astronomy, the system is attractive
because the fabrication of the mirrors is easier than for Wolter optics. Indeed, the
plates need a curvature in just one direction and, in practice, spherical instead of
parabolic mirrors can be used, since the sagittal focal lengths are much greater than the
meridian focal length. Moreover, since the plates are curved in just one direction,
the system is particularly well suited to active correction methods with actuators
during the assembly phase (as reported in Ref. 81). However, there are a number of
problems that make the system less appealing than Wolter-I optics. In particular, KB
systems not only have imaging capabilities over a smaller field than Wolter ones, but
they also do not behave as “thin lenses”, and the images are much more affected by
alignment problems due to tilt errors of the mirrors with respect to the central nominal
optical axis. Last but not least, the maximum diameter achievable with a KB system for a
given grazing-incidence angle is about a factor of two lower than for a Wolter-I optic
of the same focal length, with
a consequently much smaller effective area.“ KB mirrors are, however, commonly 
utilized in synchrotron or free electron laser radiation beamlines, where the copious 
X-ray flux does not require a large collecting area (see, e.g. Ref. 82 for a detailed 
description). 


3.6. Remarks on Lobster Eye Optics 


Lobster eye opticsᶠ are similar to the real eye optical systems of shellfish like
lobsters and shrimps,⁸³ which need sensitive vision coupled to a large field of view on
dark sea-beds. Unlike the eyes of flies (which are in practice collimators distributed
around a spherical sensitive area), in lobster eyes grazing-incidence reflecting
channels are distributed around a spherical surface and focus onto different points
(depending on the direction of the rays) of a sensitive spherical surface placed at a
radius one-half of the radius of curvature of the eye.ᵍ Since the mirrors work in
grazing-incidence geometry, the system can also be adopted for focusing X-rays via
total reflection (Fig. 19).


ᶠSee Chapter 5 “Lobster Eye Optics” of this volume.
ᵍIn practice, the system is equivalent to a spherical mirror working in transmission.


Fig. 19. Lateral section of a lobster eye telescope. See electronic edition for a color
version of this figure.

Therefore, while the Wolter and the KB systems have in common a relatively
narrow field of view that is practically limited to the grazing angle employed on the
individual mirrors, in principle the lobster eye system can have a field of view of
4π steradians, without any preferential direction.⁸⁵ The price to pay is a reduced
angular resolution and a smaller concentration power.

Two configurations for the application in X-ray astronomy have been proposed, 
the so-called Schmidt focusing collimator objective$+ and the Angel multi-channel 
lens.*° The principal layout of Schmidt’s design is somewhat similar to the KB 
system. The upper and lower stack indeed consist of a series of plane mirrors in an 
orthogonal configuration. Each mirror is furthermore assumed to be reflecting on 
both sides. The mirrors within each stack are arranged such that the envelope of the 
upper edges forms a section of a cylinder, so that the center lines of the two cylinders 
are at right angles, with their intersection point being the origin of the coordinate 
system. Each mirror plane is within the radial direction originating from the cylinder 
center line of the corresponding stack. Therefore, each stack provides focusing in one 
dimension onto a cylindrical surface located about halfway between the center lines 
and the stack position. The focusing is not perfect because of the finite height of the 
mirror blades along the radial direction. The Angel configuration is instead based 
on a number of square hollow micro-channels disposed in such a way as to cover a 
spherical surface. A true focus is formed only when the ray is reflected by
two orthogonal sides of a channel. The PSFs of both Schmidt’s and Angel’s systems
are characterized by a typical cross-like distribution of the photons, the four arms
of the cross being due to photons reflected by just a single surface. Prototypes of both
configurations have been successfully produced (see, e.g. Refs. 86 and 87). To date,
lobster eye X-ray optics have only been used on the MIXS instrument of the ESA 
BepiColombo mission to the planet Mercury (launched on 20 October 2018). They 
may also play a major role in future X-ray missions such as THESEUS⁸⁸ (recently
proposed to ESA in the context of the M4 Cosmic Vision program) or TAP (the 
Transient Astrophysics Probe, currently being developed for proposal to NASA), 
both of which are devoted to the monitoring of the sky for finding transients like 
cosmological Gamma-Ray Bursts or compact objects in flaring status. 

This chapter discusses the production of silicon pore optics, a novel class of
lightweight high-energy optics made from high-quality 12” silicon wafers. The
technology is being developed by the European Space Agency to enable the next 
generation of X-ray observatories, such as the Athena mission planned for launch 
in 2033. Silicon Pore Optics will allow building a high-resolution telescope with 
several square meters of effective area. We will show the production steps from 
silicon up to final optic, present design parameters for future applications and 
discuss other applications for this technology. 


1. Introduction 


Chandra and XMM-Newton, both launched in 1999, are the largest X-ray observatories
ever built and flown. Yet, even before their launch, a workshop¹ took place in
1997 at Leicester University to discuss the requirements for a future high-energy
X-ray telescope. Such an observatory would need to be a radical step forward in terms
of performance and should combine both high angular resolution and a large collecting
area in order to match the performance of ground- and space-based telescopes
operating in other parts of the electromagnetic spectrum. The workshop resulted in
a preliminary mission design termed XEUS,²,³ with a focal length of 50 m, 30 m²
of effective area at 1 keV and an angular resolution better than 2 arcseconds.
XMM-Newton has an effective area of 0.15 m² and an angular resolution of
about 13 arcseconds, and was built using nested replicated nickel shells.⁴ Chandra⁵
consists of four shells of super-polished thick glass mirrors, has a focal length of 10 m,
an angular resolution of 0.5 arcsecond and an effective area of 0.04 m². Figure 1,
which shows the performance of those and other X-ray telescopes flown to date, 
immediately highlights the technical challenge. To build XEUS would require an 

Fig. 1. X-ray telescopes flown to date and their performance in terms of angular resolution and
effective collecting area. Figure courtesy D. Willingale.


optic with 200 times the effective area of XMM, while improving the resolution by 
an order of magnitude. The available technologies would either provide high angular 
resolution, but would be photon starved resulting in long observation times, or they 
would have a relatively large collecting area at the cost of angular resolution. To 
reach the so-called Golden Quadrant, where XEUS and now Athena⁶ are located,
hence requires either a vast improvement of existing technologies or a radically new 
approach. The sheer size of XEUS also called for a mass production approach in 
order to produce such an optic within a reasonable timescale. 
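
The size of the step can be checked directly from the effective areas quoted above. The short sketch below only restates that arithmetic; the function and variable names are illustrative and not part of the original text.

```python
# Effective areas at about 1 keV, as quoted in the text (square meters).
XEUS_AREA_M2 = 30.0
FLOWN_AREAS_M2 = {"XMM-Newton": 0.15, "Chandra": 0.04}


def area_gain(target_area_m2: float, current_area_m2: float) -> float:
    """Factor by which the target effective area exceeds the current one."""
    return target_area_m2 / current_area_m2


for name, area_m2 in FLOWN_AREAS_M2.items():
    gain = area_gain(XEUS_AREA_M2, area_m2)
    print(f"XEUS vs {name}: about {gain:.0f} times the effective area")
```

The XMM-Newton ratio reproduces the factor of 200 mentioned in the text; the Chandra ratio is larger still.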

X-ray optics for space applications, operating from a few hundred eV
up to tens of keV, are usually based on grazing-incidence mirrors that need to be
fabricated and mounted. The mirrors need to have a roughness of a few Angstroms 
to efficiently reflect the X-rays and must have an accurate figure and be co-aligned 
to achieve high angular resolution. The effective area requirement results in optical 
designs that maximize the number of mirrors, while the mass limitations imposed 
by the launcher demand very thin mirrors. 

Several different approaches are being studied to solve this problem, using
slumped glass,’ silicon foils,¹⁰ glass or silicon micro-pores.¹¹ All approaches
have in common that they need to develop both the X-ray mirror itself and also
the mounting method. Making a high-quality mirror is already a formidable task.
Finding a method to mount very thin mirrors without distorting them and then
launching that structure into space is at least equally problematic.

This chapter focuses on a radically new X-ray optics
technology that is very lightweight, yet stiff. The technology is called Silicon Pore
Optics (SPO) and is being developed by the European Space Agency (ESA). The 
semiconductor industry has invested heavily into making silicon wafers, normally 
used to produce electronics and memory chips, as used for example in smart phones. 
Due to the unique properties of the wafers they can, however, also be used as perfect 
X-ray mirrors. Many suppliers exist that deliver mass production equipment using 
standardized processes to structure, etch, cut, coat and transport silicon wafers. 
In addition, the silicon itself can be used to also solve the mounting problem, by 
bonding the silicon mirrors to each other, creating stiff blocks of several tens of 
mirrors. 


2. SPO Production Process 


Silicon Pore Optics use double-side-polished crystalline silicon wafers as base mate- 
rial for the X-ray mirrors. 

The wafers are diced (cut) into mirror plates of the desired rectangular shape. 
Each plate is wedged, as required to make a focusing optic. The wedged plates are 
ribbed, leaving a thin membrane and a number of ribs that will serve as interconnects 
to the next plate and which stiffen the thin membrane. The plates are then elastically 
bent and stacked on top of each other by bonding them together. This forms a three- 
dimensional (3D) structure having pores, through which the X-rays can reflect off 
the membranes. This 3D structure is a so-called stack. The stack is very stiff and 
very lightweight, and determines the quality of the resulting optic. To form an 
imaging system two such stacks are then co-aligned and glued to mounting brackets, 
resulting in a so-called mirror module that is essentially an X-ray lenslet. Hundreds 
of such mirror modules are then integrated into an optical bench, forming the final 
X-ray optic (see Fig. 2). 


2.1. Silicon Wafers — The Base Material for SPO 


The production of electronics-grade silicon wafers starts from polycrystalline silicon,
which can for example be generated in the so-called Siemens process from a reaction
of gaseous SiHCl₃ and hydrogen. It takes about a week to grow polycrystalline
silicon rods, a high-purity form of silicon with impurities at the parts-per-billion
(ppb) level. These rods are cut into small chips and put into a
quartz crucible, where a graphite heater heats the polycrystalline silicon to
1420°C, slightly above the melting temperature of silicon. The growth of a
silicon ingot is then initiated by placing a seed crystal a few mm in diameter,
glued in a precise orientation on a rod, onto the surface of the silicon melt. The
ingot and crucible are slowly rotated and the seed crystal is pulled up, drawing

Fig. 2. Overview of the SPO production process.


some molten silicon with it that crystallizes and extends from the seed crystal. This 
method was developed in 1916 by Polish scientist Jan Czochralski, and the process
is named after him. 

The solidified single crystalline silicon extends from the seed crystal first into 
a cone shape and by controlling the temperature, pulling rate and rotation speed 
one can stabilize the process to form a cylindrical ingot. The largest ingots the
industry can produce to date have a diameter of 12” (300 mm), with the next generation of
18” (450 mm) being lined up for 2018. The ingots are grown to a length of 1–2 m,
which means that a few hundred kg of silicon are hanging from the seed crystal at 
the end of the process (see Fig. 3). 
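
As a rough check of the quoted ingot mass, the following sketch treats the ingot as an ideal cylinder. The silicon density of 2329 kg/m³ is an assumed room-temperature value that is not given in the text, and the seed and tail cones are ignored.

```python
import math

# Assumed density of solid silicon near room temperature (not from the text).
SILICON_DENSITY_KG_M3 = 2329.0


def ingot_mass_kg(diameter_m: float, length_m: float) -> float:
    """Mass of an idealized cylindrical ingot (seed and tail cones ignored)."""
    radius_m = diameter_m / 2.0
    return math.pi * radius_m**2 * length_m * SILICON_DENSITY_KG_M3


for length_m in (1.0, 2.0):
    mass = ingot_mass_kg(0.300, length_m)
    print(f"300 mm ingot, {length_m:.0f} m long: about {mass:.0f} kg")
```

Under these assumptions a 300 mm ingot of 1–2 m length weighs between about 165 kg and 330 kg, consistent with the "few hundred kg" stated above.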

After ingot growth a number of mechanical operations are performed. First one
cuts off the seed and tail cone, then one uses a grinding wheel to give the ingot
a uniform diameter, cuts the ingot into sections and finally grinds a notch to
serve as indicator for the (110) crystal plane orientation, important for later dicing
and edging steps. The ingot is then mounted in a multi-wire slicing machine, which
uses a 200-µm thin wire and abrasive slurry to slice the ingot in one run into thin
wafers. The silicon industry offers different thicknesses; however, the highest grade
product in terms of thickness uniformity is a 0.775-mm wafer, which also forms the
basis for SPO.

The wafers are then ground several times to improve the thickness uniformity, 
which is characterized by the total thickness variation (TTV). After rounding the 
edges and an etching step to remove surface layer damage one loads up to 15 wafers 
at the same time into a double side polishing (DSP) machine, which reduces the 

Fig. 3. Single crystalline silicon ingot hanging from the seed crystal. Photo copyright Siltronic
AG. Reproduced here with permission.


roughness of the wafer down to better than 0.1 nm. The DSP machines are able to
produce large batches of wafers with the same absolute thickness, with a standard
deviation of less than 0.3 µm. Within each wafer the TTV is less than 300 nm
(peak-to-valley) and typically 30 nm on the central 90% of the area. Each wafer has
a unique ID and one can obtain data sets per wafer with thickness maps.

This process results in almost perfect single crystalline silicon disks, which are 
thin, smooth, plane parallel and very clean — the perfect mirror (see Fig. 4). 


2.2. From Wafer to Plate 


Silicon wafers need to be further processed into plates before they can be stacked 
into SPO. The plates will have two main functional parts: a membrane, acting as the 
mirror, and ribs to interconnect multiple plates. In addition one has the possibility 
to add a wedge to the plate, depending on the type of optics one wants to later 
build. For example, to make Wolter-I type¹⁶ optics, a wedge must be added to each
pair of plates so that the angle at which on-axis X-rays meet the primary mirror at
each radius complies with the Wolter-I configuration and its approximations. Other
types of optics, such as Kirkpatrick–Baez,¹⁷ require no wedge.

For wedged plates the process starts with wet thermal oxidation of the 300 mm
silicon wafers, which is done in a furnace at about 1000°C by a reaction of ultra-high
purity water vapor with the silicon. The maximum achievable thickness is a
~2400-nm thick silicon oxide layer, grown into the silicon on both sides of the wafer,
with a layer thickness uniformity of about 1%.

Fig. 4. Three 300 mm diameter, double-side-polished silicon wafers. Photo copyright Siltronic
AG. Reproduced here with permission.

Fig. 5. Left: a 12” silicon wafer. Right: a 12” wafer diced into plates. The dicing process can also
directly cut grooves into the plate, resulting in a diced and ribbed silicon plate.


The wafer is then put onto frames and diced in the (100) direction into rectangular
plates of the size needed to support the design of the optic (see Sec. 3).
The dimensions range from 20 × 90 mm² to 50 × 110 mm² plates for the case of
Athena. The dicing is done using standard, fully automated, semiconductor dicing 
saws, which use dicing blades and can process entire wafers at once, from loading, 
through alignment, dicing, and up to washing and drying (Fig. 5). 

The same machine can also perform the ribbing process, where grooves are cut 
into the silicon plate, leaving a thin membrane (the actual X-ray mirror) and a 
number of ribs. The grooves will later on form the empty space through which the 

Fig. 6. A diced plate has a ribbed side and a reflective side. The ribs are formed by a grooving
process using a dicing machine. The membrane is left at the bottom of the groove.


X-rays pass. Again, the dimensions depend on the optical design, but typically one
leaves a 150-µm thin membrane with 170-µm wide ribs spaced by 1–3 mm.

After removing the plate from the dicing frame, each plate gets a unique identifi- 
cation imprinted on the side with a laser engraving tool, to preserve the position and 
orientation of the plate with respect to the wafer. The plates are then anisotropically 
etched in a potassium hydroxide (KOH) solution to remove possible mechanical 
damage (micro cracks). This removes about 20 µm of silicon on all sides, except
for the top of the ribs and the reflective sides, which are covered by a protective 
coating. This step also increases the roughness on the inside of the pores, which is 
desired as it reduces scattering of X-rays that could later cause stray light effects. 

In the next process step the plates are wedged. This is a wet chemical
process especially developed for SPO, where the silicon oxide layer is gradually 
removed from both sides of the ribbed plate as a function of position along the 
plate length. An ellipsometer is used to measure the uniformity of the wedge layer 
(Fig. 6). 

If a metal coating is required to enhance the reflectivity of the silicon plates, then
a photoresist coating is applied using standard photolithographic techniques to
protect the areas required for later bonding before the coating process. After metal
coating¹⁸ a lift-off process is applied, which removes the resist and thus the excess
coating from the bonding areas, and which leaves the metal coating elsewhere intact
(Fig. 7).


2.3. Stacking of Plates 


The stacking process is the core of the SPO technology. In this step the plate is 
prepared for bonding, bent to the desired figure and then bonded on top of other 
plates. The important point is that the plates are bonded directly, without any glue. 
This technique originally stems from optical bonding techniques to join lenses or 

Fig. 7. Photo of the ribbed side of a silicon plate with a width of 49 mm, a length of 110 mm and
an average rib pitch of 2.3 mm.


prisms together. Many bonding techniques exist, such as anodic,¹⁹ eutectic
or hydrophobic bonding; however, in the case of our oxide-coated silicon plates
we use direct hydrophilic bonding²⁰ of activated surfaces.

The silicon plates are first cleaned to remove any residual particulate contamination,
preferably down to a size of 100 nm. Then the bonding surfaces must be
activated. This is done either by wet activation, dipping the ribbed plate in a
set of chemicals, or by ambient plasma activation. Both methods result in a
hydrophilic silicon plate and a rib surface that is covered in hydroxyl groups. When
two such plates are brought into close contact at room temperature, a natural
water interlayer joins the hydroxyl groups²¹ by means of Van der Waals forces,
forming a so-called pre-bond. During annealing at elevated temperatures (>110°C) a
polymerization reaction takes place that drives the water out and produces higher
strength covalent bonds.

The cleaned and activated silicon plates are loaded into a fully automated stack- 
ing robot, which we developed especially for SPO production. The stacking robot 
operates in a class 100 clean room and fits onto an optical table. The plates are 
inspected for cleanliness and are then brought by a robotic arm to a pre-bending 
station, the so-called die. This is a convex surface that brings the plate into the 
approximate shape required for the stack, with the ribs hanging downwards. 

The robot then aligns a concave silicon mandrel to the plate on the die. Unlike
in other X-ray optics technologies, the mandrels are responsible only for the figure of a stack;
the roughness of the plates is not influenced. A mandrel has a cylindrical,
conical or even a secondary-curvature figure that the stack will copy. Mandrels are
marginally larger in size than the plates and can be cleaned and recycled without
refurbishment. A mandrel is produced by coarse grinding and ion beam figuring
of a silicon block, which can be done by a number of companies, and which is a
process routinely used to produce synchrotron optics with a residual figure error
of 0.1 arcsecond. Note that the silicon mandrel material matches the coefficient of
thermal expansion (CTE) of the stack.

To start a stack, the mandrel is equipped with a so-called base plate. The robot 
then presses the plate hanging on the die onto the base-plate and, after release of 
the plate from the die, a small stack of two plates is formed. The figure of the stack 
is measured using interferometric metrology. The robot then picks the next plate, 
pre-bends it using the die and repeats the bonding process. This cycle continues 
until a stack of 20-45 plates is reached, depending on the optics design. Note that 
the mandrel only determines the outermost radius of the stack. The stack is then 
removed from the mandrel and forms a free-standing, light-weight, stiff silicon block 
with a large open area ratio (see Fig. 8). 


2.4. Mirror Module Assembly 


Double reflection of the X-rays on two stacks aligned behind each other is required 
to build an imaging system. To this end two stacks are placed behind each other 
in an X-ray test facility such as the BESSY II synchrotron in Berlin.²² The optics
are illuminated and the position of the second stack with respect to the first one 


Fig. 8. A stack of 35 silicon plates including coating. The stack has dimensions of 66 × 66 mm²,
an outer radius of curvature of 0.74 m and consists of ~2100 pores with a rib and membrane width
of 0.17 mm, resulting in an open area ratio of more than 60%.

is optimized to maximize transmission, i.e. alignment of the pores. One then sets
the so-called kink-angle between the two stacks, to bring on-axis X-rays from the 
outer radial position, determined during stack production by the mandrel, to the 
common focal plane. 

The accuracy in setting this angle is critical and is typically of the order of 1 arc- 
second. The alignment is fixed in place by two metallic side plates that are mounted 
on the outside, termed brackets. The brackets are specially designed interface plates 
that have a CTE match to the silicon stacks and that provide an interface for later 
optical bench integration. Glue is injected through the brackets to join them with 
the stacks and, after hardening, a mirror module is made (see Fig. 9). 

It is important to realize that the mirror module is now an X-ray lens, which 
has relaxed integration requirements compared to mounting individual shells. In 
addition, one can compensate for residual kink-angle errors by radially moving the 
mirror module during petal integration. Other integration methods using ultraviolet 
or visible light could be envisaged as well; however, for Athena we currently give 
preference to the X-ray assembly method, as we have very good experience with 
the stability of the BESSY synchrotron and as we can directly test the final optic 
before and after assembly. 

The brackets have three interfaces to so-called dowel pins, which form an isostatic
mount to the optical bench. The exact design of the dowel pins can be adapted
to the specific launch load requirements. The brackets can also have an interface for
an external baffle, which for test purposes we have fabricated from silicon as well. 


Fig. 9. An SPO mirror module consisting of two sets of co-aligned stacks glued into their mounting 
brackets. 

3. SPO Design Parameters


This section discusses the mechanical design parameters and limitations one has to
take into account when designing an SPO optic. For information on how to optimize 
a high-energy optics design the reader is referred to other sources such as Ref. 23. 

Typically the overall design of an SPO optic is driven by the wafer thickness,
which, for the highest quality 12” wafers, is 0.775 mm. Wafers with different thicknesses
can be procured, but usually have lower specifications in terms of TTV and surface
roughness. This thickness defines the spacing of the mirrors.

The dicing process defines the width and length of the mirror and is limited 
by the diameter of 12” wafers. The length of a plate (typically between 20 and 
120mm) is usually given by the optics design and is determined from the focal 
length of the optics and the radial position of the plate within the optic, taking into 
account vignetting and stray-light considerations. The plate length is usually kept 
constant within a stack, as this greatly reduces the number of different plates that 
one has to handle during production. The length of a plate also drives the absolute 
wedge height and vice versa. We currently can produce a wedge of a maximum of 
about 2000nm per side of a plate. How to best divide the wedge over a primary 
and secondary set of mirrors is a trade-off performed by the optics designer. 
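
To give a feeling for the angles involved, the sketch below converts a wedge height and plate length into a wedge angle. It assumes, purely for illustration, that the wedge is a linear ramp over the full plate length; the function name is hypothetical.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def wedge_angle_arcsec(wedge_height_m: float, plate_length_m: float) -> float:
    """Angle of a linear wedge of given height over the full plate length."""
    return math.atan2(wedge_height_m, plate_length_m) * ARCSEC_PER_RAD


MAX_WEDGE_M = 2000e-9  # maximum wedge per plate side quoted in the text

for length_mm in (20.0, 110.0):
    angle = wedge_angle_arcsec(MAX_WEDGE_M, length_mm * 1e-3)
    print(f"{length_mm:.0f} mm plate: up to about {angle:.2f} arcsec per side")
```

With the 2000 nm limit, a 110 mm plate can carry a wedge of only a few arcseconds per side, while a 20 mm plate can carry roughly 20 arcseconds, which illustrates why plate length and wedge height drive each other.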

The width of a plate is mainly driven by mechanical considerations, such as the 
packing density within the optic, or the total azimuthal sag that the preforming 
die has to apply to the plate before bonding. The plate width also drives man- 
drel and die dimensions. For inner radii plates (r ~ 250mm) the plate width is 
typically about 50mm, whereas for outer radii we can stack up to 100mm wide 
plates. 

The grooving process determines the rib width, the rib spacing and the mem- 
brane thickness or in other words, the shape of the pore. Note that the rib spacing 
does not have to be constant and can vary across the width of a plate. It is also possi- 
ble to slightly rotate the ribs with respect to each other, to optimize the throughput 
of two stacks aligned behind each other. The most important parameter is usually 
the membrane thickness. Mechanically it determines the strain energy stored within 
a stack as a result of bending the plate, as the bending moment is proportional to 
the third power of the membrane thickness. The thinner the membrane, the easier 
it is to bend and bond the plate. However, if it becomes too thin (less than about
120 µm) then the production yield drops rapidly. Optically the membrane thickness
mainly drives the pore height and thus the on-axis effective area (Fig. 10). 
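
The cubic dependence on membrane thickness can be made concrete with a small scaling sketch. It treats the membrane as a uniform thin plate and ignores the ribs, which is a simplifying assumption; the 170 µm reference thickness is chosen only for illustration.

```python
def relative_bending_stiffness(thickness_um: float,
                               reference_um: float = 170.0) -> float:
    """Bending stiffness relative to a reference membrane, scaling as t**3."""
    return (thickness_um / reference_um) ** 3


for thickness_um in (170.0, 150.0, 120.0):
    ratio = relative_bending_stiffness(thickness_um)
    print(f"{thickness_um:.0f} um membrane: {ratio:.2f} x reference stiffness")
```

Under this scaling, thinning the membrane from 170 µm to 150 µm reduces the bending stiffness by roughly a third, and going to 120 µm reduces it to roughly a third of the reference value.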

The number of ribs, their width and position, and the overlap of ribs in adjacent 
mirrors stacked on top of each other is again a trade-off between bond quality (more 
ribs provide more bond area and thus stiffness), off-axis effective area (fewer ribs 
is better) and production yield (the width of the grooves directly relates to dicing 
blade wear and production time). Typically, we have a rib spacing of 1-3mm and 
a rib width of about 0.17 mm. 
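
The influence of the rib and membrane dimensions on the open area can be estimated with a simple geometric sketch. It assumes rectangular pores, interprets the 1–3 mm rib spacing as the rib pitch, and neglects the wedge, the material removed by etching and any edge effects; none of these simplifications come from the original text.

```python
def open_area_fraction(rib_pitch_mm: float,
                       rib_width_mm: float = 0.17,
                       plate_thickness_mm: float = 0.775,
                       membrane_thickness_mm: float = 0.17) -> float:
    """Fraction of the stack front face that is open pore (rectangular pores)."""
    width_fraction = 1.0 - rib_width_mm / rib_pitch_mm
    height_fraction = 1.0 - membrane_thickness_mm / plate_thickness_mm
    return width_fraction * height_fraction


for pitch_mm in (1.0, 2.0, 3.0):
    fraction = open_area_fraction(pitch_mm)
    print(f"rib pitch {pitch_mm:.0f} mm: open area about {100 * fraction:.0f}%")
```

For the typical values quoted above this estimate gives open area fractions between roughly 65% and 74%, in line with the "more than 60%" quoted for the stack in Fig. 8.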

Fig. 10. Cross-section of a silicon plate, defining the design parameters: A = rib pitch (can be
variable across the plate), B = beginning rib width (can be variable across the plate), C = pore
end height, D = membrane thickness, F = plate thickness, H = end rib width.


The maximum and minimum bending radii are firstly given by the mechan- 
ical limits discussed above. However, they do not only depend on the mechanical 
properties but also on the optical design. For instance, while we could bend 0.17 mm 
thin silicon to radii of about 50mm before a plate would break, an optimized opti- 
cal design for a focal length of more than 10m could demand a plate length at 
this radius that exceeds the wafer dimensions, due to the very small graze angles 
required. Similar restrictions hold in this example for outer radii plates, where the 
bending moment is again not the issue, but the plates become very short, down to 
a limit where they become difficult to handle. 
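
The interplay between radius, focal length and plate length can be sketched numerically. The example assumes a double-reflection geometry in which the graze angle is one quarter of atan(r/f), and a first-order rule that the plate length equals the pore height divided by the tangent of the graze angle so that on-axis rays just fill the mirror. Both relations, the 0.605 mm pore height (0.775 mm plate minus 0.17 mm membrane) and the function names are assumptions for illustration, not values taken from an actual design.

```python
import math


def graze_angle_rad(radius_mm: float, focal_length_mm: float) -> float:
    """On-axis graze angle for a two-reflection system (assumed relation)."""
    return math.atan(radius_mm / focal_length_mm) / 4.0


def plate_length_mm(radius_mm: float, focal_length_mm: float,
                    pore_height_mm: float = 0.605) -> float:
    """Plate length for which on-axis rays just fill the mirror in a pore."""
    return pore_height_mm / math.tan(graze_angle_rad(radius_mm, focal_length_mm))


FOCAL_LENGTH_MM = 10_000.0  # 10 m, as in the example discussed in the text

for radius_mm in (50.0, 250.0, 1500.0):
    length = plate_length_mm(radius_mm, FOCAL_LENGTH_MM)
    print(f"r = {radius_mm:.0f} mm: plate length about {length:.0f} mm")
```

Under these assumptions a 50 mm radius would call for a plate almost half a meter long, well beyond a 300 mm wafer, whereas at a radius of 1.5 m the plate shrinks to under 20 mm, matching the two limits described above.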

The angular resolution of an optic is in the first instance defined by the image 
created by a pore,²⁴ whose figure is determined by the shape of the mandrel. If
the pore is planar, then the image in the focal plane is as wide and high as the 
pore. If the pore is bent in the azimuthal direction (cylindrical or conical shapes are 
possible) then the image is focused azimuthally and resembles a line. If the pore is 
also bent in the longitudinal direction (secondary curvature) then the line can be 
reduced to a point-like focus. The images of all the pores together form the final 
point spread function (PSF) of the optic. For long (several tens of meters) focal 
length optics a conical shape of the plate is sufficient to achieve a resolution of a 
few arc seconds, as the systematic error term is small in comparison to other error 
sources (e.g. wedging errors, alignment errors with the modules or stability of the 
optical bench). The shorter the focal length, the more the planar approximation to 
a true Wolter-I design drives the error budget, and one then needs to add secondary 
curvature. In the case of multiple reflections it is sufficient to curve only one of the 
mirrors. The curvature itself is small, typically a few hundred nanometers over the 
length of the plate. 
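
The dependence on focal length can be illustrated by converting the size of an unfocused pore image into an angle on the sky. The sketch assumes that, without secondary curvature, the image extent in the longitudinal direction is about one pore height, here taken as 0.605 mm (0.775 mm plate minus 0.17 mm membrane); this is an illustrative full extent, not a half-energy width.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def image_extent_arcsec(pore_size_mm: float, focal_length_m: float) -> float:
    """Angular extent of an unfocused pore image seen from the optic."""
    return math.atan2(pore_size_mm * 1e-3, focal_length_m) * ARCSEC_PER_RAD


PORE_HEIGHT_MM = 0.605  # assumed pore height

for focal_length_m in (50.0, 10.0):
    extent = image_extent_arcsec(PORE_HEIGHT_MM, focal_length_m)
    print(f"f = {focal_length_m:.0f} m: pore image spans about {extent:.1f} arcsec")
```

At a 50 m focal length the unfocused direction spans only about 2.5 arcseconds, whereas at 10 m it grows to about 12.5 arcseconds, which is why shorter focal lengths call for secondary curvature.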


4. Other Applications for SPO Technology 


Silicon Pore Optics is a new technology and most of the design parameters and 
limitations discussed above are a result of the need to develop an optic enabling 
the next generation X-ray telescope. This means that the available parameter space 
and production limits (e.g. membrane thickness, bond strength) have not fully been 
researched, leaving room for new ideas exploiting this technology. 

One such example is a soft γ-ray Laue lens, made using SPO technology and
named Silicon Laue lens Components (SiLC). Thin silicon plates without ribs are
bent and bonded to fabricate a single crystal with radially curved crystal planes,
which strongly improves the focusing properties of a Laue lens. The size of the focal
spot is no longer determined by the size of the individual single crystals, but by
the accuracy of the applied curvature, which is as low as a few seconds of arc.²⁵ By
adding a wedge one can obtain crystals that are confocal in the radial direction. A
secondary curvature in the axial direction can be used to improve the reflectivity of
each crystal, and increase the reflected energy bandwidth. In the ongoing technology
development we have fabricated a technology demonstrator designed for 125 keV
radiation, with a 3.4-m focal length and 600 mm² frontal area.

Other examples are the application of SPO technology to form a slatted mirror
for an X-ray interferometer,²⁶ the possible use of SPO for neutron optics, the possibility
to form 3D structures from silicon, and the use of high-energy optics in the
medical and material analysis business.

The utility of an astronomical X-ray telescope, like that of a telescope of any
other band, is largely determined by its angular resolution, photon collecting 
area, field of view, and energy bandwidth. Current and past X-ray telescopes, 
such as Chandra, XMM-Newton, Suzaku, and NuSTAR, have optimized for one 
or more of these four qualities at the expense of the others. The future of X-ray 
astrophysics depends on the realization of these qualities to the largest extent pos- 
sible in a single telescope at an affordable cost. The next-generation X-ray optics 
development effort at the NASA Goddard Space Flight Center, which we describe 
in this chapter, has as its objective to develop a process of making X-ray mirror 
assemblies that keep Suzaku’s light weight and low cost while attempting to first 
achieve and then exceed Chandra’s sub-arcsecond angular resolution. We have 
adopted an implementation method based on precision-polished mono-crystalline 
silicon mirror segments, which are aligned and integrated into meta-shells as an 
intermediary step. These meta-shells in turn will be aligned and integrated to 
become a mirror assembly. This approach simplifies the construction of a large 
mirror assembly into the building and testing and then aligning and integrating 
of several meta-shells. The core of our effort is the development and maturation of 
techniques to build and test meta-shells, including fabrication, coating, alignment, 
and bonding of lightweight and thin mirror segments. As of 2017, we have made 
lightweight mirror segments using mono-crystalline silicon that are similar to 
Chandra’s mirrors in image quality. We have constructed and tested small mirror 
modules that, under full illumination, produce 2.2 arcsecond X-ray images (half 
power diameter), which is limited by errors in alignment and bonding procedures. 
We expect that the image quality will continue to improve, reaching and surpass- 
ing Chandra’s 0.5 arcseconds in the early 2020s. Through finite element analysis 
and targeted environmental tests of mechanical mockups, we have also validated 
the meta-shell approach to building large X-ray mirror assemblies. 

1. Introduction


The importance of X-ray optics was recognized before the discovery of extra-solar 
X-rays.¹ Since then many people at many institutions have developed progressively
better X-ray optics for astronomical use, with emphasis on ever larger photon- 
collecting area and better angular resolution.* 

The techniques that have been used to produce X-ray telescopes can be categorized
in several ways. Categorized by mechanical or structural implementation, there
are full-shell implementations such as Einstein,² EXOSAT,³ ROSAT,⁴ Chandra,⁵
XMM-Newton,⁶ and Swift/XRT⁷ and segmented implementations such as ASCA,⁸
Suzaku,⁹ and NuSTAR.¹⁰ Categorized by mirror fabrication technique, there are
the telescopes made using the traditional grinding and polishing technique, also
known as direct fabrication, such as Einstein, ROSAT, and Chandra, and those
made using replication processes such as EXOSAT, XMM, Swift/XRT, Suzaku, and
NuSTAR. Categorized by replication technique, EXOSAT’s and Suzaku’s mirrors
were made with epoxy replication processes, whereas NuSTAR’s were made with a
glass slumping process.¹¹ There is yet another category of X-ray mirrors, not yet
flown on any mission, which are made by cold-bending glass sheets¹²,¹³ or silicon
wafers.¹⁴ In general, astronomical X-ray mirrors made with the direct fabrication
process have higher angular resolution but much smaller photon collecting area,
and are much heavier and more expensive, whereas replicated optics have coarser
angular resolution but much larger effective area, and are much less expensive.

An ideal X-ray mirror assembly should have (1) high angular resolution, 
(2) large photon collecting area, (3) a large field of view (FOV), and (4) broad 
energy coverage. It goes without saying that it must also be affordable. Each of the 
four X-ray missions, Chandra, XMM-Newton, Suzaku, and NuSTAR, has its mirror 
assembly optimized for one or more of the above four metrics at the expense of the 
others. Chandra’s mirror assembly was optimized for on-axis angular resolution at 
the expense of photon collecting area and FOV. It was very expensive. XMM-Newton
took the moderate approach, realizing a moderate angular resolution and a moder- 
ate photon collecting area at a moderate mass and production cost. Suzaku’s mirrors 
achieved a very large photon collecting area for its mass, volume, and cost at the 
expense of angular resolution. NuSTAR’s mirrors were optimized for collecting hard
X-rays at the expense of angular resolution. Both Suzaku’s and NuSTAR’s mirror
assemblies used the segmented implementation, each composed of thousands of
lightweight and thin mirror segments.

A successful X-ray mirror assembly for spaceflight is the culmination of a long 
process of technology development, engineering, production, and testing. Figure 1 
shows the five key technical elements that must be addressed to achieve such success. 
These elements connect to and interact with each other in complex and subtle ways. 


*See Chapter 1 “X-ray Telescopes Based on Wolter-I Optics” of this volume for more details. 
Fig. 1. The major technical elements involved in conceiving, designing, and implementing an
X-ray mirror assembly. Many iterations and optimization of these elements and a construction
and testing process culminate in a mirror assembly that meets both spaceflight environmental
requirements and scientific performance requirements specified in terms of angular
resolution, photon-collecting area, FOV, energy bandwidth, mass, volume, and production schedule
and cost.


The success of a mirror assembly depends on balancing and optimizing the demands
of these elements, which often pull in different directions.


2. The Meta-shell Approach 


Given its grazing-incidence nature, the basic geometry of an X-ray mirror assem- 
bly is nearly cylindrical. Therefore its most natural implementation is the full-shell 
approach adopted by Einstein, ROSAT, Chandra, XMM-Newton and Swift. It is gen- 
erally recognized, however, that the full-shell approach is inappropriate for making 
future large mirror assemblies, which typically have an outer diameter over 2 m, far 
beyond the 1.2 m of the Chandra mirror assembly. The segmented approach adopted 
for ASCA and Suzaku is, at least in principle, capable of making arbitrarily large 
mirror assemblies, as long as the entire assembly is appropriately segmented into 
relatively small “wedges”. This segmented approach, however, sacrifices the natural 
axial, or rotational, symmetry of an X-ray mirror assembly, making the wedges 
difficult to fabricate and integrate because of their lack of a well-defined symmetry 
or optical axis. 

Taking into account knowledge and lessons learned from past mirror assemblies,
as well as our own effort to develop a technology for making future large and
high-resolution mirror assemblies, we have come up with an approach that combines
the strengths of both the full-shell and segmented approaches, yet avoids their
weaknesses: the meta-shell approach.'®:!® As shown in Fig. 2, a meta-shell comprises a
structural shell and a large number of mirror segments, each of which is individually 
qualified to meet requirements and is attached at four locations. The structural shell 
is not an optical element, but a mechanical element providing an infrastructure to 
which each mirror segment is directly or indirectly attached with four spacers. The 
meta-shell approach has several important features: 
Fig. 2. An illustration of a hierarchical process of building an X-ray mirror assembly. Left:
mirror segments are assembled into meta-shells, each of which contains hundreds to thousands of
mirror segments. Right: these meta-shells, after having been individually tested and qualified, are
assembled into the final mirror assembly.


(1) Each meta-shell, once completed, is very much like a full shell, except that it has
many times more photon-collecting area. This “amplification” of the effective
area is an essential feature.

(2) Each meta-shell can be tested for performance, such as PSF, effective area,
FOV, etc., and for environmental integrity, such as vibration, acoustic,
thermal-vacuum, and shock.

(3) Each mirror segment, for all intents and purposes, is first kinematically mounted
and aligned and then permanently bonded to minimize distortion. (See Sec. 7.3
on alignment and Sec. 7.4 on bonding.)

(4) The meta-shell is structurally robust because of both the structural shell and
the interlocking “brick-wall-like” bonding-together of mirror segments. As such,
a meta-shell is a structurally stiff, optically precise, and lightweight entity.

(5) The locations of the four spacers are optimized to minimize gravity-release
error. Even though the mirror segments are bonded while they are distorted
by gravity, the distortion disappears together with the gravity once the mirror
assembly reaches space.

(6) Every component of a meta-shell, i.e. the structural shell, spacers, and mirror
segments, is made of silicon, which is highly thermally conductive. As such, the
structural shell and the mirror segments are conductively coupled through the
spacers. This conductive coupling makes it much easier to achieve thermal
equilibrium among them, which is essential for maintaining good PSF. In particular,
given the uniformity in material composition, the meta-shell can operate at a
different bulk temperature, again greatly easing thermal control requirements
(see Sec. 5).

(7) The design and construction process of a meta-shell is highly amenable to
implementing stray light baffles between mirror shells (see Sec. 6).
3. Optical Design


The grazing-incidence nature of X-ray optics dictates that a practical X-ray mirror 
assembly must comprise many concentric shells, each of which is aligned to concen- 
trate its X-rays onto the same spot on the focal plane. Each shell is practically a 
separate telescope, in that the optical path length is not preserved from one shell 
to another. In other words, photons from different shells add incoherently at the 
focus. Being practically a separate telescope, each shell has its own characteristics: 
on-axis PSF diffraction limit, off-axis PSF, off-axis effective area, all of which are 
also dependent on photon energy. The optical design process takes all of these 
factors into consideration to arrive at prescriptions of a set of shells that meet all 
requirements. 

The prescription of each shell, in general, can take one of three forms: Wolter-I,!” 
Wolter-Schwarzschild,!® and some other variations on one of them to optimize one
or more characteristics.’ In the Wolter-I design, each shell consists of a parabolic 
primary and a hyperbolic secondary. All of the past astronomical X-ray mirror 
assemblies have used this design or some approximation of it, depending on fab- 
rication precision. A shortcoming of the Wolter-I design is that it does not meet 
the Abbe sine condition for imaging.© The Wolter—Schwarzschild design meets the 
Abbe sine condition, resulting in better off-axis PSF in general. 

Input parameters to the optical design include focal length, inner diameter, i.e. 
the diameter of the smallest shell, outer diameter, i.e. the diameter of the largest 
shell, length of shells in the optical axis direction, thickness of mirror shells, and the 
unobstructed FOV that determines inter-shell spacing. While all of these parameters 
are, to one degree or another, influenced, if not dictated, by practical considerations, 
the length of shells perhaps requires the most consideration. This is especially the 
case when designing a high resolution and large FOV mirror assembly. Everything 
else being equal, a longer shell leads to a lower diffraction limit, but worse degra- 
dation of the off-axis PSF. In other words, a longer shell results in better on-axis 
PSF, but worse off-axis PSF. An optimal design must balance the on-axis PSF and 
FOV requirements to arrive at an optimal shell length. 
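
The dependence of the diffraction limit on shell length can be illustrated with a rough estimate. The sketch below is not the optical design procedure itself; it assumes the standard Wolter-I relation between grazing angle, shell radius, and focal length, and it takes the diffraction scale to be simply the wavelength divided by the projected radial width of the primary mirror, ignoring numerical factors of order unity. The function names and the example values (1 keV photons, 0.5 m shell radius, 10 m focal length) are illustrative assumptions.

```python
import math

HC_KEV_NM = 1.2398419843            # h*c in keV*nm, converts photon energy to wavelength
ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0


def graze_angle_rad(radius_m, focal_length_m):
    """Grazing angle of a Wolter-I shell: two reflections deflect the ray by 4*alpha."""
    return 0.25 * math.atan2(radius_m, focal_length_m)


def diffraction_scale_arcsec(energy_kev, radius_m, focal_length_m, shell_length_m):
    """Order-of-magnitude diffraction scale, lambda / (projected annulus width)."""
    wavelength_m = HC_KEV_NM / energy_kev * 1e-9
    # Radial width of the annular aperture seen from the source (assumed thin annulus).
    width_m = shell_length_m * math.tan(graze_angle_rad(radius_m, focal_length_m))
    return wavelength_m / width_m * ARCSEC_PER_RAD


if __name__ == "__main__":
    for length_m in (0.1, 0.2):
        scale = diffraction_scale_arcsec(1.0, 0.5, 10.0, length_m)
        print(f"shell length {length_m:.1f} m -> ~{scale:.2f} arcsec")
```

Under these assumptions a 100 mm long shell gives a scale of roughly 0.2 arcsecond at 1 keV, and doubling the length halves it. This is the on-axis side of the trade; the off-axis penalty of a longer shell is not modeled here.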

Given all of these parameters and desired output parameters, such as 
overall on-axis PSF and off-axis PSF, it is possible to vary the Wolter-I and 
Wolter—Schwarzschild prescriptions. Three prescriptions have been used: polynomial 
prescription,!? equal-curvature prescription,?? and modified Wolter-Schwarzschild 
prescription.?! Of these variations on the theme, the modified Wolter-Schwarzschild 
is the simplest. It takes the Wolter—Schwarzschild prescription and adds a constant 
second order to the axial figure of each shell such that it slightly modifies the focal 


>See Chapter 1 “X-ray Telescopes Based on Wolter-I Optics” of this volume for further details. 
°See Chapter 1 “X-ray Telescopes Based on Wolter-I Optics” of this volume for further details. 
length and degrades the on-axis PSF to achieve the best possible off-axis PSF. The
trade is between on-axis PSF and FOV: the better the on-axis PSF, the smaller
the FOV.


4. Mechanical Design 


Mechanical design provides a structural framework to implement the optical design 
and facilitates the thermal design described in the next section. With the meta- 
shell approach, the mechanical design determines the thickness of the structural 
shell, dimensions of the mirror segment, i.e. how to segment the optical design’s 
mathematical shell in both the optical axis direction and the azimuthal direction, 
how many optical shells are to be attached to each structural shell, and the dimen- 
sions and locations of the spacers. As a result of these structural elements, the total 
effective area of the optical design may be reduced by 10% to 20%. 

The structural shell is made of the same material as the mirror segments and 
spacers, i.e. silicon. It should be as thin as possible, yet sufficiently stiff
to allow the installation of mirror segments in a gravity environment with accept- 
able distortion. The dimensions of the segments should be as large as possible to 
minimize the number of them to be fabricated and installed, yet they must be small 
enough such that, when supported at four locations, their distortion due to gravity is 
sufficiently small to allow X-ray testing on the ground to verify their performance. 
In particular, the frozen-in figure error after gravity release must be sufficiently 
small to meet on-orbit performance requirements. This optimization process also
naturally determines the locations of the spacers. The diameter of the spacers is
determined by the strength required of the bonds that attach the mirror to its
four spacers. In general, the larger the bond area, the stronger the bond. As such,
the diameter of the spacer should be as small as possible to minimize reduction of 
photon-collecting area, yet sufficiently large to meet bond strength requirements in 
order for the meta-shell to survive the rocket launch environment. 


5. Thermal Design 


Thermal design enables the mirror assembly to preserve its PSF in an inhospitable 
thermal environment in space. The mirror assembly is typically exposed to three 
objects with very different temperatures: the hot Sun on one side, cold space on 
another side, and the warm spacecraft and Earth on other sides. The typical thermal 
design has three components: (1) a heated thermal baffle that limits the exposure of 
the mirror assembly to cold space in the optical axis direction and radiatively replen- 
ishes unavoidable heat loss of the mirrors; (2) insulation of the mirror assembly as 
much as possible from the Sun and cold space in other directions; and (3) heaters 
installed at strategic locations to replenish heat lost and heat pipes that transfer 
heat from the Sun side to the cold space side. 
With the meta-shell approach, because of the excellent thermal conductivity
of silicon and because of the four spacers that connect each mirror segment to its
neighboring segments and ultimately to the structural shell, thermal equilibrium can
be facilitated and maintained with conduction. The traditional design of thermal
baffles and thermal blankets, combined with heaters attached to the structural shell,
can maintain the thermal equilibrium to enable the mirror assembly to achieve
approximately 0.1 arcsecond HPD.


6. Stray Light Baffling 


The X-ray mirror assembly can suffer from severe stray light and must be properly
baffled to optimize its performance. Stray light consists of those X-ray photons
that reach the focal plane without having been reflected exactly once by the
primary mirror and once by the secondary mirror. They reach the focal plane either
without reflection at all (straight-throughs) or with one reflection by either the
primary mirror or the secondary mirror (single reflections). These ghost rays, as
they are typically called, show up as background events on the focal plane.

A perfectly baffled mirror assembly is one in which the focal plane detector only 
sees the secondary mirrors, and the secondary mirrors see only their corresponding 
primary mirrors, and the primary mirrors see only sources in the FOV proper of 
the telescope. 

In practice no mirror assembly can be perfectly baffled. Depending on the level 
of tolerable complexity, two types of baffles can be implemented. The first type, 
external to the mirror assembly,” is a vane-like baffle in front of the primary mir- 
rors. The second type, internal to the mirror assembly, consists of aperture stops 
installed between mirror shells.?? Depending on requirements and level of tolerance 
for complexity, one or more of these aperture stops can be installed. The meta-shell 
approach, by virtue of its design and construction process, is capable of implement- 
ing both types of baffles. 


7. Technology 


The design processes, including optical, mechanical, thermal, and stray light baf- 
fling, are fairly well understood and over the years have become more or less stan- 
dardized, although fine-tuning and optimization are always required. The practical 
implementation of the above designs requires four essential elements: (1) mirror 
substrates, (2) coating, (3) alignment, including both location and orientation, of 
mirror segments, and (4) permanent bonding. 


7.1. Fabrication of Mirror Substrates 


The mirror substrate must meet a number of requirements: figure quality, micro- 
roughness, thickness, and being lightweight. To meet these requirements we have 
chosen mono-crystalline silicon as the material and polishing, i.e. direct fabrication
as opposed to any replication process, as the basic production process.7+ 34 

Silicon is nearly an ideal material for making astronomical X-ray mirrors. With 
a density of 2.33 g/cm³, it is one of the lightest materials. Its high elastic modulus
compared to glass, the most commonly used material for making precision optics, 
makes it much less susceptible to figure distortion by stray forces. In addition, 
its combination of high thermal conductivity and low thermal expansion makes it 
highly suitable for making optics for the inhospitable thermal environment typical 
of spaceflight. The most important property we utilize here is that, because of the 
semiconductor industry, large blocks of mono-crystalline silicon are available at 
affordable prices. These crystals are free of internal stress because every atom is on 
its lattice location. This property allows very thin substrates, less than 1 mm, to be 
polished and otherwise processed with predictable results, provided that all damage 
to the crystal structure resulting from the processing is properly removed. 
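
A quick calculation shows what such thin silicon substrates imply for mass. The sketch below is illustrative only: it assumes a silicon density of about 2.33 g/cm³, treats a mirror segment as a flat rectangular plate, and uses a hypothetical thickness of 0.5 mm together with the 100 mm x 100 mm segment size quoted later in this section. The function names are not taken from any existing tool.

```python
SILICON_DENSITY_KG_M3 = 2330.0  # assumed handbook value (about 2.33 g/cm^3)


def areal_density_kg_m2(thickness_mm, density_kg_m3=SILICON_DENSITY_KG_M3):
    """Mass per unit mirror surface area for a substrate of uniform thickness."""
    return density_kg_m3 * thickness_mm * 1e-3


def segment_mass_g(width_mm, length_mm, thickness_mm,
                   density_kg_m3=SILICON_DENSITY_KG_M3):
    """Mass of one segment, approximated as a flat rectangular plate."""
    volume_m3 = (width_mm * 1e-3) * (length_mm * 1e-3) * (thickness_mm * 1e-3)
    return density_kg_m3 * volume_m3 * 1e3  # kg -> g


if __name__ == "__main__":
    # Hypothetical 0.5 mm thick substrate, 100 mm x 100 mm after trimming.
    print(f"areal density: {areal_density_kg_m2(0.5):.2f} kg/m^2")
    print(f"segment mass:  {segment_mass_g(100, 100, 0.5):.1f} g")
```

With these assumed numbers the areal density comes out near 1.2 kg/m² and a single segment weighs about 12 g, which is the regime of lightweight optics as opposed to the tens of kg/m² typical of thick full shells.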

Direct fabrication, specifically precision polishing, has made the best optics, 
both normal incidence and grazing incidence, as demonstrated by all the large 
optical telescopes, such as the Hubble Space Telescope, and the Chandra X-ray 
telescope. The main drawback of the technique has been its inability to make very 
thin optics and its high cost per unit mirror area. 

The primary reason why the polishing process has not been able to make thin 
(<1 mm) mirrors is the internal stress of materials used as substrates. The removal of 
material by polishing is necessarily accompanied by the removal of stress associated 
with the material. As a result, the substrate’s figure changes for two reasons. The
first is the loss of material, which is predictable; the second is the loss
of stress, which is unpredictable. When a substrate is thick, such that the material
removed is negligible in comparison to its thickness, the loss of stress does
not cause any significant change in figure. When the mirror substrate is thin, the
removed material, typically on the order of several microns, can no longer be neglected.

This problem is addressed by the use of mono-crystalline silicon as the substrate 
material, which is free of internal stress because each atom is at its lattice location 
and therefore its lowest energy configuration. Any polishing process, as long as 
it leaves no damage to the crystalline structure, will neither generate nor relieve any
stress, and therefore the resulting figure change is totally predictable. We have 
implemented this idea and developed a process, shown in Fig. 3, that produces 
mirror substrates from a block of mono-crystalline silicon. The process has several 
key steps. In the first step, a conical form is generated on a precision-machined 
conical tool, setting the average radius and the cone angle of the mirror substrate, 
the zeroth- and first-order approximation to any optical design, including Wolter-I, 
Wolter—Schwarzschild, or any other prescription. Then the silicon block is placed 
on a slicing saw to slice off the conical form, resulting in a lightweight shell. This 
shell suffers from severe damage on its surfaces and edges resulting from the slicing 
and lapping operations. In order to remove the damage to the crystal structure, the 
Fig. 3. Illustration of the process of using single crystal silicon to make X-ray mirror substrates.
The six steps are: material preparation (upper-left), grinding and lapping of the block to generate 
a conical surface (upper-middle), slicing to make a thin shell (upper-right), acid etching to remove 
damage to crystal structure (lower-left), polishing of the substrate surface (lower-middle), and 
trimming to remove roll-off errors (lower-right). 


thin substrate is immersed in an acid bath to etch away those atoms that have been 
disturbed from their lattice locations, resulting in a substrate that is once again a 
single crystal. The thin, damage-free substrate is polished to have the best possible 
figure, typically 3-5 arcseconds HPD (two reflections equivalent). It is then placed 
on another polisher to smooth out the polishing marks to achieve excellent micro- 
roughness, typically better than 0.2 nm RMS measured on an area of 0.45 mm x 
0.45 mm. 

After the above steps, the substrate is trimmed down from its size of approxi- 
mately 150 mm x 150 mm to the precise dimensions required of a mirror substrate, 
typically 100 mm x 100 mm. As part of the trimming operation, the resulting edges 
of the substrate are polished to a glossy finish to remove all micro-fractures, which 
can cause figure distortion and are prone to propagation leading to breakage. 

Lastly, the mirror substrate is measured on an interferometer to generate a 
topographical map to be used by an ion-beam figuring machine to fine-tune the 
figure. Ion-beam figuring is a mature technology that can remove material with 
nanometer precision. Typically in one pass, with an appropriately sized beam, it 
can improve the RMS height error from tens of nanometers to better than 5 nm 
and the RMS slope error from about 1 arcsecond to better than 0.2 arcseconds. 
Further iterations and improvement depend on the accuracy of the topographical 
map and on the precision of cross-registration between the map and the ion-beam.
It is expected that two or three iterations of measurement and figuring®! with an 
ion-beam will lead to mirrors of better than 0.1 arcsecond quality, enabling next- 
generation ambitious X-ray astronomical missions. 


7.2. Coating 


A bare silicon surface’s reflectance of X-rays is unacceptably low. A thin film, on the 
order of 20 nm, of gold, platinum, or iridium can significantly enhance its reflectance. 
A thin film coating, however, has a significant downside: its stress can severely
distort the mirror figure and degrade image quality.3?:3> While the X-ray reflectance
enhancement is progressively larger going from gold to platinum to iridium, the film 
stress and thereby figure distortion are also progressively more severe.°” 
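
To give a feel for why a film only tens of nanometers thick can matter, the sketch below applies Stoney's formula, which relates film stress to the curvature it induces in a thin flat plate. This is a textbook approximation and not the analysis used in the work cited here; a curved mirror segment responds differently from a flat plate, so the numbers are indicative at best. The film stress (1 GPa), substrate thickness (0.5 mm), and silicon biaxial modulus (180 GPa) are assumed values chosen for illustration; only the 20 nm film thickness comes from the text.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0


def stoney_curvature(film_stress_pa, film_thickness_m, substrate_thickness_m,
                     biaxial_modulus_pa=180e9):
    """Curvature (1/m) of a flat plate bent by a thin stressed film (Stoney's formula).

    biaxial_modulus_pa is E / (1 - nu) of the substrate; 180 GPa is an assumed
    value for silicon.
    """
    return (6.0 * film_stress_pa * film_thickness_m
            / (biaxial_modulus_pa * substrate_thickness_m ** 2))


def net_curvature_two_sided(film_stress_pa, front_thickness_m, back_thickness_m,
                            substrate_thickness_m):
    """Residual curvature when both sides carry films of the same stress."""
    front = stoney_curvature(film_stress_pa, front_thickness_m, substrate_thickness_m)
    back = stoney_curvature(film_stress_pa, back_thickness_m, substrate_thickness_m)
    return front - back  # films on opposite faces bend the plate in opposite senses


def edge_slope_arcsec(curvature_per_m, length_m):
    """Slope at the edge of a uniformly curved plate of the given length."""
    return curvature_per_m * length_m / 2.0 * ARCSEC_PER_RAD


if __name__ == "__main__":
    k = stoney_curvature(1e9, 20e-9, 0.5e-3)
    print(f"single-sided: edge slope ~{edge_slope_arcsec(k, 0.1):.0f} arcsec")
    k_res = net_curvature_two_sided(1e9, 20e-9, 19e-9, 0.5e-3)
    print(f"two-sided, 5% mismatch: ~{edge_slope_arcsec(k_res, 0.1):.1f} arcsec")
```

Under these assumptions a single-sided film produces edge slopes of tens of arcseconds on a 100 mm plate, and the curvature grows as the inverse square of the substrate thickness. The same formula suggests that even a few percent mismatch between front and back films leaves an arcsecond-level residual.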

Two potential solutions have been studied to address thin film stress: stress
cancellation®? and thermal annealing.** It is intuitively obvious that, if both the 
front (concave) and back (convex) sides of the substrate are simultaneously coated 
with a film of the same material and same thickness, the resulting mirror segment 
ought not to be distorted at all because the stress should cancel between front and 
back. In practice, the conditions of “simultaneity” and “same thickness” are difficult 
to realize, and if realized, difficult to verify and confirm. Two coating methods have 
been used: magnetron sputtering (MS) and atomic layer deposition (ALD). They each
have their own advantages and disadvantages. MS is a mature industrial process 
and thus inexpensive and easy to perform. It can achieve the required film thickness 
in a matter of seconds. But as a “line of sight” coating process, it is difficult to 
control thickness over a large area and to achieve simultaneous coating of both 
sides. An added complication is that it heats the substrate; the heating is almost 
certainly not uniform, resulting in thermal distortion that can be frozen in by the 
thin film. ALD, on the other hand, being a diffusion process, can coat both sides 
simultaneously, provided that the nucleation process can be initiated simultaneously 
everywhere on the surfaces. The simultaneous initiation, however, is difficult to 
achieve and verify because it depends on surface physical and chemical conditions. 
Any local contamination on the surface can accelerate or retard the nucleation 
process. Experimentation with two-sided coating has shown that it indeed has, on 
average, less distortion than single-sided coating, but its net distortion is far from 
zero and not repeatable from run to run under nominally the same conditions. Much 
work remains to be done in this area. It is likely that carefully controlled and baffled 
MS guns are necessary to achieve the required conditions for stress to cancel to an 
acceptable level. 

Another way of achieving stress cancellation or balance is using thermal oxide 
grown on the convex side and iridium coating on the concave side. Both thermally 
grown silicon oxide and magnetron-sputtered iridium have compressive stress, and 
therefore they can in principle cancel out each other. This method has the advantage 
that the process of growing silicon oxide on a silicon wafer is a well-understood 
industrial process, with a thickness and uniformity precision of better than 1 nm.
In addition, coating a concave surface using either the MS or ALD process is far
easier than coating a convex surface.

Meanwhile, experimentation has shown that thermal annealing significantly 
reduces thin film stress and distortion.?+ The prospect of achieving distortion-free
coating probably lies in a combination of stress cancellation by double-sided coating
and thermal annealing. At the present time, coating distortion has been reduced to 
a level acceptable for a 5-arcsecond mirror assembly using iridium. 


7.3. Alignment 


Once a mirror segment is fabricated and verified to meet all requirements, including 
figure, micro-roughness, dimensions, and mass, it must then be aligned and bonded 
to a meta-shell. The alignment process locates and orientates the mirror segment 
into its design or optimal location and orientation. It must do so without degrading 
the performance of the mirror segment. The best way to meet those requirements 
is to kinematically support the mirror segment. While kinematic mount of a flat 
mirror means three supports, the kinematic mount of a curved mirror, such as an 
X-ray mirror, means four supports, as shown in Fig. 4. The four supports, assisted 
by gravity, i.e. the mirror segment’s weight and friction, are necessary and suffi- 
cient in determining its location and orientation. The alignment process becomes 
an adjustment of the heights of those four supports. 

These supports are typically implemented as round cylindrical posts made of mono-
crystalline silicon, the same material as the mirror segments, with a diameter of
3-5 mm. Their lengths or heights can be reduced precisely by a lapping and buffing
process, similar to the precision polishing process of making a mirror. The location 
and orientation of the mirror segment is precisely determined by Hartmann mea- 
surement using beams of light reflected by the mirror segment at several different 
meridians.*° Thus the alignment process is an iteration of Hartmann measurement 
and lapping of the posts, also known as spacers for obvious reasons. Since the lapping 
process can remove material with nanometer precision, this process can align the 
mirror segment with sub-arcsec precision. 
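
The link between post height and mirror orientation is simple geometry, and a small calculation shows how much margin nanometer-level lapping provides. The sketch below assumes two posts separated by 80 mm along the optical axis, a hypothetical value chosen to be consistent with a segment about 100 mm long; the function names are illustrative.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0


def tilt_arcsec(height_change_m, post_separation_m):
    """Tilt of the segment when one post is shortened relative to the other."""
    return math.atan2(height_change_m, post_separation_m) * ARCSEC_PER_RAD


def height_change_for_tilt_m(tilt_in_arcsec, post_separation_m):
    """Post height change needed to rotate the segment by a given angle."""
    return math.tan(tilt_in_arcsec / ARCSEC_PER_RAD) * post_separation_m


if __name__ == "__main__":
    separation_m = 0.08  # assumed axial distance between posts
    dh = height_change_for_tilt_m(1.0, separation_m)
    print(f"1 arcsec of tilt <-> {dh * 1e9:.0f} nm of post height")
    print(f"10 nm of lapping <-> {tilt_arcsec(10e-9, separation_m):.3f} arcsec")
```

With this assumed spacing, one arcsecond of tilt corresponds to a few hundred nanometers of post height, so material removal controlled at the nanometer level leaves a comfortable margin for sub-arcsecond alignment.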



Fig. 4. Illustration of kinematical support of a flat mirror (left) and a curved mirror (right). In 
a gravity environment, where the weight of the mirror is used as the nesting force, the orientation 
and location of a flat mirror is necessarily and sufficiently determined by three supports. For a 
curved mirror, such as an X-ray mirror, four supports are required. 
Fig. 5. Illustration of permanently bonding the mirror segment to four posts, or spacers. The
posts are typically cylindrical in shape with a diameter of 3 mm. The top surface of each post is 
lapped into a spherical shape such that the direct contact area between the mirror surface and the 
posts is relatively small, approximating a point contact, leaving the gap that is filled with epoxy 
with a variable thickness of several to tens of microns. 


It should be mentioned that, because of friction between the mirror segment 
surface and the posts, the mirror segment, when placed on these posts, needs help 
to settle into its natural configuration. This help is provided in the form of acoustic 
vibrations. A loudspeaker with an appropriate pitch is sufficient to accomplish this
task.°6 


7.4. Bonding 


Once a mirror segment is aligned, it must be permanently bonded. This is accom- 
plished by first removing the mirror segment from the posts, then applying a trace 
amount of epoxy to the top of each post, and finally replacing the mirror on the posts 
and applying the acoustic vibrations to settle the mirror segment into alignment.*6 
After the epoxy has cured, the mirror segment is fully and permanently bonded. 
This process is illustrated in Fig. 5. 

This way of bonding the mirror has several highly desirable features. First, the 
epoxy is used only as an adhesive. Because of the fluidity of the epoxy, weight of the 
mirror segment, and the acoustic vibrations, the final epoxy layer between the mirror 
surface and the top surface of the post is extremely thin. Although not precisely 
measured, the layer is probably no more than a few nanometers in thickness where 
the mirror surface was in full contact with the posts before the application of epoxy. 
Second, the epoxy thickness variation from one post to another is probably even 
smaller. This is essential for the preservation of the alignment during and after 
the cure. Third, since the epoxy covers an area of only about 3 mm in diameter, 
i.e. the diameter of the post, over which the mirror segment, being about 1 mm 
thick, is extremely stiff, mirror figure distortion due to epoxy stress is small. For 
arcsecond mirror assemblies, possibly sub-arcsecond ones, this distortion may be 
either negligible or manageable. 


8. Development Status and Prospects 


The process from the conception of a set of technical ideas, such as those described 
above, to the commission of an X-ray mirror assembly, is a long and arduous one, 
involving years of technology development, engineering, construction, and testing, 
Fig. 6. A technology development module with one pair of mirror segments aligned and bonded
in the way described in this chapter (left). An X-ray image obtained by this module
when fully illuminated with a beam of 4.5 keV X-rays (middle). The fraction of encircled energy as
a function of image diameter (right), showing that this image has a half-power diameter of 2.2 arcseconds.


not to mention the many millions of dollars needed to finance the entire endeavor. 
Nonetheless, the validation of the technical ideas is the very first step of this process. 
As of December 2017, all the basic elements of the approach described above have 
been developed and empirically verified, and therefore validated. The single most
important piece of evidence is shown in Fig. 6, which shows a pair of mirror segments that
were fabricated, aligned, and bonded, and finally placed in a beam of 4.5 keV X-rays,
achieving an X-ray image of better than 3 arcseconds HPD. As we continue the
development of this process, we expect that segmented lightweight X-ray optics will 
reach 1 arcsecond image quality by 2020 and 0.1 arcsecond by 2030. 
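
The half-power diameter (HPD) quoted throughout this chapter is the diameter of the circle that encloses half of the detected photons. As a minimal sketch of how such a number can be derived from an X-ray image, the code below computes the HPD and the encircled-energy fraction from a list of photon positions. It assumes the positions are already in arcseconds, takes the centroid as the image center, and ignores background; these choices and the function names are assumptions for illustration, not a description of the analysis behind Fig. 6.

```python
import math
import statistics


def _radii(xs, ys):
    """Distances of all photons from the centroid of the image."""
    cx = statistics.fmean(xs)
    cy = statistics.fmean(ys)
    return [math.hypot(x - cx, y - cy) for x, y in zip(xs, ys)]


def half_power_diameter(xs, ys):
    """Diameter of the circle about the centroid containing half of the photons."""
    return 2.0 * statistics.median(_radii(xs, ys))


def encircled_energy_fraction(xs, ys, diameter):
    """Fraction of photons falling within a circle of the given diameter."""
    radii = _radii(xs, ys)
    inside = sum(1 for r in radii if r <= diameter / 2.0)
    return inside / len(radii)


if __name__ == "__main__":
    # Four photons on a circle of radius 1 arcsec around the origin.
    xs = [1.0, -1.0, 0.0, 0.0]
    ys = [0.0, 0.0, 1.0, -1.0]
    print(half_power_diameter(xs, ys))             # HPD in arcseconds
    print(encircled_energy_fraction(xs, ys, 2.5))  # fraction inside 2.5 arcsec
```

Evaluating `encircled_energy_fraction` over a range of diameters yields an encircled-energy curve of the kind described in the Fig. 6 caption; the HPD is where that curve crosses one half.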

Table 1 summarizes the characteristics of each of the many technical elements 
described in previous sections and how they impact the final characteristics of a 
mirror assembly. A successful mirror assembly is the result of detailed trade-offs and
optimization of these elements under a specific set of scientific, technical, schedule,
and cost constraints.

The technology development described in this chapter was initiated in the early 
2000s for the Constellation-X project?’ and later for the International X-ray Obser- 
vatory project. Over the years it has been funded by NASA through the projects’ 
offices as well as the Astrophysics Research and Analysis (APRA) and the Strategic 
Astrophysics Technology (SAT) programs. Much progress has been made, as shown 
in Table 1, but much remains to be done to realize X-ray astronomers’ dream of 
a telescope with sub-arcsecond X-ray optics and a photon collecting area of many 
square meters. 

A statistical look at the data in Table 2 leads to a Moore’s law of X-ray optics: 
angular resolution improves by a factor of 2 approximately every five years. This 
appears to be true for heavy as well as lightweight optics. This law would pre- 
dict that a sub-arcsecond lightweight X-ray telescope could be built and flown by 
the late 2020s.2° While the construction, launch, and operations of a major X-ray 


Table 1. Relationships between desired characteristics (topmost row) of a mirror assembly and the technical elements (leftmost two columns) required
to build it. These relationships are multi-dimensional. A successful mirror assembly represents a powerful compromise among these variables that can
be implemented in a specific scientific, technological, spaceflight opportunity, and budgetary context.


Table 2. Comparison of a number of X-ray telescopes and mirror technologies. For heavy optics,
the angular resolution improved by a factor of 16 in 22 years, amounting to a factor of 2 improvement
approximately every 5 years. Likewise, for lightweight optics, the angular resolution improved by a
factor of 90 in 25 years, amounting to a factor of 2 improvement every 4 years. This could be called
the Moore’s law of X-ray optics. A simple extrapolation of the data in this table and an application
of the new version of Moore’s law would lead to the conclusion that a sub-arcsecond lightweight
X-ray telescope could be constructed by sometime in the 2020s.


Program            Project                              Year   Resolution    Mirror material and fabrication process                                 Density   Reference
                                                               (arcsec HPD)                                                                          (kg/m²)
Heavy Optics       Einstein                             1978   8             Ground and polished glass or glass-ceramic shells                       ~33       Ref. 2
                   ROSAT                                1990   4             Ground and polished glass or glass-ceramic shells                       ~50       Ref. 4
                   Chandra                              1999   0.5           Ground and polished glass or glass-ceramic shells                       ~50       Ref. 5
Lightweight Optics BBXRT                                1990   ~180          Lacquer-coated Al foils                                                 ~0.5      Ref. 13
                   ASCA                                 1993   180           Lacquer-coated Al foils                                                 ~0.5      Ref. 7
                   Suzaku                               2005   120           Epoxy-replicated Al foils                                               ~0.5      Ref. 8
                   NuSTAR                               2012   58            Thermally slumped glass segments                                        ~0.5      Ref. 9
                   NGXO Technology Development Modules  2012   17            Thermally slumped glass segments                                        ~1        Ref. 17
                                                        2013   11            Thermally slumped glass segments                                        ~1        Ref. 18
                                                        2014   8             Thermally slumped glass segments                                        ~1        Ref. 19
                                                        2017   4             Ground, polished, and light-weighted mono-crystalline silicon segments  ~1        Ref. 39
                                                        2018   2             Ground, polished, and light-weighted mono-crystalline silicon segments  ~1        Fig. 6
                                                        ~2020  ~1            Ground, polished, and light-weighted mono-crystalline silicon segments  ~1        Expected


observatory is a significant undertaking, requiring the confluence of scientific prior- 
ity, technology, engineering, economics, and politics, we believe that a robust mirror 
technology lies at the foundation of such an undertaking. The technical approach
outlined in this chapter, if carried out successfully, will provide such a foundation.*®
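
The doubling times quoted in the caption of Table 2 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `doubling_time` is ours, and the sole assumption is that the improvement is a pure exponential between the first and last entries of each group in the table.

```python
import math

def doubling_time(improvement_factor, years):
    """Years needed for a factor-of-2 gain, assuming purely exponential improvement."""
    return years * math.log(2.0) / math.log(improvement_factor)

# Heavy optics: a factor of 16 in 22 years (figures from the Table 2 caption).
heavy = doubling_time(16.0, 22.0)
# Lightweight optics: a factor of 90 in 25 years (figures from the Table 2 caption).
light = doubling_time(90.0, 25.0)

print(f"heavy optics:       factor of 2 every {heavy:.1f} years")
print(f"lightweight optics: factor of 2 every {light:.1f} years")
```

Both values are consistent with the rounded figures of roughly five and four years given in the caption.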


Acknowledgments 


The author acknowledges the contributions of the many scientists and engineers at 
NASA Goddard Space Flight Center and Marshall Space Flight Center to the con- 
ception, development, and validation of the X-ray optics based on mono-crystalline 
silicon, especially those of Kim D. Allgood, Michael P. Biskach, Kai-Wing Chan, 
Michal Hlinka, John D. Kearney, James R. Mazzarella, Ryan S. McClelland, Ai 
Numata, Stephen L. O’Dell, Lawrence G. Olsen, Raul E. Riveros, Timo T. Saha, 
and Peter M. Solly. He gratefully acknowledges the financial support of the National 
Aeronautics and Space Administration, which has made this work possible. 


References 


1. R. Giacconi and B. Rossi, J. Geophys. Res. 65, 773 (1960).
2. L. P. Van Speybroeck, Proc. SPIE 106, 136 (1977).
3. P. A. J. de Korte et al., Space Sci. Rev. 30, 495 (1981).
4. B. Aschenbach, Appl. Opt. 27, 1401 (1988).
5. L. P. Van Speybroeck, Appl. Opt. 27, 1399 (1988).
6. P. Gondoin et al., Proc. SPIE 2209, 438 (1994).
7. D. N. Burrows et al., Proc. SPIE 4140, 64 (2000).
8. P. J. Serlemitsos et al., Publ. Astron. Soc. Japan 47, 105 (1995).
9. P. J. Serlemitsos et al., Publ. Astron. Soc. Japan 59, 9 (2007).
10. W. W. Craig et al., Proc. SPIE 8147, 81470H (2011).
11. W. W. Zhang, Proc. SPIE 7437, 74370N (2009).
12. J. H. Underwood and D. Turner, Proc. SPIE 106, 125 (1977).
13. D. Fabricant, L. M. Cohen and P. Gorenstein, Appl. Opt. 27, 1457 (1988).
14. M. Collon et al., Proc. SPIE 8861, 88610M (2013).
15. W. W. Zhang et al., Proc. SPIE 9905, 990518 (2016).
16. R. S. McClelland et al., Proc. SPIE 9905, 99057A (2016).
17. H. Wolter, Ann. Phys. 10, 94 (1952).
18. H. Wolter, Ann. Phys. 10, 286 (1952).
19. C. J. Burrows, R. Burg, and R. Giacconi, Astrophys. J. 392, 760 (1992).
20. T. T. Saha and W. W. Zhang, Appl. Opt. 42, 4599 (2003).
21. T. T. Saha et al., Proc. SPIE 9144, 914415 (2014).
22. J. D. Mangus, Proc. SPIE 830, 245 (1988).
23. E. C. Moran and J. E. Harvey, Appl. Opt. 27, 1486 (1988).
24. W. W. Zhang et al., Next generation X-ray optics: high-resolution, light-weight, and
    low-cost, a paper submitted to NASA in response to solicitation NNH11ZDA018L
    (2011).
25. W. W. Zhang et al., Proc. SPIE 8443, 84430S (2012).
26. W. W. Zhang et al., Proc. SPIE 8861, 88610N (2013).
27. W. W. Zhang et al., Proc. SPIE 9144, 914415 (2014).
28. R. E. Riveros et al., Proc. SPIE 9144, 914445 (2014).
29. R. E. Riveros et al., Proc. SPIE 9603, 96030W (2015).
30. R. E. Riveros et al., Proc. SPIE 9905, 990521 (2016).
31. R. E. Riveros et al., Proc. SPIE 10399, 103990T (2017).
32. W. W. Zhang et al., Proc. SPIE 7011, 701103 (2008).
33. K.-W. Chan et al., Proc. SPIE 8443, 8443358 (2012).
34. K.-W. Chan et al., Proc. SPIE 9144, 914440 (2014).
35. , Proc. SPIE 8147, 814717 (2011).
36. K.-W. Chan et al., Proc. SPIE 10399, 103990U (2017).
37. N. E. White and H. D. Tananbaum, Proc. SPIE 4851, 293 (2003).
38. W. W. Zhang et al., Proc. SPIE 10399, 103990S (2017).


Chapter 4 


Adjustable X-ray Optics


Scientific progress in understanding the universe requires data across the
electromagnetic spectrum. To address the cutting edge scientific problems of the
coming decades, we need X-ray data with sub-arcsecond resolution. To use such
precise angular resolution effectively and efficiently, large collecting areas are
necessary. These requirements dictate that future X-ray telescopes be made from
highly nested, thin shells. To control the figure and alignment of such flimsy
optics, we believe on-orbit adjustability is required. Accordingly, the Smithsonian
Astrophysical Observatory and The Pennsylvania State University have
embarked on a project to control the mirror figure by depositing piezoelectric
material on the back (non-reflecting) side of 200 mm long pieces of 0.4 mm thick
glass. Detailed optical metrology on the ground determines the voltages used
on-orbit to impart the stresses needed to produce the required X-ray reflecting
shape. We have experimentally verified this concept (i.e. established Technology
Readiness Level 3), and expect to develop it as a candidate technology for the
Lynx mission to be presented to the 2020 Decadal Committee for Astronomy and
Astrophysics.

This chapter describes the process of implementing the piezoelectric adjus- 
tors, and the concept of a large area, high angular resolution X-ray telescope for 
an observatory to be used by astronomers worldwide. 




64 D. A. Schwartz et al. 


1. Introduction 


The Chandra X-ray observatory¹⁻⁴ has set the standard for imaging and spectroscopic
resolution for an X-ray astronomy telescope.ᵃ However, its collecting area
is not dramatically larger than that of the first X-ray telescope, the 5 arcsec
resolution Einstein observatory,⁵ or even that of the very first X-ray astronomy
satellite, the non-imaging Uhuru.⁶ The optics technology for Chandra was based on
thick, heavy, Zerodur glass shells, which were figured and polished to give the
required surface shapes and smoothness. Because X-rays reflect only at shallow
grazing angles α, typically 1/2 to 1 degree, the ratio of mirror surface area to
effective collecting area is very large, of order 2/sin α ≈ 200 (Fig. 1). To build
up a large area for collecting X-rays, a nested series of shells is required. Within
a factor of 2, the four-shell Chandra mirror design used the maximum practical mass
and size budget for available launch vehicles. Therefore, to increase collecting
area by a factor of 30 and retain sub-arcsecond angular resolution, as recommended
by NASA’s visionary road-map committee for astrophysics,⁷ new technology is needed.

In particular, the ratio of mirror mass to mirror area must be drastically
reduced, and orders of magnitude more mirror shells must be accommodated
within the fixed launch vehicle volume. Both requirements point to a solution using
extremely thin reflecting shells. One would need large numbers of relatively small
reflecting pieces, arranged in circular segments of hundreds of separate shells. Such
thin and flimsy mirrors cannot be ground and polished, and would be expected
to distort due to stresses of mounting, launch loads, gravity release, and thermal
environment. Thus the impetus of our development is to create adjustable optics.
The adjustment would be made after mounting, and re-established on-orbit, to
correct the figure to counteract intrinsic and induced distortions. Besides meeting
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Fig. 1. Cross-section of half of the grazing-incidence configuration of the four-shell Chandra X-ray 
mirror. Two successive reflections bring on-axis rays to a focus (off the right of the page), following 
the path of the thin solid lines. The full mirror is a figure of revolution about the optical axis, 
(horizontal line, not shown). The paraboloid and hyperboloid surfaces must be precisely figured 
along their entire lengths, areas more than 100 times the effective aperture area for X-rays. That 
collecting area is only the four small rectangles shown to the left of the paraboloid surfaces, namely 
the length of the paraboloid projected perpendicular to the incoming beam. 
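
As a rough numerical check on the area ratio of order 2/sin α quoted in the Introduction and illustrated in Fig. 1, the sketch below evaluates it over the typical range of grazing angles. The function name `surface_to_aperture_ratio` is ours, and the factor of 2 is assumed to account for the two reflecting surfaces (paraboloid and hyperboloid).

```python
import math

def surface_to_aperture_ratio(grazing_angle_deg):
    """Polished surface area per unit effective aperture for a two-reflection mirror."""
    return 2.0 / math.sin(math.radians(grazing_angle_deg))

for alpha in (0.5, 0.75, 1.0):
    ratio = surface_to_aperture_ratio(alpha)
    print(f"alpha = {alpha:4.2f} deg -> surface/aperture ratio ~ {ratio:.0f}")
```

Across 1/2 to 1 degree the ratio runs from a little over 100 to a little over 200, in line with both figures given in the text and the caption.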


ᵃ Angular resolution of 0.5″ and spectroscopic resolution of R = λ/δλ up to 1000.


the requirements of the Lynx mission, adjustable X-ray optics offers the best path
to develop future generations of X-ray observatories with spatial resolution better
than 0.1″, using specialized equipment to calibrate each shell with on-orbit X-ray
measurements.⁸⁻¹⁰

Because of the emphasis on achieving the best possible angular resolution,
we will only consider Wolter-type mirror figures.¹¹,¹²,ᵇ This by-passes many novel
designs motivated by all-sky monitoring (e.g., lobster eye opticsᶜ) and large field
of view surveys.¹³⁻¹⁵ Optics that are adjustable on-orbit might prove ideal for a
mission that would combine targeted observations of individual objects at high
angular and energy resolution with wide-field surveys, by reconfiguring the
shape to optimize the average angular resolution over the total field of view.

Section 2 defines the concept to implement on-orbit mirror adjustment utilizing
bi-morph piezoelectric technology. We also briefly review several other possible
techniques to adjust the optic figure on the ground. Those other technologies
require the subsequent figure, mounting, and alignment to be maintained throughout
launch and on-orbit operation, and are not adjustable on-orbit. Section 3 details
the relevant physics and the development of piezoelectric actuators on glass mirrors.
Section 4 discusses progress to date in correcting mirror figure, which established
Technology Readiness Level¹⁶ (TRL) 3, and describes the path to TRL 4. Section 5
introduces the Lynx mission, an X-ray astronomy observatory concept enabled by
adjustable X-ray mirror technology.


2. Adjustable Optics Concepts 


2.1. Ground-based Adjustable Optics 


The idea to develop adjustable X-ray mirrors follows development of analogous 
applications for visible light telescopes and for X-ray beams in synchrotron facilities. 
Active optics, also called adaptive optics, are routinely used with ground-based 
optical telescopes to enhance imaging. The primary motivation of this technology 
was to correct for atmospheric scintillations. Even with ideal weather conditions 
and at high altitudes, scintillations limit ground-based telescopes to the order of 
1 arcsecond resolution, far worse than the diffraction limit of telescopes at optical 
wavelengths. Volumes 2 and 3 in this series discuss deformable mirrorsᵈ and active
opticsᵉ in the context of visible light astronomy. It is important here to emphasize
the difference between active optics and adjustable X-ray optics.

Atmospheric scintillations occur randomly and continuously, so that ground-
based astronomy must sense the distortions and correct the telescope mirrors at


ᵇ See Volume 4, Chapter 1.
ᶜ See Volume 4, Chapter 5.
ᵈ See Volume 2, Chapter 8.
ᵉ See Volume 2, Part 9, “Adaptive Optics (AO)”.




frequencies of hundreds of Hz, using wavefront measurementsᶠ at frequencies up to
kHz. This is accomplished by splitting the light from either a bright star that
happens to be in the field of view or a star-like light source generated by a laser
exciting sodium atoms in the far upper atmosphere. Dedicated optics and detectors
are used to measure the wavefront distortions from the point source. Fast computer
algorithms calculate and apply the necessary signals to a correction mirror. This
mirror is a thin reflector, much smaller than the telescope primary, and is typically
deformed by actuators applying forces perpendicular to its surface. In the X-ray
case, we are looking to correct perturbations that change very slowly, if at all.
Therefore we call our technology adjustable rather than active or adaptive. X-ray
telescopes can only operate in space, so there are no high frequency perturbations.
On-orbit, if thermal effects can be controlled to the necessary precision, and if
there are no degradations of materials or adhesives due to aging or cosmic radiation
effects, it is possible that no further adjustment would be needed. In any case, the
X-rays are detected as single photons, with the weakest sources registering only ten
counts in a month, and there are no practical means for X-ray beam splitting or
wavefront sensing on-orbit.

At X-ray synchrotron beam lines, many applications benefit from being able to
focus a high concentration of X-rays on a target. Beam lines are general user
facilities, and must accommodate a variety of experimental geometries and
requirements for the X-ray beam shape. Focal lengths must be re-adjusted, as the
source of X-rays is at a finite distance, in contrast to astronomical applications
where sources are effectively at infinity. Originally, complex and bulky mechanical
benders were used to shape the X-ray mirrors. In the 1990s, mirrors were developed
integrated with piezoelectric actuators in a bi-morph configuration to provide
precise and variable control of the X-ray reflecting surface.¹⁷,¹⁸ While mechanical
adjusters are still used,¹⁹,²⁰ mirrors adjusted with in-plane piezoelectric
actuators²¹,²² are a mature technology that is widely used at synchrotron beam lines.

Synchrotron applications are sufficiently different that the technology cannot
be transferred directly to X-ray astronomy. Synchrotrons generally provide such a
large intensity of X-rays that a single mirror suffices to collect an abundance of
photons. The synchrotron mirrors are relatively massive and must endure a large
heat load from the beam. Mirrors are typically optimized to focus in one dimension
only, with the crossed Kirkpatrick–Baez configuration²³ used for two-dimensional
focusing. Of course the biggest difference is that the ground-based applications can
always gain access to adjust, repair, or modify the controls as needed.


2.2. X-ray Astronomy Optics 


Because an X-ray astronomy mirror must be constructed of large numbers of
concentric shells, only extremely minimal structure is allowed between the shells in
order to avoid blocking X-rays. The adjustment concept is therefore to generate
different in-plane stresses over the back (i.e. non-reflecting) surface of each mirror
element, so as to impart an appropriate local strain to correct low-order figure
distortions. This depends on the original mirror surface being sufficiently smooth on
all spatial scales smaller than about 5 mm, and on surface quality on those scales
being maintained during all subsequent processing.


ᶠ See Volume 2, Part 7, “Wavefront Sensing Techniques”.

The primary distortion is due to the imperfections in the figure of the mirror
element as it is initially manufactured. Technologies for producing thin mirror
shells or segments include full shell replication,²⁴⁻²⁷,ᵍ thermal forming,²⁸⁻³⁰
slicing of pure Si crystals,³¹,ʰ and air bearing slumping.³²⁻³⁴ Figure 2 is a cartoon
illustrating possible methods of making and correcting individual reflecting pieces,
and assembling them into a mirror.

We can define two categories of figure adjustment processes. In one case the
correction to the figure is effected by depositing material where the reflecting
surface is lower than the desired figure²⁶,²⁷,³⁵,³⁶ (i.e. where the mirror radius is
larger than needed for the ideal surface), and/or by ion milling³⁷ away the high
places, where the mirror radius is too small. In the MSFC implementation of
differential deposition, the mirror segment is scanned in one dimension over a
magnetron sputter target. The target output is restricted spatially by a slit. The
velocity of the mirror motion is controlled so that it dwells in view of the target
for longer times where more material must be deposited. While issues of surface
stresses, mounting distortions, and thermal distortions are still being worked,³⁸
optical metrology and X-ray tests have both indicated figure slope improvements of
better than a factor of 2.²⁷ In a configuration developed by Windt, the mirror
element is successively passed over a magnetron target to deposit into “low”
regions, and then over an ion source to erode away “high” spots.³⁷

The other adjustment category changes the global shape of the manufactured
element via in-plane stresses. Concepts for using in-plane stress to adjust mirrors
may be further divided into two groups: imparting permanent stresses as part of
the manufacturing process, or implementing a means to apply variable stresses. The
importance of the latter category is that it can be used to make adjustments after
mounting and aligning the individual segments or shells, and can be used on orbit
to make adjustments, e.g. based on changing thermal or aging conditions. These
are the motivations for the use of piezoelectric adjustors to apply in-plane stresses.
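
The dwell-time control used in differential deposition, described above, can be sketched in a few lines. This is a minimal illustration under assumptions of our own, not the MSFC implementation: the deposition rate under the slit is taken to be constant and uniform, the required thickness is positive at every scan position, and the function name and numerical values are hypothetical.

```python
def scan_velocities(thickness_nm, rate_nm_per_s, slit_width_mm):
    """Scan velocity (mm/s) at each position so that the time spent in view of
    the slit deposits the requested thickness.

    Assumes a constant, uniform deposition rate under the slit:
    dwell time = thickness / rate, velocity = slit width / dwell time.
    """
    velocities = []
    for thickness in thickness_nm:
        if thickness <= 0:
            raise ValueError("thickness must be positive; add a uniform base layer")
        dwell_s = thickness / rate_nm_per_s
        velocities.append(slit_width_mm / dwell_s)
    return velocities

# Hypothetical correction profile along the scan direction, in nm.
required = [10.0, 20.0, 40.0, 20.0, 10.0]
velocities = scan_velocities(required, rate_nm_per_s=1.0, slit_width_mm=2.0)
for thickness, velocity in zip(required, velocities):
    print(f"{thickness:5.1f} nm -> {velocity:.3f} mm/s")
```

The mirror moves slowest where the most material is needed, which is the behavior the text describes.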

Permanent, localized stresses can be built into thin mirrors to compensate for
local slope errors. Ion implantation can generate stress in the plane of a thin glass
sheet by making a localized change in the glass density. By inclining the ion beam
at an angle to the glass plane, and scanning the beam in two dimensions, a localized


ᵍ See Volume 4, Chapter 6.
ʰ See Volume 4, Chapter 3.


Fig. 2. Cartoon indicating possible technologies to produce very light weight, high quality mirrors.
Any process in the top row might be used to make the initial reflecting element. In the middle
row are concepts for adjusting the original as-made elements to improve the surface figure. None,
one, or several of these may be used depending on the initial figure quality. The bottom row shows
that either full shells or modules containing individual segment pairs, or both, are then integrated
to make a complete mirror. Not shown is the coating of the mirror surface with a high atomic
number element, such as Ir, to enhance the X-ray reflectivity at higher energies. (Abbreviations:
MSFC = Marshall Space Flight Center; GSFC = Goddard Space Flight Center; ESA = European
Space Agency; MIT = Massachusetts Institute of Technology; Con-X, IXO and AXSIO were
proposed X-ray Observatories; XRO = X-ray Optics company; SAO = Smithsonian Astrophysical
Observatory; PSU = The Pennsylvania State University; NU = Northwestern University. See
electronic edition for a color version of this figure.)


pattern of stress can be imparted.³⁹ The inclined beam allows the stress to be
anisotropic, so that the proper two-dimensional stress pattern can be set up to correct
arbitrary mirror slope errors in both the axial and azimuthal directions. Magnetic
stresses can be generated by depositing a layer of magnetostrictive material over the
reverse side of the mirror, and using an electromagnetic write head that variably
imparts permanent magnetization to a magnetically hard material.⁴⁰⁻⁴² Distortion
due to stress generated by the high atomic number material coating the reflective
surface must always be considered. Whether this effect can be exploited, by
differentially depositing material on the back side of a mirror, to control the
figure distortions is under investigation.⁴³


Fig. 3. Cartoon of the operation of a piezo cell to perform figure correction. Panel a shows a
cross-section through a 20 mm long section of a mirror, illustrating the structure of one piezo cell.
The dimensions shown (not to scale) are notional possibilities. Applying an increased DC voltage
to a particular cell, from panel b to panel c, causes the PZT material to contract, imparting a
stress on the mirror substrate that can change its radius locally. (See electronic edition for a color
version of this figure.)


The concept of using piezoelectric actuators to generate in-plane strains of a 
mirror surface is illustrated in Fig. 3. In panel a the mirror substrate is coated with 
a high atomic number metal such as iridium to enhance the X-ray reflectivity up 
to 10 keV. On the back, a continuous conductive layer, e.g. of Pt, is deposited to 
serve as a ground plane. Then a continuous layer of the piezoelectric material lead 
zirconate titanate is deposited, as discussed in Section 3. Finally a patterned array 
of isolated conductors is deposited in a cell structure, along with traces (not shown) 
between the cells leading to the mirror edges. This allows each separate cell to be 
activated by an appropriate controlled voltage applied to the outer electrode. Panels 
b and c show an example of applying a voltage to one cell, imparting a local radial 
change of 0.5 micron. An insulating layer and additional electronics (not shown) 
can be deposited on each cell (cf. Section 3). 

Fundamental to any of the adjustment processes are means of optical metrology
to measure what adjustments are required on various scales, and to verify that
the corrected element meets its specifications. In particular, for our piezoelectric
adjustment we must first use optical metrology to measure the influence functions.
These functions are the surface displacement $F_{ij}(V)$, where $V$ is the voltage applied
to cell $i$, and $F$ is the radial displacement generated at position $j$. In principle, $j$ is
a two-dimensional continuous position on the glass, but in practice it is the discrete
node in a finite element analysis (FEA) or in an optical measurement of the surface
displacement. Using finite element analysis⁴⁴ to calculate the influence functions,
we verified that the attainable stress from the piezo elements would be sufficient to
correct the figure of a mirror that could actually be produced (Fig. 4).
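
To make the role of the influence functions concrete, the sketch below solves for the cell voltages that best cancel a measured figure error in the least-squares sense. It is a toy model rather than the procedure used by the authors. It assumes that the displacement is linear in voltage, so that $F_{ij}(V) = A_{ji} V$; that the mirror is one-dimensional; and that each cell has a Gaussian-shaped influence function. All names and numerical values are hypothetical.

```python
import math

def influence_matrix(nodes_mm, cell_centers_mm, width_mm, gain_um_per_volt):
    """A[j][i]: displacement (micron) at node j per volt applied to cell i.
    The Gaussian shape and the linear response are assumptions of this sketch."""
    return [[gain_um_per_volt * math.exp(-0.5 * ((x - c) / width_mm) ** 2)
             for c in cell_centers_mm] for x in nodes_mm]

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def correction_voltages(A, figure_error):
    """Voltages v minimizing |A v + figure_error|^2, via the normal equations."""
    n = len(A[0])
    AtA = [[sum(row[i] * row[k] for row in A) for k in range(n)] for i in range(n)]
    Atb = [-sum(row[i] * e for row, e in zip(A, figure_error)) for i in range(n)]
    return solve_linear(AtA, Atb)

nodes = [5.0 * j for j in range(21)]          # measurement nodes along the axis, mm
centers = [10.0, 30.0, 50.0, 70.0, 90.0]      # piezo cell centers, mm
A = influence_matrix(nodes, centers, width_mm=10.0, gain_um_per_volt=0.1)

# Synthetic figure error: the distortion that a known set of voltages would produce.
v_distort = [1.0, -2.0, 3.0, 0.5, -1.0]
figure_error = [sum(a * v for a, v in zip(row, v_distort)) for row in A]

v_fix = correction_voltages(A, figure_error)
residual = [e + sum(a * v for a, v in zip(row, v_fix)) for row, e in zip(A, figure_error)]
rms_residual = math.sqrt(sum(r * r for r in residual) / len(residual))
print("correction voltages:", [round(v, 6) for v in v_fix])
print("rms residual (micron):", rms_residual)
```

Because the synthetic error lies exactly in the span of the influence functions, the recovered voltages are the negatives of those that produced it. A real figure error would leave a non-zero residual, which sets the achievable resolution.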

Measurement of influence functions using an optical profilometer⁴⁵,⁴⁶ verified
the fidelity of the FEA calculations to the expected order of 10%. We have since
improved our influence function measurement system by implementing a
Shack–Hartmann wavefront sensor,⁴⁷ which allows a much faster map of the entire mirror
surface. For measuring our curved mirrors, we use a computer generated hologram
as a cylindrical null lens to match the radius of curvature of the mirror element.⁴⁸


Fig. 4. Simulation of the correction of a mirror using in-plane piezo actuators. Left panel shows
five axial traces of deviations from the desired figure of an actual mounted mirror segment made
as part of the development program for the proposed International X-ray Observatory.⁴⁹ The
equivalent angular resolution of >10″ is corrected to 0.4″ in the right-hand panel. (From Ref. 50.
See electronic edition for a color version of this figure.)


3. Piezoelectric Adjustors 


A substantial effort has gone into trying to find ways to build closely nested shells 
of thin mirror segments, using either polished silicon, full metal mirror shells or 
segmented mirrors made of either metal foils or slumped glass. While the quality 
available from these technologies has improved over the years, the best-reported 
resolution is in the range of 5-10 arcseconds for partial mirrors.?? Because it is 
difficult to achieve and maintain perfect optical figures through mirror fabrication, 
assembly, and gravity release, there are great advantages to be able to correct the 
optical figure. Electrostrictive actuator arrays are adopted for both ground and 
space-based optical telescopes, but the relatively large thickness of devices based on 
bulk ceramics is problematic for densely nested mirror segments of interest for next- 
generation X-ray space telescopes with large collecting areas. It is also challenging 
to mate and bond bulk actuators prepared by conventional sintering processes to 
curved mirror segments. 

To alleviate these difficulties, thin film piezoelectric or electrostrictive materials
can be deposited directly onto the convex side of thin glass mirror segments, as
shown in Fig. 3. Fundamentally, piezoelectricity describes transduction between
electrical and mechanical energies, such that an applied stress or strain produces a
polarization, or conversely, an applied electric field induces a change in shape.ⁱ All
piezoelectric materials have anisotropic properties; thus, the strain induced parallel
to an applied electric field differs from that in the orthogonal directions. Most
strongly piezoelectric materials are ferroelectric, and the convention in describing
the piezoelectric response is to assign the index 3 to the polarization axis. For thin
layer actuators with electrodes on the top and bottom surfaces, this means that the
3 axis is out of plane. For the geometry shown in Fig. 3, this thin layer is bonded to
a passive elastic layer. If the passive layer undergoes a biaxial strain $x_1 = x_2$ in the
1–2 plane, it induces a polarization $P_3$ on the electrodes in the 3 direction. More
compactly, $P_3 = e_{31,f}(x_1 + x_2)$. The piezoelectric coefficient $e_{31,f}$ describes the
in-plane stress induced by an electric field in the 3 direction.⁵¹ Thus, for low-voltage
adaptive optics, large values of $|e_{31,f}|$ are desired.

To control the shape of a mirror, an array of top electrodes, rather than a 
uniform top electrode, can be patterned. Applying a voltage across the thickness 
of a piezoelectric cell produces a local in-plane stress that can be used to bend 
the glass. Once the influence function for each cell is known, arrays of electrodes 
actuated at the appropriate voltage can then be used to locally deform the mirror 
in situ in a predictable and reproducible way to allow correction of figure errors 
while the telescope is in orbit. 

This concept necessitates a piezoelectric layer with adequate actuation authority
to bend the entire glass substrate at modest applied voltages, good lifetimes for
the actuator elements under DC electric field, as well as individual voltage control of
numerous electrodes in the array. $\mathrm{PbZr_{0.52}Ti_{0.48}O_3}$ (PZT) films (sometimes doped
with Nb to increase either the piezoelectric coefficient or the lifetime) are good
candidates for this application. Values of $|e_{31,f}| > 7$ C m⁻² have been demonstrated
for a variety of deposition techniques.⁵¹,⁵² This allows typical errors in thin mirror
segments to be corrected by application of < 10 V.
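
For a sense of scale, the in-plane stress generated by the film can be estimated from the coefficient quoted above. The sketch below assumes the simple clamped-film relation, stress magnitude = $|e_{31,f}| \times E_3$ with $E_3 = V/t$, and borrows the 1.5 micron film thickness given later in this section. The function name is ours, and the result is an order-of-magnitude estimate only.

```python
def in_plane_stress_pa(e31f_c_per_m2, voltage_v, film_thickness_m):
    """Magnitude of the in-plane stress (Pa) in a clamped piezoelectric film.

    Uses |stress| = |e31,f| * E3 with E3 = V / t; the units work out as
    C m^-2 * V m^-1 = J m^-3 = Pa.
    """
    field_v_per_m = voltage_v / film_thickness_m
    return abs(e31f_c_per_m2) * field_v_per_m

# |e31,f| = 7 C m^-2 and 10 V from the text; a 1.5 micron PZT film is assumed.
stress_pa = in_plane_stress_pa(7.0, 10.0, 1.5e-6)
print(f"{stress_pa / 1e6:.1f} MPa")  # prints 46.7 MPa
```

Stresses of tens of MPa acting in a micron-thick film on a 0.4 mm glass substrate are the source of the bending authority discussed in the text.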

Simulations of the effect of an array of 5 mm × 5 mm PZT actuators on a
400 micron thick Corning Eagle glass mirror segment are shown in Fig. 5. The initial
mirror shape was presumed to be deformed. Measured data for the influence function
of individual PZT cells were then used to simulate the improvement in optical figure
that should be possible by setting optimal voltages to each cell. It is clear that error
patterns can, in principle, be corrected using this technique. The best figures are
achieved when the spacing between piezoelectric cells is minimized. The models
suggest that 0.5″ resolution is achievable.⁵³

From a processing standpoint, use of piezoelectrics such as PZT necessitates
growth of the films on the substrate without deforming the underlying mirror beyond


ⁱ The polarization, P, is defined as the change in dipole moment, p, with volume: P = dp/dV.
PZT has a spontaneous dipole moment due to the distribution of charges within the crystal unit
cell, producing a non-zero dipole moment per unit volume. Application of an external electric field
changes the dipole moment and produces a strain in the piezoelectric. This strain can be used to
deform the underlying mirror.


Fig. 5. Simulation demonstrating reduction in the error amplitude by means of piezoelectric
correction of a mirror segment. Left: before correction; Right: after correction with 5 mm × 5 mm
piezoelectric cells. The color scale is the same in both panels. (See electronic edition for a color
version of this figure.)


correctable limits. Crystallization of ferroelectric films typically requires a high
temperature processing step, with temperatures exceeding ~550°C for PZT; the Si, thin
glass, and metal foils of interest for X-ray mirrors are all tolerant of these conditions.
Deposition of piezoelectric films on electroded Si is widely reported, and has been
scaled to mass production by numerous companies (e.g. TI, Fujitsu, Rohm, and
Matsushita) for ferroelectric random access memories and microelectromechanical
systems. Likewise, deposition of piezoelectric films on metal foils has been reported
by multiple groups.⁵⁴⁻⁵⁶ In order to retain high property coefficients, it is essential
that interfacial oxidation, as well as reaction layers between the piezoelectric and the
metal, be minimized. Passivation layers (such as a HfO₂ film deposited by atomic
layer deposition) prevent uncontrolled oxidation. In conjunction with LaNiO₃
orienting bottom electrodes, high-quality oriented PZT films can be grown.⁵⁷
Processing of PZT actuator arrays on thin glass is discussed in more detail below. For
an X-ray optics application, the reflecting material is deposited on the front side
of the optic following all other processing. Deposition conditions and thicknesses
of the layers are chosen to balance the stresses associated with the actuator stack
layers.

One common challenge with sputtering materials where one or more of the
cations have high volatility is achieving the desired stoichiometry. In particular,
sputtering Pb-based compounds is complicated by the high vapor pressure of Pb
species, which are easily resputtered from the growing film or evaporated during a
post-deposition anneal.⁵⁸⁻⁶¹ As such, targets with excess Pb are used to correct for
the loss of PbO and prevent the Pb-deficient pyrochlore phase from forming. However,
if there is too much PbO, at high temperature the excess PbO will segregate in the
grown film. In some cases, it will remain in the grain boundaries, creating a more
electrically conductive pathway; if the temperature is high enough, it will volatilize,
creating a porous film.

We have sputter-deposited PZT films onto both flat and slumped glass. Typical 
processing conditions for the sputter process are shown in Fig. 6. In brief, the back 
sides of clean glass substrates are covered with a 30 nm Ti adhesion layer and a 
100 nm Pt bottom electrode. During the electrode deposition a substrate tempera- 
ture of 150°C improves adhesion and prevents delamination of the piezoelectric cells 
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Fig. 6. Sputter deposition conditions for PZT films on Eagle glass substrates for adjustable X-ray 
optics. 


on subsequent steps. A 0.5 µm layer of PZT is sputtered in an argon ambient with an
RF power density of 2 W cm⁻² from a lead-rich target (PbNb0.01(Zr0.52Ti0.48)0.99O3
with 5% excess PbO) onto the unheated, electroded substrate. The layer is crystal- 
lized into the perovskite phase using either 18 hour heat treatments at 550-585°C°? 
or 1 minute crystallization steps at 650°C. The process is repeated to build up a 
PZT layer that is 1.5 microns thick. Half micron thick layers were used to prevent 
cracking, which occurred on crystallization of thicker amorphous films. If neces- 
sary, a final lead-rich layer is deposited using higher chamber pressures to min- 
imize the amount of pyrochlore phase at the surface. The top electrode is then 
sputter-deposited and patterned by lift-off. The resulting films patterned with large 
area (cm²) electrodes have relative permittivities of >1200, tan δ ~ 0.02, and an
average remanent polarization > 23 µC cm⁻². High yields on cm² electrodes have
been achieved.®* Actuation of mounted parts demonstrates that ~2 µm of surface
deflection can be achieved with 3 V (see Fig. 7). 

Our SAO/PSU team has demonstrated that piezoelectric adjuster cells can be 
prepared on slumped glass substrates. An example of this is shown in Fig. 8. The 
part has a 1 m radius of curvature thermally formed using a cylindrical fused silica 
mandrel® produced for SAO by collaborators at the INAF/Osservatorio Astro- 
nomico di Brera (OAB) in Milan, Italy. Piezoelectric material was deposited on 
the back surface of the mirror to produce actuator cells 5 mm axially x 10 mm 
azimuthally. The mirror was mounted in a pseudo-flight-like mount. 

Adjustable X-ray technology has been used to change mirror figure, as shown 
in Fig. 9. Approximately 100 piezo cells were simultaneously controlled using 1/4 of 


j tan δ is the ratio of the imaginary part of the dielectric permittivity to the real part of the dielectric
permittivity. It is essentially a measure of how lossy the material is as a dielectric. 
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Fig. 7. Wavefront measurement of actuation of three 1 cm x 1 cm piezoelectric cells on a 10 cm 
glass flat with thin film adjustors. The colorbar scale units are nm. Application of 3 volts gave a 
deflection of about 0.25 µm. The deflection is deterministic and localized to the actuated electrodes.


(See electronic edition for a color version of this figure.) 


Fig. 8. Array of piezoelectric cells with patterned Pt top electrodes on a slumped glass (1 m 
radius of curvature). The part is wired for a deterministic figure correction measurement. We are 
seeing the back (non-reflecting) side, with an array of 5mm x 10 mm actuator cells. 
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Fig. 9. Difference in surface slope between predicted and actual figure change imparted to a 
mounted cylindrical test mirror. Horizontal and vertical scales give dimensions in mm of a section 
of the 100 mm x 100 mm mirror. The cylindrical axis is vertical. The color bar gives slope 
differences in arcseconds. (See electronic edition for a color version of this figure.) 


the control electronics developed for X-ray testing. The mirror figure was measured 
before and after actuated changes using a 128 x 128 channel Shack-Hartmann 
wavefront sensor. Piezo cell voltages ranged from 0 to ~+5 V. Voltages for the piezo
(adjuster) cells were optimized using rms slope error as the merit function. As the
first test on a cylindrical piece, an axial sine wave with 0.3 µm amplitude, 100 mm
period, and an rms slope error of ~2.5″ was imposed. The actual figure introduced
matched the predicted figure change to within 0.47″ rms, a value mostly
dominated by measurement noise. The difference between the predicted and actual
figure change is shown in Fig. 9. This rms residual compares favorably with earlier
tests on flat mirrors where a 5.6″ rms slope change was imparted with a noise-limited
accuracy of 0.8″.
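
The link between the amplitude and period of the imposed sine wave and its rms slope can be checked with a few lines of Python. This is only a sketch: it assumes the quoted amplitude is the zero-to-peak amplitude of a pure sinusoid z(x) = A sin(2πx/P), which the text does not state explicitly, and the function name is illustrative.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def sine_rms_slope_arcsec(amplitude_mm, period_mm):
    """RMS slope, in arcsec, of the figure z(x) = A sin(2*pi*x/P)."""
    # The slope dz/dx has peak value 2*pi*A/P; the rms of a sinusoid is peak/sqrt(2).
    peak_slope_rad = 2.0 * math.pi * amplitude_mm / period_mm
    return peak_slope_rad / math.sqrt(2.0) * ARCSEC_PER_RAD

# 0.3 micron amplitude (expressed in mm) and 100 mm period, as in the cylindrical test.
rms_slope = sine_rms_slope_arcsec(amplitude_mm=0.3e-3, period_mm=100.0)
print(f"rms slope = {rms_slope:.2f} arcsec")
```

Under that assumption the result is a little under 3 arcsec, the same scale as the value quoted above; a different amplitude convention would shift the number.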

The field of piezoelectric microelectromechanical systems is growing rapidly. 
Yole Development predicts that ~25,000 6-inch wafers coated with piezoelectric 
films will be needed annually to supply the ink jet printer market alone. There 
have been major investments in the infrastructure for manufacturing piezoMEMS 
in Europe (Nanostrain, Lab4MEMS, PiezoVolume, MEMS-pie, with foundries at 
CNRS, Sintef, Philips) and Asia (Rohm, Mitsubishi Materials, Silicon Sensing, 
Sumitomo Chemical). In the United States, the Army Research Laboratory and 
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Texas Instruments both have production lines. Numerous additional companies 
supply high quality piezoelectric films (Fujifilms, Ramtron, and others). Thus, it 
is likely that a commercial source for the mirror actuators would be available. 

If the adjustable mirror approach is scaled up to the size for the Lynx optics, 
a large number of piezoelectric mirror adjusters will be required to refine the figure of 
the X-ray mirror segments to the precision necessary for effective imaging. Depending
on the size of the mirror adjusters, this will require as many as 10⁷ total adjusters.
For operation, each mirror adjuster must be supplied with the control voltage that 
provides the necessary force for the required mirror correction. In principle, this 
control signal could be supplied through a wire to each adjuster; however, that many
control wires would add greatly to the system weight and complexity and would 
also present obscuration problems. 

To avoid the problem of large numbers of wires and to allow improved integration
of the mirror adjuster wiring, row-column addressing can be used, as in active
matrix displays. For example, a 4K Ultra HD color active matrix liquid crystal
display has ~2.5 × 10⁷ display pixels (3840 x 2160 pixels, 3 color subpixels per
pixel) selected and isolated by the same number of thin film transistors and oper- 
ated by 6000 lines at the display edges and controlled using low-cost CMOS ICs at 
a few cents per line. Recent work has demonstrated that ZnO thin film transistors 
can be successfully integrated on thin flat substrates comparable to those in X-ray 
mirrors without degradation in performance (see Fig. 10).°° ZnO is a semiconductor 
and is similar to oxide semiconductors used in commercial display manufacturing 
(for example, indium- and gallium-doped ZnO is used in Sharp's 4K 5.5″ display).
Notably, ZnO is also radiation tolerant to > 100 MRad.®® 
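
The wiring saving from row-column addressing can be illustrated with a short count. The sketch below assumes an idealized rectangular grid in which each cell has its own select transistor, so that one line per row plus one line per column suffices; the grid sizes and function names are hypothetical and are not taken from the mirror design.

```python
def direct_wire_count(rows, cols):
    """One dedicated control wire per adjuster cell."""
    return rows * cols

def row_column_line_count(rows, cols):
    """One select line per row plus one data line per column."""
    return rows + cols

# Illustrative grids: a 7 x 7 test array and two larger hypothetical arrays.
for rows, cols in [(7, 7), (20, 10), (1000, 1000)]:
    direct = direct_wire_count(rows, cols)
    shared = row_column_line_count(rows, cols)
    print(f"{rows} x {cols}: {direct} direct wires vs {shared} row/column lines")
```

The direct count grows with the number of cells, while the row-column count grows only with the grid edge lengths, which is why the approach scales to very large numbers of adjusters.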


[Fig. 10 labels: piezoelectric cell; address lines]


Fig. 10. Example of a 100-mm diameter glass flat coated with an electrode-PZT-electrode stack.
ZnO thin film transistors have been fabricated for each piezoelectric cell to enable row-column
addressing. The mirror surface is on the other side of the glass substrate (down in this photo). 
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4. Adjustable Optics Demonstration 


TRL 3 requires that analytical and/or experimental proof of the critical function has 
been established. For adjustable X-ray optics we identified three critical functions. 
First is the ability to deposit piezoelectric material onto a curved piece of glass, 
with a cell structure having a high yield of functional cells (see Section 3). Second 
is to deform the mirror surface in a predictable way. This requires that we will 
be able to measure the influence function, i.e. the effect that each cell has on the 
entire continuous surface as a function of applied voltage, and to verify that the 
measured influence functions agree with expectations from finite element analysis 
of the stresses expected according to the determined piezoelectric coefficients. Third, 
we must use those influence functions to correct a distorted mirror to better than 0.4″
rms diameter, consistent with the error budget for a telescope with sub-arcsecond 
resolution. These elements have been demonstrated, as discussed in this section. 

We have successfully deposited 7 x 7 arrays of adjuster cells on 100 mm x 
100 mm pieces of Corning Eagle™ glass (Fig. 11). Yields of functioning cells
approach 100%%:° with the current cleaning protocols in place. Measured influ- 
ence functions agree well with those calculated from finite element analysis,°° &” °° 
allowing correction as shown in Fig. 9. 
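
One way to picture how measured influence functions are turned into cell voltages is a linear least-squares fit. The sketch below is not the authors' optimization code. It assumes that the slope change is linear in voltage and that the contributions of different cells add, and it uses made-up Gaussian influence functions for three cells. Minimizing the sum of squared slope residuals is equivalent to minimizing the rms slope error.

```python
import math

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_voltages(influence, target):
    """Least-squares voltages from the normal equations (A^T A) v = A^T b.

    influence[j][k] is the slope change at sample k per volt on cell j.
    """
    n = len(influence)
    ata = [[sum(p * q for p, q in zip(influence[i], influence[j]))
            for j in range(n)] for i in range(n)]
    atb = [sum(p * t for p, t in zip(influence[i], target)) for i in range(n)]
    return solve_linear(ata, atb)

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Hypothetical data: three cells with Gaussian influence functions on 11 samples.
samples = range(11)
centers = [2.0, 5.0, 8.0]
influence = [[math.exp(-((k - c) / 2.0) ** 2) for k in samples] for c in centers]
true_volts = [1.0, 3.0, 0.5]
target = [sum(v * row[k] for v, row in zip(true_volts, influence)) for k in samples]

volts = fit_voltages(influence, target)
residual = [sum(v * row[k] for v, row in zip(volts, influence)) - target[k]
            for k in samples]
print([round(v, 6) for v in volts], rms(residual))
```

With real data the target would be the negative of the measured slope error map, and the residual rms would be limited by measurement noise and by distortions at spatial frequencies the cells cannot reach.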

Initial simulation of the correction process considered only a single axial strip.® 
This served as a sanity check on the correction process. Higher fidelity correction 
simulations considered a mirror with a specific, realistic pattern of axial distortion. 
The correction simulation showed a reduction from a 12″ rms diameter to a 0.4″


Fig. 11. Photograph of a 7 x 7 array of adjuster cells, deposited on a 100 x 100 mm curved glass
mirror. (Credit: Ref. 67.) 
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diameter (Fig. 4). Establishment of the ability to correct general distortions uses a 
“correction transfer function” (CTF) formalism.*® The CTF as a function of spatial 
frequency is defined as 1 − P₁/P₀, where P₀ is the power spectral distribution of
the uncorrected error map and P₁ is the power distribution of the residuals after
correction. For a specific configuration of 5 mm x 10 mm actuators, the CTF
increases (P₁ decreases) rapidly below the Nyquist frequency of 0.05 mm⁻¹ for
sinusoidal amplitudes less than 1 micron.*® 
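
The CTF definition above (one minus the ratio of residual power to uncorrected power at each spatial frequency) can be written out as a short sketch. The power values below are invented for illustration. The reading that the quoted 0.05 mm⁻¹ Nyquist frequency follows from the 10 mm cell dimension via 1/(2 × pitch) is an inference here, not something the text states.

```python
def nyquist_frequency(actuator_pitch_mm):
    """Highest spatial frequency (cycles/mm) sampled by cells of the given pitch."""
    return 1.0 / (2.0 * actuator_pitch_mm)

def correction_transfer_function(psd_uncorrected, psd_residual):
    """CTF per frequency bin: 1 - (residual power) / (uncorrected power)."""
    return [1.0 - res / unc for unc, res in zip(psd_uncorrected, psd_residual)]

# Hypothetical power spectra in three frequency bins (arbitrary units).
psd_before = [4.0, 2.0, 1.0]
psd_after = [0.4, 1.0, 1.0]

ctf = correction_transfer_function(psd_before, psd_after)
print("Nyquist frequency for 10 mm pitch:", nyquist_frequency(10.0), "per mm")
print("CTF:", ctf)
```

A CTF of 1 means the error at that frequency is removed completely, and a CTF of 0 means the adjusters have no effect there.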

TRL 4 requires validation of components or breadboards in a laboratory envi- 
ronment. The critical technologies for this stage are to align a primary and secondary 
reflector to each other, within a fraction of an arcsecond, and then to measure and 
adjust their figures to produce a sub-arcsecond X-ray image. We are preparing to 
verify that X-ray image by performing a measurement in the NASA/MSFC stray 
light facility.“ Our breadboard will be a single element primary/secondary pair, 
100mm x 100mm in area and 0.4mm thick. Figure 12 shows the orientation of this 
mirror relative to the X-ray beam. 

A 100m long X-ray pipe gives a beam simulating an X-ray point source, which 
can be nearly monochromatic. A choice of Al-K X-rays at 1.49 keV energy minimizes 
scattering so that the image size indicates the ability to control the figure. The 
facility provides a CCD detector which can operate in either an integrating or single 
photon counting mode. 

We realize that the breadboard alignment mount shown is not a configuration 
that can be used for a flight assembly. That is because when each of ~90 shells is 
aligned and mounted, the reflecting surface must be accessible to optical metrology 
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Fig. 12. Drawing of the breadboard for X-ray testing of an aligned mirror pair. The figure shows 
the concave (reflecting) side of the primary (right) and secondary (left) mirror elements, installed 
in the alignment fixture. The line from the X-ray source to detector is horizontal, so there is 
minimal gravity sag of the mirrors (so-called “parenthesis” orientation). The fixture is mounted in 
a thermal control box, which in turn rides on stages with six degrees of freedom for alignment. 
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so that the proper post-alignment distortions and influence functions can be mea- 
sured, and the figure corrected. The path to TRL 5 must include design of a mount 
compliant with this requirement. A TRL 5 breadboard would naturally include 
multiple mirror pairs with different radii, and would be a design that could hold 
the mirror assembly when subjected to the launch and orbital environments. 


5. Adjustable Optics for an X-ray Observatory 


While most of the astrophysical problems of the coming decades will require sub- 
arcsecond imaging with large X-ray collecting areas, we can illustrate the require- 
ments that drive the mirror by considering the study of the formation history of 
super-massive, 10°-!°M,.* black holes in the early universe.’+~4 In round numbers, 
a 10⁴ M⊙ black hole emits 1.3 × 10⁴² erg s⁻¹ at the Eddington limit. From a system
at redshift 10, the flux at Earth would be 3 × 10⁻¹⁹ erg cm⁻² s⁻¹ if 30% of the
energy were emitted in X-rays. Extrapolating from Chandra deep surveys, there are 
more than 10° sources per square degree at that flux, so that any telescope must 
have a resolution element less than 3 arcsec² (i.e. radius less than 1 arcsec) to avoid
limits imposed by source confusion noise. Detection of such a flux in a 4 Msec long 
survey observation will require 2 m² of effective collecting area.
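
The round numbers in this estimate can be reproduced with a short calculation. The sketch below rests on assumptions that are not in the text: a flat ΛCDM cosmology with H0 = 70 km s⁻¹ Mpc⁻¹ and Ωm = 0.3, the standard Eddington luminosity of about 1.26 × 10³⁸ erg s⁻¹ per solar mass, and no bandpass or K-correction. It is an order-of-magnitude check only.

```python
import math

L_EDD_PER_MSUN = 1.26e38   # erg/s per solar mass (ionized hydrogen)
MPC_CM = 3.0857e24         # centimeters per megaparsec
C_KM_S = 299792.458        # speed of light in km/s

def luminosity_distance_cm(z, h0=70.0, omega_m=0.3, steps=10000):
    """Luminosity distance in a flat LambdaCDM model (assumed parameters)."""
    omega_l = 1.0 - omega_m
    dz = z / steps
    integral = 0.0
    for i in range(steps):
        zi = (i + 0.5) * dz  # midpoint rule
        integral += dz / math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_l)
    comoving_mpc = C_KM_S / h0 * integral
    return (1.0 + z) * comoving_mpc * MPC_CM

mass_msun = 1.0e4
xray_fraction = 0.3

l_edd = L_EDD_PER_MSUN * mass_msun
d_l = luminosity_distance_cm(10.0)
flux = xray_fraction * l_edd / (4.0 * math.pi * d_l ** 2)

print(f"Eddington luminosity = {l_edd:.2e} erg/s")
print(f"X-ray flux at Earth  = {flux:.1e} erg/cm^2/s")
```

With these assumed parameters the flux comes out close to the 3 × 10⁻¹⁹ erg cm⁻² s⁻¹ quoted above; modest changes in the cosmological parameters move it by a few tens of percent.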

While new mirror technology and a larger size are required for the next gen- 
eration X-ray observatory, the Lynx mission (originally called X-ray Surveyor’), 
all other requirements could remain identical or extremely similar to the Chandra 
observatory. In particular, since the mirror resolution is the same, the pointing con- 
trol and aspect determination system, the architecture of the avionics and safing 
systems, and the operational concepts can be used directly. The science instruments, 
including the focal plane cameras and possible grating spectrometers, will be pro- 
cured via competitive peer review, and will naturally have much greater capabilities 
than the Chandra instruments, which were selected for flight in 1984. 

To discover previously unknown sources, it is important to have a relatively large 
field of view, and the best possible angular resolution over that field of view. For 
this reason, the Wolter-Schwarzschild geometry is preferred, as it satisfies the Abbe
sine condition.”°:! Figure 13 shows the confusion limit for a Wolter-Schwarzschild 
telescope that has 0.5″ half power diameter on axis, according to the approximate
scaling derived by Chase and VanSpeybroeck.” 

The actual mirror assembly would be composed of large numbers of individual 
segments, similar to what is shown in Fig. 11. The predicted performance shown in 
Fig. 13 is based on segments that are 200 mm in the axial direction and have a 10m 
focal length. The various segments would extend between 200 and 400mm in the 


k M⊙ = mass of our Sun, 2 x 10³³ gm.
See Chapter 1 of this volume for further discussion of the Wolter-Schwarzschild geometry.
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Fig. 13. Confusion-limited source density (left ordinate) vs. off-axis angle for an otherwise perfect 
Wolter-Schwarzschild telescope with 0.5″ resolution on-axis. The right-hand ordinate (indicated
by the mustard-colored tick marks) gives the 0.5 to 2 keV flux corresponding to the source density 
on the left. Source confusion noise does not allow reliable blind detection of sources above the solid 
line. In a 4-million second observation, such a telescope would not be confusion-limited until more 
than 11′ off-axis.


azimuthal direction. The segments would be aligned into modules (bottom middle 
of Fig. 2), which in turn would be installed into an overall mirror structure (bottom 
right of Fig. 2). The capability for on-orbit adjustment would be realized by deposit- 
ing two strain gauges on each cell. One would be extremely sensitive to temperature 
changes, and the other would measure the actual in-plane strain. As a function of 
changes of either temperature or the piezoelectric coefficient, on-board logic (e.g. 
field-programmable gate arrays) would adjust voltages closed-loop to maintain the 
known, required strain. Concepts for such an observatory have been presented over 
the past few years,’! “4:76 and NASA has currently formed a Science and Technology 
Definition Team™’ to define the science case and assess the technical requirements 
and feasibility of the Lynx mission for presentation to the 2020 Committee for a 
Decadal Survey of Astronomy and Astrophysics. 
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We know of three basic geometries that utilize grazing-incidence reflection of 
X-rays to produce X-ray images. In all three, two successive reflections are required
to produce a meaningful image largely free of coma. Wolter systems employ sur- 
faces of revolution with axial symmetry, in which the two successive reflections 
occur in the same plane. Kirkpatrick—Baez systems utilize reflecting surfaces that 
are almost planar and are set such that the two reflection planes are perpendicular 
to one another. The third is the lobster eye geometry which comprises an array of 
small, closely packed tubes or pores that have a square cross-section and imaging 
is achieved by grazing-incidence reflections from the planar internal surfaces of 
the pores. Lobster eye optics are peculiar. The two reflections that produce an 
image occur from adjacent internal walls of each pore, but the combined response 
of hundreds or thousands of individual pores is required to produce a recognizable 
image or picture. Lobster eye X-ray optics are difficult to make but they offer two 
major advantages over Wolter systems. The field of view is not limited by the 
small grazing angles required to reflect X-rays and, in principle, a single lobster 
eye optic can produce a continuous image of the entire sky. Providing the width 
of the pores is large compared with the thickness of the pore walls, lobster eye 
optics are intrinsically very low mass. X-ray telescopes using lobster eye optics 
have great potential for X-ray transient imaging and for applications where low- 
mass is crucial. 


1. The Eyes of Macruran Crustaceans


Lobsters, shrimps and crayfish (macruran crustaceans) have eyes which work in a 
different way to the more familiar refractive lenses employed by a wide variety of 
animals, including man. The principle of imaging in these lobster eyes was discovered 
by Refs. 1 and 2. Figure 1 shows two micrographs of the structure of a lobster eye 
and a schematic of the imaging geometry. In place of a lens the optic comprises a 
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Fig. 1. Left-hand panel: micrograph of the front surface of a lobster eye showing the array of 
square pores packed across a spherical surface. Middle panel: micrograph of a section through the 
square pore structure. Right-hand panel: schematic of the imaging geometry showing the spherical 
array of square pores and the retinal surface on a sphere of half the radius of the optic. 


large number of very small tubes or pores, each with a square cross-section. The 
internal walls of every pore are smooth and reflecting such that light is transmitted 
through by multiple reflections and exits as a number of narrow beams. The pores 
are packed over a spherical surface so that they all point at a common center of 
curvature and the retinal surface is on a concentric sphere with half the radius of 
curvature. In this configuration beams produced by two reflections from adjacent 
walls of the pores all combine to produce a true image on the retina. 

In the eyes of crustaceans the length of the square pores is only a few times 
their width (L/d small) and the reflection angles for visible light can be large so that 
the field of view (FOV) or acceptance angle of individual pores is also large. It was 
pointed out by Ref. 3 that the same imaging principle could be used in a focusing 
X-ray telescope if the L/d ratio of the pores was large, ~100, and the internal walls 
of the pores were highly polished. The reflection angles within the pore are then 
grazing (1-2 degrees with respect to the surface) and soft X-rays in the energy band 
0.1-10 keV will be reflected. An array of square pores in the lobster eye geometry 
can then produce an image of the soft X-ray sky. Because X-rays of this energy 
have very small wavelengths (an energy of 1 keV corresponds to a wavelength of
12.4 Å) the pores can be very small, ~50 µm, without diffraction effects degrading
the focused image. 
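
The energy-to-wavelength conversion and the scale of diffraction from a single pore can be sketched as follows. The λ/d estimate of the diffraction angle is a standard order-of-magnitude rule adopted here as an assumption; the text itself gives no diffraction formula.

```python
import math

HC_KEV_ANGSTROM = 12.398   # h*c in keV * Angstrom
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def wavelength_angstrom(energy_kev):
    """Photon wavelength in Angstrom for an energy in keV."""
    return HC_KEV_ANGSTROM / energy_kev

def diffraction_angle_arcsec(energy_kev, pore_width_um):
    """Order-of-magnitude diffraction angle lambda/d for a pore of width d."""
    wavelength_m = wavelength_angstrom(energy_kev) * 1.0e-10
    return wavelength_m / (pore_width_um * 1.0e-6) * ARCSEC_PER_RAD

lam = wavelength_angstrom(1.0)
angle = diffraction_angle_arcsec(1.0, 50.0)
print(f"1 keV -> {lam:.1f} Angstrom, lambda/d for a 50 micron pore = {angle:.1f} arcsec")
```

The estimate is a few arcseconds at 1 keV and falls in proportion to 1/E at higher energies.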


2. A Single Square Pore 


The action of a single square pore is illustrated in Fig. 2. A distant source is offset 
from the pore axis by angles α and β. The four rays plotted suffer 0, 1, 1 and 2
reflections, respectively, and intersect the focal surface as shown. Hence the pore
splits the flux over the entrance aperture of the pore into four beams which diverge to
intersect the focal surface as illustrated in the bottom right of Fig. 2. The geometric
shape of each beam is dictated by the aperture regions shown in the top left. If the
pore is rotated about its axis the angles α and β change but the off-axis angle
θ = (α² + β²)^(1/2) remains constant (for small angles such that tan(θ) ≈ θ) and the
centers of the 0-reflection and 2-reflection beams on the focal surface stay fixed. 
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Fig. 2. Center: a pore of axial length L placed at a focal distance F' from the focal surface. 
Rays at angles α and β with respect to the pore axis are transmitted through the square pore.
The number of reflections is indicated on the focal surface. Top right: the three rays which suffer
reflections at positions indicated by the dots. Top left: the pore aperture as seen from the source.
The exit aperture is offset by distances αL and βL. Regions of the aperture which suffer 0, 1 and
2 reflections are marked. Bottom right: the aperture is split into four beams which intersect the
focal surface as shown. Each beam is offset from the intersection of the pore axis with the focal
surface by distances αF and βF.


The centers of the 1-reflection beams move around the intersection of the pore axis
with the focal surface on a circle of radius θF. When α ≈ d/L or β ≈ d/L the
0-reflection beam vanishes along with one of the 1-reflection beams. If both α and
β satisfy this condition only the 2-reflection beam remains.

When the off-axis angles of the source are larger than arctan(d/L) then multiple
reflections can occur within the pore. The source angular space is divided
into a checkerboard of regions delineated by the lines α = arctan(id/L) and
β = arctan(jd/L) where i = 0, ±1, ±2, ... and j = 0, ±1, ±2, .... For the regions
between lines defined by i = −1, 0, 1 and j = −1, 0, 1 we get the behavior described
above and illustrated in Fig. 2. For source angles given by larger values of |i| and
|j| the pore will still split the entrance aperture into four beams but these beams
will correspond to multiple reflections. Those rays which undergo an even number 
of reflections in one direction (from opposite walls in the pore) will not be deviated 
in that direction. On the other hand those rays which undergo an odd number 
of reflections will be deviated. If there are an even number of reflections in both 
directions the resulting beam is like the 0-reflection beam in the simple case. If the 
number of reflections is odd-even or even-odd in the two directions then the beams
are equivalent to the 1-reflection beams in the simple case. Finally if there are an 
odd number of reflections in both directions the beam will be like the 2-reflection 
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beam in the simple case. Since the reflectivity is less than one, multiple reflections 
will be fainter, and furthermore a large number of reflections will only occur if the 
grazing angles are large. As we will show later the X-ray reflectivity decreases if 
the grazing angles are large or the X-ray energy is high, so in practice the multiple 
reflection components contribute very little to the final X-ray response of the lobster 
eye optic. 
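
The way a single pore divides its entrance aperture among the four beam types can be sketched geometrically. The code below is an illustration under stated assumptions: α and β are measured along the two wall directions of the square pore, the walls are perfectly reflecting (reflectivity is ignored), and the function names are hypothetical. It uses the parity rule described above: an even number of reflections in a direction leaves the ray undeviated in that direction, and an odd number deviates it.

```python
import math

def odd_fraction(angle_rad, length, width):
    """Fraction of the aperture whose rays reflect an odd number of times in one axis."""
    # Transverse distance travelled along the pore, in units of the pore width d.
    t = length * math.tan(abs(angle_rad)) / width
    n = math.floor(t)
    frac = t - n  # this fraction of rays suffers n + 1 reflections, the rest n
    return frac if n % 2 == 0 else 1.0 - frac

def beam_fractions(alpha, beta, length, width):
    """Aperture fractions in the 0-like, 1-like (both arms) and 2-like beams."""
    px = odd_fraction(alpha, length, width)
    py = odd_fraction(beta, length, width)
    return {
        "0-like": (1.0 - px) * (1.0 - py),
        "1-like": px * (1.0 - py) + (1.0 - px) * py,
        "2-like": px * py,
    }

# Illustrative pore with L = 1 mm and d = 0.02 mm, source at half the angle d/L in each axis.
L_MM, D_MM = 1.0, 0.02
half = math.atan(0.5 * D_MM / L_MM)
fractions = beam_fractions(half, half, L_MM, D_MM)
print(fractions)
```

At α = β = arctan(d/L) the same functions put the whole aperture into the 2-like beam, in line with the statement that only the 2-reflection beam remains there.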


3. An Array of Square Pores 


The 2-reflection beam from a single pore (or the beam with an odd number of 
reflections in both directions for source angles > d/L) is equivalent to the reflection 
from a narrow, plane mirror of length L lying along the axis of the pore with the 
mirror normal in the plane that contains the line to the source and the pore axis. 
Crucially this is true whatever the rotation angle of the pore about its axis. If 
the pore is positioned on a spherical surface of radius of curvature 2F, with the
pore axis pointing towards the center of curvature, the 2-reflection beam emulates 
the behavior of a spherical mirror as shown in Fig. 3. We can arrange an array of 
pores across the full spherical surface to approximate the imaging behavior of the 
spherical mirror. All the 2-reflection beams generated by the square pores from a 
distant point source will form an image spot of size = d. The spherical surface of
radius 2F forms the approximate principal surface of the lobster eye optic.

The simplest packing scheme is a square packed array as shown in the top left 
of Fig. 4. This is effectively the packing adopted in the real lobster eye shown in 
Fig. 1. Because the pores have a square cross-section the packing density is high 


Fig. 3. The formation of an image by a spherical mirror of radius of curvature 2F emulated by
square pores. The conventional spherical mirror reflects input rays from the inner surface while
the square pores produce an identical image from rays incident on the external spherical surface.
Two pores are shown, one at large grazing angle θ (with θ labeled) and one at small θ. The image
from pores at small θ occurs on the spherical focal surface a distance F from the mirror while
pores at large θ suffer spherical aberration as expected.
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Fig. 4. Pore packing schemes. Top-left: A simple square packed array. Top-right: A square packed 
array with random rotation angles. Bottom-left: A waffle packing in which the pore rotations run 
smoothly over a range of −22.5 to +22.5 degrees. Bottom-right: A sunflower packing in which
every pore is rotated such that the diagonal across the square aperture points to a common center. 


with minimum losses introduced by the thickness of the pore walls. For a given point 
source the offset angle α will be constant along each row of pores in the array and
similarly the angle β will be constant along each column of pores. Because of this the
1-reflection beams from the pores will sum to produce two line foci which intersect 
the focused spot from the 2-reflection beams. Figure 5 shows a section through a 
ray tracing of a square packed array. The pores have length L, width d and the wall 
thickness is w. Rays from two distant sources are shown. The images produced on 
the focal surface have a full width equal to the pore size d. Rays which went directly 
through the pores without reflection are not shown. Figure 4 also illustrates other 
packing schemes as discussed by Refs. 4 and 5. The random packing (top-right) 
is inefficient because it requires a larger average wall thickness to accommodate 
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the random orientation of the square pores and, furthermore, it is probably rather 
difficult to manufacture. The waffle packing (bottom-left) is almost as efficient as 
the simple square packed array but introduces a range of rotation angles similar to 
the random packing. This waffle packing can be considered as a simple modification 
of the square packed scheme and may be more straightforward to manufacture. The
1-reflection beams from the pores in the random or waffle packing will not sum 
to form one-dimensional line foci as for the simple square packed case but will be 
distributed at all azimuthal angles surrounding the central 2-reflection focus. The 
sunflower packing (bottom-right) has a center of symmetry and is useful if we want
to maximize the true-focus 2-reflection flux for a particular position on the sky. 
The single pore considered in the previous section, and illustrated in Fig. 2, has 
constant cross-section with d the same at the entrance and exit aperture. When 
pores are packed across a spherical surface, as shown in Fig. 5, a slight taper 
could be introduced such that the reflection surfaces are aligned to the common 
center of curvature. Alternatively the pores could be packed such that they retain 
the constant d value along their length and the angular change required could be 
accommodated entirely in a wall thickness variation. The difference between these 
two possibilities is an angular tilt of the reflecting walls of ~d/4F. Such a tilt will
introduce a shift of ~d/2 of the reflected beams on the focal surface. As we will see 
later the manufacturing tolerances that can be achieved when making an array of 
square pores on a spherical surface introduce very much larger angular errors than 
the difference between these two cases. It is very unlikely that some process will be 
devised in the future that can produce pores which conform accurately to either of 


Fig. 5. Section through a ray tracing of a square packed array. Rays from two distant point 
sources are plotted. For clarity, those rays which pass directly through the pores without reflection 
or suffer > 1 reflection are not plotted. 
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Fig. 6. Left-hand panel: PSF produced by a simple squared packed array. Right-hand panel: PSF 
produced by the random or waffle packing schemes shown in Fig. 4.


the alternatives described above so for simplicity we have used pores of constant d 
along their length in all the analysis presented here. 
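
The relation between the wall tilt of ~d/4F and the beam shift of ~d/2 follows from the fact that tilting a mirror by a small angle deviates the reflected ray by twice that angle. A minimal numerical sketch, using the example values F = 1000 mm and d = 0.02 mm that the chapter quotes for Fig. 8 (their use here is only illustrative):

```python
import math

def wall_tilt_rad(pore_width, focal_length):
    """Tilt difference between tapered and constant-width pore walls, ~d/(4F)."""
    return pore_width / (4.0 * focal_length)

def beam_shift(tilt_rad, focal_length):
    """Shift on the focal surface: the reflected ray turns by twice the wall tilt."""
    return 2.0 * tilt_rad * focal_length

F_MM, D_MM = 1000.0, 0.02
tilt = wall_tilt_rad(D_MM, F_MM)
shift = beam_shift(tilt, F_MM)
print(f"tilt = {math.degrees(tilt) * 3600.0:.2f} arcsec, shift = {shift:.3f} mm (d/2 = {D_MM / 2:.3f} mm)")
```

For these values the tilt is about one arcsecond.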

The full geometric point spread function (PSF) of the optic is produced by 
the combination of 0, 1, 2 and multiple reflection rays from all the pores. Because 
of the symmetry of the packing (except for the sunflower packing and ignoring 
edge effects and spherical distortion when the pores cover a large angular range on
the spherical surface of radius 2F) the PSF has the same form independent of the position
of the source in the FOV. Figure 6 shows the central portion of the PSF for the 
simple square packed array (left) and the random or waffle packed array (right). 
As expected the single reflection rays produce line foci or cross-arms in the simple 
square packed case. The single reflection rays are still present for the random or 
waffle case but the flux from these beams is distributed evenly in azimuth around 
the central focus. The full width of the central focused spot and the cross-arms 
for the simple square packing is d, the size of the pores used. It is just possible to 
see the faint, diffuse, 0-reflection flux for the simple square packed case. This flux 
is hidden by the 1-reflection wings in the random or waffle distribution. 


4. The Effective Aperture 


The array of pores of the lobster eye optics can extend indefinitely across the prin- 
cipal spherical surface to accommodate a wide FOV, but only a subset of pores 
will be active for a particular source direction. The effective aperture provided by 
the packed array is controlled by the packing scheme, the L/d ratio of the pores 
and the X-ray reflectivity as a function of grazing angle. Figure 5 shows that rays 
which are close to the line from a point source to the center of curvature suffer 
reflection at small grazing angles and rays further away from this source line suffer 
larger grazing reflection angles. In practical implementations of lobster eye optics 
high-Z elements like Nickel, Gold or Iridium are typical materials chosen to coat the 
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Fig. 7. Theoretical reflectivity vs. grazing angle from a perfect Iridium for five X-ray energies, 
0.5, 1, 1.5, 2.0 and 5 keV. 


reflecting surfaces. The X-ray reflectivity from a perfect surface of such materials 
is high for grazing angles θ below the critical angle given by θ_c ~ E⁻¹ρ^(1/2) degrees,
where E is the X-ray energy in keV and ρ is the material density in gm/cm³. For
example, theoretical reflectivity curves for an Iridium surface are shown in Fig. 7.
When the grazing angle θ > arctan(d/L), rays will suffer multiple reflections and
will be suppressed because the efficiency is low. If θ < arctan(d/L) the reflectivity
will be high but most rays will go straight through without reflection and the area 
associated with the 1- and 2-reflection beams will be very low. Figure 8 shows the 
distributions of aperture area (detected flux) associated with 0, 1, 2 and multi- 
ple reflections on the principal surface. These distributions were produced using 
a focal length F = 1000mm, a pore size of d = 0.02mm and a pore length of 
L = 1mm, giving L/d = 50. The reflecting surfaces were Iridium coated and the 
photon energy was 1 keV. The peak of the focused 2-reflection flux occurs at a radius 
of 2,/(2) Fd/L. This corresponds to the off-axis angle at which the 2-reflection beam 
is a maximum and the other beams disappear (see Fig. 2). The edge of the aperture 
associated with the focused flux occurs at a radius of (2\/(2) + 1)Fd/L. Circles of 
this radius are plotted in Fig. 8. The symmetry in the distributions depends on the 
symmetry of the pore rotation distributions in the packing scheme used. A waffle 
packing with period 400 pores (~8mm) produces the distributions shown in the 
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Fig. 8. Distributions of detected flux over the principal surface. Top-row: waffle packing. Bottom- 
row: simple square packing. The panels from left to right are: focused flux from the central peak 
produced by 2-reflection rays from adjacent walls of the pores; 1-reflection flux which produces 
the cross-arms in the simple square packing case; 0-reflection flux which goes straight through the 
pores without reflection; multiple reflection flux including 2-reflections from opposite side of the 
pores. The circle plotted has radius (2,/(2) + 1)Fd/L. All panels are plotted using the same 
z-scaling. 


top-row of Fig. 8. The periodicity of this packing is just visible as the faint vertical 
striped pattern. There are two vertical strips for every waffle period. 
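As a quick numerical check of the two radii quoted above, the following minimal Python sketch evaluates the peak radius 2√2·Fd/L and the aperture-edge radius (2√2 + 1)·Fd/L for the simulation parameters given in the text (F = 1000 mm, d = 0.02 mm, L = 1 mm). The function name is illustrative only.

```python
import math

def aperture_radii_mm(focal_mm, pore_mm, length_mm):
    """Return (peak radius, edge radius) of the focused-flux aperture in mm."""
    ratio = pore_mm / length_mm                        # d/L
    peak = 2.0 * math.sqrt(2.0) * focal_mm * ratio     # peak of 2-reflection flux
    edge = (2.0 * math.sqrt(2.0) + 1.0) * focal_mm * ratio  # edge of focused flux
    return peak, edge

# Parameters quoted in the text for the Fig. 8 simulations.
peak_mm, edge_mm = aperture_radii_mm(1000.0, 0.02, 1.0)
print(f"peak radius = {peak_mm:.1f} mm, edge radius = {edge_mm:.1f} mm")
```

For these parameters the active region of the principal surface is therefore a disc roughly 150 mm across.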

If the L/d for the pores is fixed but the photon energy is increased, the flux
at large radii will start to diminish, the effective aperture will start to shrink
and the total effective area in the central focus will start to decrease when
$\theta_c \approx (180/\pi)(2\sqrt{2}+1)d/L$ degrees. If the photon energy is decreased,
multiple reflection flux from the outer regions will increase slowly. Using the
expression for the critical angle as a function of photon energy and surface density
above gives us a critical energy associated with the L/d,
$E_c \approx (\pi/180)\rho^{1/2}(L/d)/(2\sqrt{2}+1)$ keV, where
$\rho$ (g/cm$^3$) is the density of the reflecting coating. For L/d = 50 and
$\rho = 22.65$ (a pure Iridium coating), $E_c = 1.08$ keV.
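The critical-energy estimate can be reproduced with a few lines of Python. This is a minimal sketch that simply combines the approximate critical-angle scaling (square root of density divided by energy, in degrees) with the aperture-edge angle (2√2 + 1)d/L; the function names are illustrative and the scaling is the approximation used in the text, not a full reflectivity calculation.

```python
import math

def critical_angle_deg(energy_kev, density_g_cm3):
    """Approximate critical grazing angle in degrees: sqrt(rho) / E."""
    return math.sqrt(density_g_cm3) / energy_kev

def critical_energy_kev(l_over_d, density_g_cm3):
    """Energy at which the critical angle equals (2*sqrt(2)+1) d/L."""
    edge_angle_deg = math.degrees((2.0 * math.sqrt(2.0) + 1.0) / l_over_d)
    return math.sqrt(density_g_cm3) / edge_angle_deg

RHO_IRIDIUM = 22.65  # g/cm^3, the value quoted in the text

e_c = critical_energy_kev(50.0, RHO_IRIDIUM)
theta_c_1kev = critical_angle_deg(1.0, RHO_IRIDIUM)
print(f"E_c = {e_c:.2f} keV, theta_c at 1 keV = {theta_c_1kev:.2f} deg")
```

Because the critical energy scales linearly with L/d in this approximation, doubling L/d to 100 would roughly double the energy at which the aperture starts to shrink.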


5. Angular Resolution 


The size of the central focused spot will determine the angular resolution and this is
set by the pore aperture size, d, spherical aberration and diffraction. The 2-reflection
beams from the individual pores have a rectangular geometric cross-section with
dimensions in the range 0 to d depending on the off-axis angles $\alpha$ and $\beta$ (see Fig. 2).
For pores close to the axis the angle $\theta$ is small (see Fig. 3) and these beams combine
to form a spot of full width ~d which corresponds to a geometric angular resolution
limit (full width) of $\Delta\theta_g = d/F$. The exact profile of this spot will depend on the
packing scheme used. Spherical aberration, as indicated in Fig. 3, introduces a radial
offset from the axial position of $F\theta^3$. If we are operating at or below the critical X-ray
energy associated with the L/d value for the pores the effective aperture described
above gives $\theta_{max} = \sqrt{2}d/L$, so spherical aberration gives an angular resolution
limit (full width) of $\Delta\theta_s = 4\sqrt{2}(d/L)^3$. For $E_c = 1$ keV and L/d = 50 we have
$\Delta\theta_s = 9$ arc seconds. The diffraction spread will vary from pore to pore because
pores in different aperture positions produce 2-reflection beams of different
cross-section, but a reasonable estimate of the average diffraction limit is
$\Delta\theta_d = 2\lambda/d$, where $\lambda$ is the X-ray wavelength. The highest angular
resolution will result if the pore size is chosen such that the geometric and
diffraction limits are equal, giving $d = (2\lambda F)^{1/2}$ and
$\Delta\theta_g = \Delta\theta_d = (2\lambda/F)^{1/2}$. If F = 1 m and the photon energy
is 1 keV, d = 50 μm and $\Delta\theta_g = \Delta\theta_d = 10$ arc seconds. Combining the geometric,
spherical aberration and diffraction limits, we get an angular resolution at 1 keV of
$(\Delta\theta_g^2 + \Delta\theta_s^2 + \Delta\theta_d^2)^{1/2} = 17$ arc seconds, using the
optimum L/d = 50, or d = 50 μm for F = 1 m.
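The resolution budget above is easy to verify numerically. The sketch below, in Python, evaluates the three limits and their quadrature sum for F = 1 m, 1 keV and L/d = 50. The conversion constant hc ≈ 1.2398 keV nm and the function and key names are assumptions introduced here for illustration.

```python
import math

ARCSEC_PER_RAD = math.degrees(1.0) * 3600.0
HC_KEV_NM = 1.2398  # wavelength in nm = HC_KEV_NM / energy in keV

def resolution_budget(focal_m, energy_kev, l_over_d):
    """Geometric, spherical-aberration and diffraction limits in arc seconds."""
    lam_m = HC_KEV_NM / energy_kev * 1e-9
    d_opt_m = math.sqrt(2.0 * lam_m * focal_m)      # pore size where geo = diff
    geo = d_opt_m / focal_m                         # d / F
    diff = 2.0 * lam_m / d_opt_m                    # 2 lambda / d
    sph = 4.0 * math.sqrt(2.0) / l_over_d ** 3      # 4 sqrt(2) (d/L)^3
    total = math.sqrt(geo ** 2 + sph ** 2 + diff ** 2)
    return {
        "d_opt_um": d_opt_m * 1e6,
        "geometric": geo * ARCSEC_PER_RAD,
        "spherical": sph * ARCSEC_PER_RAD,
        "diffraction": diff * ARCSEC_PER_RAD,
        "total": total * ARCSEC_PER_RAD,
    }

budget = resolution_budget(focal_m=1.0, energy_kev=1.0, l_over_d=50.0)
for name, value in budget.items():
    print(f"{name:12s} {value:6.1f}")
```

The three terms come out comparable in size (roughly 10, 9 and 10 arc seconds), which is why none of them can be neglected in the combined limit.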

The geometric PSF distributions shown in Fig. 8 were obtained using the optimum
focal position such that the distance F is measured from half way along the
length of the pore to the focal plane as indicated in Figs. 2, 3 and 5. If this distance
is set from either the entrance aperture or exit aperture of the pore array (as shown 
in Fig. 1 of Ref. 3) the full width of the central spot and cross-arms is doubled. 
Therefore, when the angular resolution is close to the optimum limit, the depth of 
field is set by the length of the pores L and the optic-detector distance must be set 
accordingly. 

As we shall discuss in more detail below, in current practical X-ray implemen- 
tations of lobster eye optics we can achieve close to the optimum values for d and L 
but the angular resolution is determined by pore shape, pore alignment, reflecting 
surface figure errors and surface roughness rather than the pore size or accurate 
positioning of the focal plane with respect to the pore array. 


6. Fabrication of Lobster Eye X-ray Optics 


The geometrical requirements of a lobster eye X-ray optic are simple to state but a 
challenge to manufacture. The pores must be square with width 0.01 < d < 0.5 mm,
and length L such that L/d is in the range 20-200, depending on the X-ray energy
range you hope to cover and the angular resolution you hope to achieve. All the 
inner walls of the pores must be flat, very smooth with a surface roughness < 1 nm, 
and made from or coated with a high-Z (dense) material to give a high X-ray 
reflection efficiency. The pores must be integrated into an array using a simple 
square, waffle, or similar packing scheme with a wall thickness that gives an open 
fraction of > 50% so that the optic has a reasonable efficiency. Finally, the pores 
must be packed on a spherical surface with radius of curvature 2F and with the
axis of every pore aligned to point towards the common center of curvature with an
accuracy in the range 0.1 to 5 arc minutes, commensurate with the angular resolution
required. 

Fig. 9. Top-left: a schematic of a square pore MCP with simple square packing. Top-right:
micrograph of a square packed MCP. Bottom-left: micrograph of an Iridium coated square pore MCP
with pore size d = 20 μm and wall thickness w = 6 μm, as indicated. Bottom-right: the packing
of a radially packed MCP.


Reference 3 discussed possible methods of manufacture, but it was not until a
decade later that glass-walled square pore microchannel plates (MCPs) offered a
practical solution to the problem.$^{8,9}$ The early square pore plates were
manufactured by Galileo Electro-Optics and Philips Components, while the current
generation of these devices is produced by PHOTONIS France SAS.$^a$

MCPs are not very photogenic, but Fig. 9 shows a schematic and micrograph
pictures of square pore MCPs. The radially packed MCPs shown in the bottom-right
of the figure are used for the Wolter-I geometry implemented for the Mercury
imaging MIXS-T$^{10}$ instrument on the BepiColombo payload. A waffle packing scheme
has not yet been achieved in an MCP but the success of implementing a radial
packing indicates that a waffle packing is possible. The manufacture of square pore
MCPs will not be described in detail here. Suffice to say the
plates are initially made as a solid block of glass with the volume inside the pores
made from etchable glass of square dimension d and the surrounding walls made
from non-etchable glass of thickness w. The thin plates are cut from the solid block
as a slice with a thickness to define L. The plate is then thermally slumped to a
spherical profile and finally the pores are etched away to produce the MCP. Plates
can be manufactured with d, w and L/d to match the requirements for optimum
performance in the soft X-ray band, as described above, and with a radius of
curvature to accommodate F > 0.3 m. Typical PHOTONIS plates have d = 0.02 or
d = 0.04 mm, 1 < L < 3 mm and are square with dimensions 40 × 40 mm. Because
of the way they are made, the ratio of wall thickness to pore size, w/d, remains
constant, giving an open fraction of ~60%, so the geometric efficiency of the plates
is high and the effective density (mass) of the plates is very low.

$^a$ PHOTONIS France SAS, Avenue Roger Roncier, 19100 Brive La Gaillarde, France.
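The quoted open fraction can be checked from the pore and wall dimensions given in the Fig. 9 caption (d = 20 μm, w = 6 μm). The sketch below assumes square pores on a square grid of pitch d + w, so that the open fraction is (d/(d + w))²; that packing assumption and the function name are introduced here for illustration.

```python
def open_fraction(pore_um, wall_um):
    """Open area fraction for square pores on a square grid of pitch d + w."""
    pitch_um = pore_um + wall_um
    return (pore_um / pitch_um) ** 2

frac_20 = open_fraction(20.0, 6.0)   # dimensions from the Fig. 9 caption
frac_40 = open_fraction(40.0, 12.0)  # same w/d ratio with a larger pore
print(f"open fraction: {frac_20:.3f} (d = 20 um), {frac_40:.3f} (d = 40 um)")
```

Because only the ratio w/d enters, plates with d = 0.02 mm and d = 0.04 mm have the same open fraction as long as w/d is held constant.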

The first X-ray observation of the characteristic cruciform image structure
expected from a planar, square packed, square pore MCP (d = 86 μm, L = 4.8)
was reported by Ref. 11. The behavior of a spherically slumped round-pore MCP
(d = 12.5 μm, L = 1 mm and radius of curvature R of 1.4 m) was discussed by
Ref. 12, and Ref. 13 reported focusing by a slumped square-pore MCP, R = 1 m,
for the first time. Reference 14 showed that, with the right processing, the surface
roughness of channel walls of an MCP was ~11 Å, indicating that MCPs would
be suitable for an efficient X-ray optic. Reference 15 measured the characteristics
of plates with very deep pores, L/d up to 500, indicating that MCPs could be
optimized for use as hard X-ray optics. The technique discussed by Ref. 16 can be
used to coat the pore walls with Iridium, as required for the MIXS-T$^{10}$ instrument
on BepiColombo. So, at the time of writing, all the characteristics required for an
X-ray lobster eye optic can be, and have been, built into a square pore MCP. The lobster
eye optic geometry can be used in both wide-field and narrow-field instruments (as
described in detail below) for X-ray imaging in astrophysics, heliophysics and
planetary science. A prototype wide FOV instrument (STORM,$^{17,18}$ Sheath Transport
Observer for the Redistribution of Mass) incorporating square pore MCP optics has
been constructed for imaging the Solar Wind charge exchange (SWCX) emission and
other emission components in the local vicinity and the prototype was successfully
flown$^b$ on a sounding rocket.


7. Alignment, Distortion and Imperfections 


If square pore MCPs are used to construct the optic for a lobster eye X-ray telescope 
the aperture comprises a mosaic of plates accurately mounted and aligned on a 
support frame. The performance of the integrated optic is limited by qualities and 
imperfections which are intrinsic to the plates and errors introduced when the plates 
are mounted on the frame. 

$^b$ http://www.nasa.gov/topics/technology/features/wide-xray.html

The intrinsic performance of the plates is limited by:

• Spherical aberration, $\Delta\theta_s = 4\sqrt{2}(d/L)^3$.

• Geometric pore size, $\Delta\theta_g = d/F$.

• Diffraction, $\Delta\theta_d = 2\lambda/d$.

• MCP bias error. When the plates are cut from the block the pore axis may not
be perpendicular to the plate surface. When MCPs are used as photon detectors
the plates are often cut to incorporate a deliberate bias angle. If the bias error is
$\theta_b$ the focal image will shift by $\Delta\theta_b = 2\theta_b$.

• Figure gradient errors on the pore walls introduced by the manufacturing
process, $\Delta\theta_f$.

• Surface roughness of the pore walls. If the rms surface roughness is $\sigma_s$, the
Total Integrated Scatter (TIS: fraction of radiation scattered) for a wavelength
$\lambda$ at a grazing angle $\theta$ is $\mathrm{TIS} = 1 - \exp(-(4\pi\theta\sigma_s/\lambda)^2)$.
The grazing angle that corresponds to the bulk of the flux for a given source is
$\theta \approx 2\sqrt{2}d/L$, so $\mathrm{TIS} \approx 128\pi^2(\sigma_s/\lambda)^2(d/L)^2$.
If $\sigma_s = 1$ nm, L/d = 50 and the photon energy is
1 keV ($\lambda = 1.24$ nm), TIS ≈ 0.3. The angular distribution of this scattered
radiation will depend on the power spectrum of the surface roughness distribution. If
the scattering angles are larger than ~d/L radiation will be lost by collimation
within the pores and the efficiency will drop.

• Slumping errors and pore alignment. When the plates are slumped to conform to
the correct spherical form, pore alignment errors can be introduced, as indicated
in Fig. 10. During the process the plate glass is shearing rather than just bending
and the pores end up pointing towards a center of curvature beyond the center of
curvature of the plate. The edges of the plates are more vulnerable to slumping
errors and the alignment error will vary across the plate, rising to a maximum
$\theta_a$. An alignment error of $\theta_a$ will introduce a spread in the focused image of
$\Delta\theta_a \approx 2\theta_a$.

• Pore shear errors. The stretching/compression of the plate during slumping can
introduce a shear error in the packing and shape of the individual pores as shown
in the right-hand panel of Fig. 10. The pore alignment errors introduce a blurring
of both the central spot and cross-arms in the PSF, as indicated in the left-hand
panel of Fig. 11. The shear error splits the central spot into four peaks, as shown
in the right-hand panel of Fig. 11. The cross-arms from single reflections remain
the same width but suffer the same shear distortion angle as the pores. If the
shear angle (difference between the pore corner angles and a right angle) is $\theta_h$,
the separation of the peaks (and hence the degradation in angular resolution) is
given by $\Delta\theta_h = (2\theta)\theta_h$, where $\theta$ is the grazing reflection angle.
Using $\theta \approx 2\sqrt{2}d/L$ gives $\Delta\theta_h \approx 4\sqrt{2}(d/L)\theta_h$. If we have
L/d = 50 and we are operating at ~1 keV, a shear error of $\theta_h = 1$ degree will give
$\Delta\theta_h = 6.8$ arc minutes, so the width of
the central spot in the PSF is rather sensitive to a pore shear error.
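Two of the numbers in the list above, the total integrated scatter and the shear-induced splitting, can be reproduced with the following minimal Python sketch. It uses the expressions exactly as written in the list with the typical grazing angle 2√2·d/L; the function names are illustrative.

```python
import math

def typical_grazing_angle_rad(l_over_d):
    """Grazing angle carrying the bulk of the focused flux: 2 sqrt(2) d/L."""
    return 2.0 * math.sqrt(2.0) / l_over_d

def total_integrated_scatter(sigma_nm, wavelength_nm, grazing_rad):
    """TIS = 1 - exp(-(4 pi theta sigma / lambda)^2) for small grazing angles."""
    phase = 4.0 * math.pi * grazing_rad * sigma_nm / wavelength_nm
    return 1.0 - math.exp(-phase ** 2)

def shear_split_arcmin(l_over_d, shear_deg):
    """Peak separation 2 * theta * theta_h, returned in arc minutes."""
    return 2.0 * typical_grazing_angle_rad(l_over_d) * shear_deg * 60.0

theta = typical_grazing_angle_rad(50.0)
tis = total_integrated_scatter(sigma_nm=1.0, wavelength_nm=1.24, grazing_rad=theta)
split = shear_split_arcmin(50.0, shear_deg=1.0)
print(f"TIS = {tis:.2f}, shear splitting = {split:.1f} arc minutes")
```

The exponential form gives a TIS slightly below the small-argument value 128π²(σ/λ)²(d/L)² ≈ 0.33; both round to the ~0.3 quoted in the text.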


The errors introduced when the plates are mounted: 


• Rotation errors. If the plates have a simple square packing, a plate rotation error
will introduce a rotation of the single reflection cross-arm pattern from that plate
and the cross-arms from different plates will not line up exactly. The position and
width of the central focused spot will be unchanged and therefore rotation errors
when mounting are not considered to be critical.

• Pore alignment errors. Pores in different areas of the plate point to different
centers of curvature. Figure errors in the spherical surfaces of the plate or the
mounting cause distortion of the plates when they are fixed in position.

• Focal length errors. If the curvature of the spherical mounting surface does not
match the center of curvature defined by the pores in the slumped MCP then
the focal length will be in error. This can be corrected by a small change in the
detector position with respect to the optic.

Fig. 10. Left: pore alignment errors, shown as a mismatch between the plate curvature and the
pore axis alignment center. Right: shear distortion of the pore apertures in a simple square
packing scheme.

Fig. 11. Left: center of the PSF when a Gaussian spread of pore alignment errors is present.
Right: center of the PSF when a pore shear error is present.


The pore figure, alignment and shear errors ($\Delta\theta_f$, $\Delta\theta_a$ and
$\Delta\theta_h$) and the surface roughness, $\sigma_s$, are all intimately connected
with the plate manufacturing processes.
They are very difficult to measure individually because they are buried within the
pore structure, which has L/d ~ 50 where d is typically a few tens of microns.

Fig. 12. Model for the power spectrum of the surface roughness.


We can use the as-yet unpublished$^c$ calibration results from the BepiColombo
MIXS-T$^{10}$ to estimate the errors present in a single lobster eye MCP. The
combination of figure and alignment errors ($\Delta\theta_f$ and $\Delta\theta_a$) gives
~0.75 arc minutes rms
for each reflection direction (axis). The average pore shear angle is ~0.3 degrees, so
$\Delta\theta_h \approx 2.2$ arc minutes. The power spectrum of the surface roughness can be
modeled using a broken power law, as illustrated in Fig. 12 (see for example Ref. 19).
Below the break spatial frequency, $\omega_b$, the spectrum is assumed to be constant,
while for $\omega > \omega_b$, the spectrum is a power law, $P(\omega) \sim \omega^{-\gamma}$,
where $\gamma \sim 1.4$.
The integral of the power spectrum over all spatial frequencies gives the mean
square roughness, $\sigma_s^2$, and $\omega_b$ controls the typical scattering angle,
$\theta_{scat} = \omega_b\lambda/\theta$,
where $\theta \approx 2\sqrt{2}d/L$ is the typical grazing angle. The MIXS-T results indicate that
$\sigma_s \approx 13$ Å rms and $\omega_b \approx 350$ mm$^{-1}$. For 1 keV and L/d = 50,
$\theta_{scat} = 0.44$ degrees.
Thus the surface roughness gives rise to wide angle scattering and a significant
fraction of the scattered flux is collimated by the narrow pores and lost. Simulation
using a combination of these errors gives a central focused spot with FWHM = 4.5
arc minutes and predicts a loss of ~20% of flux at 1 keV. To simulate the all-up
response we should also include the errors introduced when mounting and aligning
the component plates. Using the same mounting technique employed for MIXS-T,
the extra pore alignment errors introduced by mounting are < 1 arc minute and focal
length errors can be eliminated by a small axial shift in the position of the detector.
The plate axial rotation errors are ~0.5 degrees and these introduce a broadening
of the outer regions of the single reflection cross arms but have no effect on the
central focused spot. The performance of a lobster eye optic constructed using the
currently available slumped, square pore MCPs is limited by the surface roughness
in the pores, the alignment of the pores and the shear errors in the cross-section of
the pores.
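The typical scattering angle quoted above follows directly from the break frequency, the wavelength and the typical grazing angle. The short Python sketch below evaluates the product of break frequency and wavelength divided by the grazing angle 2√2·d/L for the MIXS-T-derived break frequency of 350 per mm; the conversion constant hc ≈ 1.2398 keV nm and the function name are assumptions made for illustration.

```python
import math

HC_KEV_NM = 1.2398  # wavelength in nm = HC_KEV_NM / energy in keV

def scattering_angle_deg(break_freq_per_mm, energy_kev, l_over_d):
    """Typical scattering angle omega_b * lambda / theta, in degrees."""
    wavelength_mm = HC_KEV_NM / energy_kev * 1e-6       # nm -> mm
    grazing_rad = 2.0 * math.sqrt(2.0) / l_over_d       # typical grazing angle
    return math.degrees(break_freq_per_mm * wavelength_mm / grazing_rad)

theta_scat = scattering_angle_deg(350.0, energy_kev=1.0, l_over_d=50.0)
print(f"typical scattering angle = {theta_scat:.2f} degrees")
```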


$^c$ Thanks to Adrian Martindale, Jim Pearson, Charly Feldman and the MIXS-T team at the
University of Leicester for providing the data prior to publication.


8. A Wide-field Lobster Eye X-ray Telescope


The lobster eye is a grazing-incidence optical geometry that can provide a contin- 
uous wide FOV, unrestricted by the grazing angles which have to be used in the 
X-ray energy band. The possibility of using these optics for an X-ray all-sky monitor
was noted by Ref. 8 and described in detail by Ref. 20. An X-ray all-sky monitor
payload dubbed LOBSTER-ISS was proposed by Ref. 21 for accommodation on the
International Space Station. Since then a number of similar instruments have been
studied, including the proposed free-flying spacecraft A-STAR (2013),$^{22}$ which was
submitted to the ESA S-Class Announcement of Opportunity in June 2012.

All the proposed wide-field instruments use the concept of a lobster eye module 
as illustrated in Fig. 13. The module comprises a square array of identical square 
pore MCPs arranged over a spherical mounting frame, and a position sensitive 
X-ray detector with half the linear dimensions of the optic placed in the focal 
plane. The detector surface should be spherical with half the radius of curvature of 
the optic, but in practice this may be approximated by a tiling of planar detectors. 
The effective area of the optic, as a function of energy, that can be achieved using 
such an arrangement is shown in the right-hand panel of Fig. 13. For this plot a 
focal length of F = 400mm was used and the reflecting walls of the pores were 
coated with Iridium. Changing the design to a different focal length is simple. The 
same array of MCPs can be employed (same pore size and same L/d ratio) but, of 
course, the radius of curvature of the spherical mounting frame must be changed 
such that R = 2F. When making such a change the effective area achieved scales
as ~$F^2$, while the angular size of the PSF remains reasonably constant. Using the
current production of square pore MCPs the minimum focal length is ~300 mm
(R = 600 mm).

The module shown in Fig. 13 uses a 7 × 7 array of MCPs, where each plate has
an open aperture 38 × 38 mm$^2$ and the support ribs of the frame have width 4 mm.
The total active width of the optic is therefore 290 mm. The detector has width
D = 145 mm, giving a square FOV with width D/F radians, or 20.8 degrees for
F = 400 mm. The FOV therefore has an area of ~428 square degrees. If the focal
length is reduced to 300 mm the FOV will be ~767 square degrees, but the effective
area at 1 keV will drop from 3.5 cm$^2$ to 2.0 cm$^2$.

Fig. 13. The wide-field telescope module. F = 400 mm. (Right-hand panel axes: effective
area in cm$^2$ vs. energy in keV.)
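The module geometry quoted above can be reproduced with a short calculation. The Python sketch below uses the small-angle FOV width D/F and the F² scaling of effective area stated in the text; the function names are illustrative, and the FOV area is simply the square of the FOV width in degrees.

```python
import math

def module_geometry(n_plates, plate_open_mm, rib_mm, focal_mm):
    """Return optic width (mm), detector width (mm), FOV width (deg), FOV area (deg^2)."""
    optic_width = n_plates * plate_open_mm + (n_plates - 1) * rib_mm
    detector_width = optic_width / 2.0          # detector is half the optic size
    fov_deg = math.degrees(detector_width / focal_mm)
    return optic_width, detector_width, fov_deg, fov_deg ** 2

def scaled_area_cm2(area_cm2, focal_from_mm, focal_to_mm):
    """Effective area scaled as F^2, as stated in the text."""
    return area_cm2 * (focal_to_mm / focal_from_mm) ** 2

width_400, det_400, fov_400, sky_400 = module_geometry(7, 38.0, 4.0, 400.0)
width_300, det_300, fov_300, sky_300 = module_geometry(7, 38.0, 4.0, 300.0)
area_300 = scaled_area_cm2(3.5, 400.0, 300.0)
print(f"F = 400 mm: optic {width_400:.0f} mm, FOV {fov_400:.1f} deg, {sky_400:.0f} sq deg")
print(f"F = 300 mm: FOV {fov_300:.1f} deg, {sky_300:.0f} sq deg, area {area_300:.1f} cm^2")
```

With this simple estimate the F = 400 mm FOV area comes out near 431 square degrees, within about 1% of the ~428 square degrees quoted, and the F = 300 mm values match the quoted ~767 square degrees and 2.0 cm².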

If the linear dimensions of the detector are exactly half that of the optic, 
vignetting occurs at the edge of the FOV. At the sides the area drops to 50% 
of the central value (because at that position only half the effective aperture of the 
optic is provided, see Fig. 8). At the corners of the FOV the area drops to 25% of 
the central value. Over most of the FOV there is no vignetting and the area only
starts to drop at angles ~$2\sqrt{2}d/L$ from the edge, ~3 degrees if L/d = 50.

The PSF of a wide-field instrument with F = 300 mm is shown in Fig. 14. The 
first zero in the cross-arms occurs at an off-axis angle 2d/L, which corresponds to 
2.3 degrees if L/d = 50. The source is positioned above the center of one of the 
MCPs in the aperture array and the shadow cast by the gaps between the MCPs 
is therefore set symmetrically about the focused spot. At an energy of 1 keV the 
cross-arms are visible out to an off-axis angle of ~5 degrees. 
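The two characteristic angles used in the preceding paragraphs, the vignetting onset distance 2√2·d/L and the first cross-arm zero 2d/L, depend only on L/d. A minimal Python sketch (function names are illustrative) evaluates them for L/d = 50:

```python
import math

def cross_arm_zero_deg(l_over_d):
    """Off-axis angle of the first zero in the cross-arms, 2 d/L, in degrees."""
    return math.degrees(2.0 / l_over_d)

def vignetting_onset_deg(l_over_d):
    """Angular distance from the FOV edge where vignetting starts, 2 sqrt(2) d/L."""
    return math.degrees(2.0 * math.sqrt(2.0) / l_over_d)

zero_deg = cross_arm_zero_deg(50.0)
onset_deg = vignetting_onset_deg(50.0)
print(f"cross-arm zero at {zero_deg:.1f} deg, vignetting onset {onset_deg:.1f} deg from edge")
```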

The angular resolution of such a module will be limited by the quality of the
square pore MCPs that are currently available, as described above. The PSF will
have a central focused spot size of 4.5 arc minutes FWHM, so the effective beam
which contains all the truly focused flux will be circular with a diameter ~10
arc minutes. As the quality of square pore MCP production improves the angular
resolution is expected to get better, although eventually the performance will be
limited to ~1 arc minute by a combination of spherical aberration, pore size and
diffraction.

Fig. 14. The PSF of a wide-field module, F = 300 mm, energy 1 keV. The inner dashed square
indicates the off-axis angle position at which the cross-arms go to zero, $\theta = 2d/L$. The outer
dashed lines indicate the position of the shadow cast by the support ribs between the MCPs. The
projection to the right shows the central peak and inner cross-arms. Axes are in mm.

The fully integrated optic for the module illustrated in Fig. 13, including the 
MCPs and support frame, is ~2.5kg, and the total mass of the module including 
the optic bench/tube and detector is ~15kg, depending on the type of detector 
system used. Compared with Wolter-I geometry grazing-incidence systems with a 
similar angular resolution the lobster eye optics are very low mass. This is because 
the membranes that support the grazing-incidence reflecting surfaces (the walls of 
the pores) are thin compared to the pore size. The open fraction of the square pore 
MCPs is large, ~65%, but the plates are self-supporting. The extra mass required 
for the support frame is a dominant fraction of the mass of the integrated optic. 


9. A Narrow-field Lobster Eye X-ray Telescope 


The lobster eye geometry can also be used to construct a narrow-field grazing- 
incidence X-ray telescope with an f-ratio similar to the conventional Wolter-I 
designs used for the Chandra Observatory, XMM-Newton, the Swift XRT etc. 
A typical narrow-field lobster eye module design is shown in Fig. 15. The square
pore MCPs used can be the same size as used in the wide-field design, but now we 
are trying to maximize the effective area over a small region of sky centered on the 
axis of the module. The optimum square pore packing scheme that can achieve this 
is the sunflower packing shown in Fig. 4, but constructing this at the individual 
pore level is not yet possible using the MCP manufacturing techniques currently 
employed. An approximation to such packing can be achieved if individual square 
pore MCPs are arranged in a sunflower tessellation,? but this is inefficient unless 
the aperture size and focal length are much larger, e.g. F > 3m. For F = 1m 
the simple square packing of square pore MCPs is most efficient. Because we are 

Fig. 15. The narrow-field module: F = 1000 mm. The optimum L/d values for operation at 1 keV
are shown for each plate.

trying to maximize the area on-axis, a constant value for the L/d of the plates is
not optimum. Plates which are close to the center of the aperture should have a 
large L/d while those at larger radius have a smaller L/d such that L/d ~ 1/r. 
The optimum ratio for an energy of 1keV is L/d = 2.5F/r, where r is the radius 
from the center of the aperture to the center of the plate, and such optimum values 
are shown on the aperture schematic in Fig. 15. In the tessellation shown there is 
a central plate on-axis (r = 0). This central plate has the same L/d = 60 as the 
surrounding four plates, which are at a radius of ~40mm. There are five plates, 
each of open width 38mm, across the full aperture of width 200mm. With a focal 
length of 1000mm this gives an f-ratio of ~f/10, similar to a Wolter-I design 
optimized for ~1 keV. Keeping the same focal length and increasing the size of the 
aperture by including more plates around the edges does not increase the effective 
area at 1 keV on-axis, because the grazing angles on the outer plates are too large. 
Figure 16 shows the on-axis collecting area vs. energy and the area at 1keV vs. 
off-axis angle for the optimum design with the L/d values from Fig. 15. Using the
optimum design increases the on-axis area by a factor of 1.4 compared with a design 
with fixed L/d = 50. The usable FOV of the narrow-field lobster eye module has 
a diameter of ~6 degrees, which is at least a factor of 3 larger than a Wolter-I 
telescope operating in the soft X-ray energy band. Note that if the mirror aperture 
has the minimum diameter to provide an f-ratio of ~f/7 then the on-axis area at 
1keV is determined by the focal length for both the Wolter-I and lobster eye. For 
F = 1 m this is ~30 cm$^2$ at 1 keV. If you add more mirrors to increase the aperture
size the on-axis area at 1 keV does not increase. For Wolter-I the on-axis area at 
lower energies does increase but the FOV at 1 keV stays the same. For the lobster 
eye the extra mirrors around the edges of the aperture increase the FOV at 1keV. 
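The L/d = 2.5F/r rule stated above for 1 keV can be tabulated for a 5 × 5 plate layout. The Python sketch below assumes a plate pitch of 40 mm (inferred from the ~40 mm radius of the four plates around the center and the 200 mm aperture; the exact layout in Fig. 15 is not reproduced here) and gives the central plate the same value as its nearest neighbours, as the text describes. The names are illustrative.

```python
import math

def optimum_l_over_d(radius_mm, focal_mm, k=2.5):
    """Optimum L/d at 1 keV from the rule L/d = 2.5 F / r quoted in the text."""
    return k * focal_mm / radius_mm

PITCH_MM = 40.0    # assumed centre-to-centre plate spacing
FOCAL_MM = 1000.0

grid = []
for iy in range(-2, 3):
    row = []
    for ix in range(-2, 3):
        radius = PITCH_MM * math.hypot(ix, iy)
        if radius == 0.0:
            radius = PITCH_MM  # central plate takes the nearest-neighbour value
        row.append(optimum_l_over_d(radius, FOCAL_MM))
    grid.append(row)

for row in grid:
    print("  ".join(f"{value:5.1f}" for value in row))
```

Under these assumptions the inner plates come out at L/d ≈ 62, close to the value of 60 quoted for the central plate, falling to about 22 at the corners of the aperture.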

Fig. 16. Left: the on-axis collecting area vs. energy of the optimum narrow-field lobster module
with F = 1000 mm. Right: the vignetting function at 1 keV for the same design. The dotted curve
shows the result if we use a constant L/d = 50.

The mass of the optic for the module shown in Fig. 15 is ~1.2 kg and the total
mass of the telescope module is ~2.5 kg, depending on the detector used. The lobster 
eye optic can be used to produce a very low mass narrow-field X-ray telescope with 
a moderate angular resolution of a few arc minutes and an FOV several degrees 
across. 


10. Future Prospects 


The concept of a lobster eye X-ray telescope was originally introduced in 1979,$^3$ but
to date (2016) an X-ray telescope based on this principle has yet to be flown on 
a satellite. Square pore MCPs manufactured from glass provide the first practical 
implementation of the lobster eye geometry, and a fully operational lobster eye 
X-ray telescope utilizing square pore MCP optics has already survived a sounding- 
rocket test flight. It is anticipated that a satellite-borne lobster eye instrument will 
be operational in the next few years. The left-hand panel of Fig. 17 shows the 
breadboard model of a narrow-field lobster eye optic currently under construction 
and test at the University of Leicester. Four 40 × 40 mm$^2$ MCPs have been integrated
onto the support frame ready for testing. The right-hand panel shows the measured
X-ray PSF of a single square pore MCP. The central focus and characteristic
cross-arms, including the expected zero at $\theta = 2d/L$, are clearly identifiable. In this image
the extent of the cross-arms is truncated by the edges of the plate. 

A wide-field module with F = 300 mm, using the currently available square pore
MCPs and a CCD detector, has a sensitivity of ~1.3 × 10$^{-9}$ ergs cm$^{-2}$ sec$^{-1}$ in the
energy band 0.3-5 keV for an integration time of 10 sec. This sensitivity is source
photon limited because the full width of the focused spot is 0.87 mm (a circular
beam with diameter 10 arc minutes) and the total background count rate, including
the Galactic-sky background, the Cosmic-sky background and the particle-induced
background, is very low, ~9.5 × 10$^{-4}$ counts/sec in the focused spot. For an
integration time of 10$^4$ sec the sensitivity is ~7.5 × 10$^{-12}$ ergs cm$^{-2}$ sec$^{-1}$ (0.3-5 keV)
and is background limited. A wide-field lobster eye module of this size would detect
~94% of the GRBs found by the Swift Burst Alert Telescope. The position accuracy
that can be achieved when detecting X-ray transients depends on the shape and size
of the PSF and the source count, S, and background count, B, detected in the beam.
If the FWHM of the central spot is 4.5 arc minutes, as expected from the currently
available square pore MCPs, and using a circular detection beam of diameter 10 arc
minutes, the radius which contains 90% of the reconstructed positions is given by
$R_{90} = C(S + gB)^{1/2}/S$ where C = 250 arc seconds and g = 2.1. Bright transients
like most GRBs yield S > 1000 counts in total and will typically have an associated
error circle with $R_{90}$ < 10 arc seconds.

Fig. 17. Left: A breadboard model of a narrow-field lobster eye optic under construction. Right:
The measured X-ray PSF of a single plate.
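The spot size and position-accuracy figures above can be checked with a few lines of Python. The sketch below evaluates the linear width of a 10 arc minute beam at F = 300 mm, the expected background counts in the beam, and the 90% error radius C·(S + gB)^(1/2)/S with the quoted C = 250 arc seconds and g = 2.1. The function names and the example source count of 1000 are illustrative.

```python
import math

def spot_width_mm(focal_mm, beam_arcmin):
    """Linear width in the focal plane of a beam of the given angular diameter."""
    return focal_mm * math.radians(beam_arcmin / 60.0)

def r90_arcsec(source_counts, background_counts, c_arcsec=250.0, g=2.1):
    """Radius containing 90% of reconstructed positions, C sqrt(S + gB) / S."""
    return c_arcsec * math.sqrt(source_counts + g * background_counts) / source_counts

BACKGROUND_RATE = 9.5e-4  # counts/sec in the focused spot, as quoted

width_mm = spot_width_mm(300.0, 10.0)
bkg_long = BACKGROUND_RATE * 1.0e4          # background counts in 10^4 sec
r90_bright = r90_arcsec(1000.0, bkg_long)   # a bright transient with S = 1000
print(f"spot width = {width_mm:.2f} mm, background = {bkg_long:.1f} counts")
print(f"R90 for S = 1000 counts: {r90_bright:.1f} arc seconds")
```

For bright sources the background term is negligible and the error radius falls as the inverse square root of the source count, so S = 1000 counts already gives an error circle of about 8 arc seconds.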

The angular resolution and sensitivity of both the wide-field and narrow-field
lobster eye designs described above are limited by the manufacturing quality of the
square pore MCPs. If the surface roughness of the pore walls can be reduced to
$\sigma_s$ < 10 Å rms the efficiency will be increased significantly, and if the slumping
process to produce the spherical profile is improved, such that the pores are better
aligned and pore shear errors are reduced, then the angular resolution will approach
a limit of ~1 arc minute.

The simple square packing scheme for the pores gives rise to the single reflection 
cross-arms in the PSF. If a packing similar to the waffle packing illustrated in Fig. 4 
can be implemented then the lobster eye X-ray telescope would have a circularly 
symmetric PSF. The reflections that give rise to the cross-arms are not eliminated 
by using such a packing scheme but this flux is distributed in a featureless profile 
around the central peak of the PSF as illustrated in Fig. 6. The imaging by such a 
device would be free from artifacts and structure and as a consequence there would 
be less source confusion. 

Full-shell grazing-incidence optics have been used extensively in X-ray astronomy
and have produced the best angular resolutions to date, albeit at the expense 
of mass and cost. Lightweight full-shell optics trade angular resolution for ease 
of fabrication, lower mass, and lower costs per mirror shell. Current programs 
in lightweight full-shell grazing-incidence optics for X-ray astronomy and other 
applications are described, together with research to improve the native resolution 
of thin-shell optics or to effect post-fabrication improvements. Also covered are 
techniques for mirror-shell alignment and assembly, intended to reduce image 
degradation during production of mirror assemblies. 


1. Introduction 


Starting in the 1960s, the development of focusing grazing-incidence optics revolu- 
tionized the then emerging field of X-ray astronomy.’ ° Not only does the focusing 
X-ray telescope enable true imaging of resolved celestial X-ray sources, it enhances 
background-limited sensitivity to unresolved sources by several orders of magnitude 
and greatly suppresses source confusion. 

The predominant and most successful optical configuration for focusing X-ray 
telescopes is the generalized Wolter-I design.* This design employs grazing-incidence 
reflection from two sequential (“primary” and “secondary”) inner surfaces of
revolution, which comprise a mirror shell. For the strict Wolter-I telescope design,* the
optical prescriptions for the primary and secondary mirror surfaces are paraboloid
and hyperboloid, respectively.° More generally, Wolter-I-like telescope designs may
utilize slightly different prescriptions for the primary and secondary mirrors, for
example to improve wide-field imaging, as with Wolter-Schwarzschild® or
polynomial®* prescriptions.¢


aSee Chapter 1 of this volume for more details on Wolter-I telescope optics.
bFor grazing incidence, true imaging requires an even number of reflections.

As fabrication, alignment, assembly, metrology, and testing techniques are rel- 
atively insensitive to the detailed optical prescriptions of the reflecting surfaces, 
the discussion here applies to all Wolter-I-like designs. Compared to alternative 
geometries, the Wolter-I-like design is more compact axially and accommodates 
a high degree of nesting. Nesting — coaxially aligned packing of mirror shells of 
progressively smaller diameters — is, of course, necessary in order to achieve the 
large aperture collecting areas needed for X-ray astronomy. 

The two general approaches for implementing a Wolter-I-like design are (a) to 
utilize actual full-shell mirrors or (b) to synthesize shells using azimuthally seg- 
mented mirrors. The full-shell approach first aligns each full secondary to its corre- 
sponding full primary mirror or, preferably, fabricates the primary and secondary 
surfaces on a monolithic full-shell mirror. This technique then co-aligns individ- 
ual full-shell mirrors of differing radii into a coaxially nested configuration. The 
segmented-shell approach typically utilizes mirror modules comprising an azimuthal 
sector of a nested Wolter-I-like design. This technique then aligns each secondary 
segmented mirror to its corresponding primary such that all segmented mirror pairs 
in the module share a common focus. Finally, the segmented approach tradition- 
ally co-aligns all the mirror modules such that they synthesize a nested full-shell 
configuration. An alternative to this “wedge” approach is the recently proposed? 
“meta-shell” approach,!°:° which more closely follows the integration technique used 
for full-shell mirror assembly. 

The principal advantage of the segmented-shell methodology is that it is mod- 
ular and scalable. In order to increase the X-ray telescope’s collecting area, one 
adds more, similarly sized mirror segments of the same focal length to the mirror 
assembly. The principal advantage of the full-shell methodology is that the tightest 
alignment tolerances are addressed in mirror fabrication rather than in alignment 
and assembly, significantly simplifying these latter processes. A second advantage 
is that full-shell mirrors are inherently stiffer and less susceptible to stress-induced 
edge effects than are segmented mirrors. For either approach, mount-induced dis- 
tortions become a major concern for the lightweight mirrors needed to achieve the 
large collecting areas desired for future X-ray telescopes. 

This chapter addresses lightweight full-shell X-ray optics for Wolter-I-like
telescopes. Section 2 reports on techniques for fabricating lightweight full-shell
X-ray mirrors. Recognizing that lightweight fabricated shells may lack the precise
figure needed to achieve the desired angular resolution, Sec. 3 describes a few
methods for post-fabrication figure correction. Section 4 then briefly discusses
alignment and assembly issues. Finally, Sec. 5 summarizes this chapter.


cThe (strict) Wolter-I and Wolter-Schwarzschild prescriptions result in aberration-free
on-axis imaging.

dSee Chapter 1 of this volume for more details.

eSee Chapter 3 of this volume for more details on meta-shell optics.


2. Full-Shell Mirror Fabrication 


Effective collecting area and angular resolution — typically expressed as half-energy 
width (HEW) or equivalently half-power diameter (HPD)! — are the key perfor- 
mance metrics for an X-ray telescope. Owing primarily to mass and cost constraints 
for a space-borne telescope, large area and fine angular resolution are competing 
goals: large area requires lightweight and thin mirrors, which consequently lack the 
stiffness to resist distortion during mirror fabrication and assembly. Indeed all past 
and present high-resolution (HEW < 10″) X-ray telescopes have utilized at most a
few (either 1 for solar or 4 for astronomical) thick-walled full-shell mirror pairs,
precision fabricated from Zerodur™ or quartz blanks using traditional grinding,
lapping, and polishing with excellent metrology.11 The pinnacle of this approach is
the Chandra X-ray Observatory, which provides sub-arcsecond (HEW ~ 0.5″) imaging
using four approximately 2-cm-thick (separate primary and secondary) mirror
pairs.
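
As an illustration of the HEW metric, the following minimal sketch estimates the half-energy width from a list of event positions in the focal plane. It assumes every event carries equal energy and uses a small-angle conversion; the function name, the Gaussian test spot, and the 1600-mm focal length are choices made for this example only.

```python
import math
import random
import statistics

ARCSEC_PER_RAD = math.degrees(1.0) * 3600.0  # about 206265 arcsec per radian


def half_energy_width_arcsec(xs_mm, ys_mm, focal_length_mm):
    """Diameter (arcsec) of the circle about the centroid holding half the events."""
    cx = statistics.fmean(xs_mm)
    cy = statistics.fmean(ys_mm)
    radii = [math.hypot(x - cx, y - cy) for x, y in zip(xs_mm, ys_mm)]
    half_radius_mm = statistics.median(radii)
    # Small-angle conversion from focal-plane distance to angle on the sky.
    return 2.0 * half_radius_mm / focal_length_mm * ARCSEC_PER_RAD


# Hypothetical test image: a circular Gaussian spot with sigma = 0.02 mm.
rng = random.Random(0)
sigma_mm = 0.02
xs = [rng.gauss(0.0, sigma_mm) for _ in range(50_000)]
ys = [rng.gauss(0.0, sigma_mm) for _ in range(50_000)]
hew = half_energy_width_arcsec(xs, ys, focal_length_mm=1600.0)
print(f"HEW = {hew:.2f} arcsec")
```

For a circular Gaussian spot the half-energy radius is σ√(2 ln 2), so the estimate should land close to 6.1 arcsec for these assumed inputs. Real mirror point-spread functions have broader wings than a Gaussian, which is one reason HEW is preferred over FWHM as a figure of merit.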

Fabrication processes for lightweight full-shell X-ray mirrors utilize other tech- 
nologies in order to produce X-ray telescopes with substantially lower mass and 
cost per unit area. This section describes those technologies, which typically rely 
upon replication techniques — electroforming (Sec. 2.1), plasma spraying (Sec. 2.2), 
and epoxy replication (Sec. 2.3). Replication, which seeks to copy the figure of a 
precision mandrel onto the surface of a complementary mirror, has two advantages 
over direct fabrication. First, the mandrel can be thick-walled and thus relatively 
insensitive to distortion during figuring and polishing. Second, replication itself is 
usually an inexpensive process compared to figuring and polishing, so it is much 
more cost-effective to use replicated mirrors if the design calls for several mirrors 
of the same size and shape. A disadvantage of replication is that the replica — 
especially if it is very thin — does not conform exactly to the shape of the mandrel. 
For this reason, technologies for direct fabrication of thin full-shell X-ray mirrors 
(Sec. 2.4) are also under development. 


2.1. Electroforming 


Electroformed-nickel replication (ENR) has been used extensively to produce 
X-ray optics. In this process, mirror shells are electroformed onto figured and super- 
polished mandrels from which they are later released by differential thermal con- 
traction. Figure 1 sketches the basic steps for mandrel production and for shell 
fabrication. 


fThe HEW (or HPD) is the diameter containing half of the reflected energy for a source at infinity. 
Fig. 1. Electroformed Nickel Replication: process of mandrel and shell fabrication.


2.1.1. ENR Process 


The mandrel is typically fabricated from an aluminum bar that is figured to nearly
its final shape and then coated with amorphous electroless nickel, a chemically
deposited nickel-phosphorus (Ni-P) alloy with high hardness and durability. The
mandrel is next precision figured (typically by grinding or by diamond turning) 
and mechanically polished to a few tenths of a nanometer microroughness. In most 
cases, the mandrel is machined and figured to include both surfaces (primary and 
secondary), which simplifies later alignment and bonding of mirror shells into an 
assembly. The final step in preparing the mandrel for electroforming is to treat its 
surface in order to optimize the adhesion of the electroformed shell to the mandrel: 
if the adhesion is too low, the shell detaches from the mandrel during plating; if 
too high, the formed shell cannot be removed from the mandrel. One method is 
to chemically treat the Ni-P surface (e.g. oxidizing it); another is to deposit a 
loosely-adhered coating (e.g. evaporated gold) onto the surface. 

The prepared mandrel is immersed in an electroforming bath that contains the 
desired metal in solution and in anode baskets to replenish the solution. A potential 
is applied between the mandrel and the anodes to grow the mirror shell, typically at 
a rate of 10-20 µm/hr. Usually the mandrel is rotated to produce a uniform deposit.
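
The quoted growth rate translates directly into bath time. The short sketch below, which assumes a constant deposition rate and uses a hypothetical helper name, gives the plating time for two shell thicknesses representative of those listed in Table 1.

```python
def plating_time_hours(thickness_mm, rate_um_per_hr):
    """Hours needed to deposit a shell of the given thickness at a constant rate."""
    if rate_um_per_hr <= 0:
        raise ValueError("plating rate must be positive")
    return thickness_mm * 1000.0 / rate_um_per_hr


for thickness_mm in (0.25, 1.0):
    fastest = plating_time_hours(thickness_mm, 20.0)
    slowest = plating_time_hours(thickness_mm, 10.0)
    print(f"{thickness_mm} mm shell: {fastest:.1f} to {slowest:.1f} hr")
```

Under these assumptions a 0.25-mm shell takes roughly half a day to a day to plate, and a 1-mm shell takes two to four days.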

After the desired shell thickness is reached, the assembly is removed from the
electroforming tank, rinsed, and placed in a low-temperature (e.g. chilled-water)
bath for release. The mirror shell contracts less than the aluminum mandrel, thus
sliding off when its hoop stress overcomes adhesion. Figure 2 shows representative
hardware utilized in production of electroformed nickel mirrors.

Fig. 2. Infrastructure at Marshall Space Flight Center for electroformed nickel replication. At
left, a Moore diamond-turning machine; center, a custom-built polishing machine holding a 0.5-m
diameter mandrel; right, an electroforming bath, with a small-diameter mandrel about to be
plated.
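
The size of the clearance opened by differential thermal contraction can be estimated to first order. The sketch below is an illustration only: the expansion coefficients are assumed handbook room-temperature values for aluminum and nickel, the 20-K cooling step is hypothetical, and plating stress and adhesion are ignored.

```python
# Assumed handbook room-temperature expansion coefficients (per kelvin).
CTE_ALUMINUM = 23e-6
CTE_NICKEL = 13e-6


def release_gap_um(radius_mm, cooling_k, cte_mandrel=CTE_ALUMINUM, cte_shell=CTE_NICKEL):
    """Radial clearance (micrometers) opened by cooling mandrel and shell together."""
    return radius_mm * (cte_mandrel - cte_shell) * cooling_k * 1000.0


# A 0.5-m diameter mandrel (250-mm radius) cooled by an assumed 20 K.
gap = release_gap_um(radius_mm=250.0, cooling_k=20.0)
print(f"radial gap = {gap:.0f} um")
```

With these assumed numbers the mandrel shrinks away from the shell by a few tens of micrometers in radius, which indicates why a modest temperature drop is sufficient for release.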

Since the initial use of electroformed-nickel mirrors for X-ray telescopes, there
have been several process refinements. One was development of a plated nickel-cobalt
alloy, which has a much higher precision elastic limit than that of the original
pure-nickel plating: this alloy allows production of thinner (hence, lighter) shells
that remain less susceptible to plastic deformation during manufacture or
handling.12,13 Another was fine tuning the plating-bath chemistry, rates, and
(electric-field-controlling) shield geometry to ensure low and nearly uniform plating
stress along the shell length: this optimization reduces plating-stress-induced
figure distortions.14 An additional refinement is the use of hard ceramic coatings on
mandrels: such coatings serve as a durable, semi-permanent release layer that permits
replication of multilayers from the mandrel surface.15,16


2.1.2. Missions with ENR Mirrors 


Several high-energy astrophysics satellites and sub-orbital (rocket or balloon) 
experiments have employed X-ray telescopes using ENR grazing-incidence mirror
assemblies. This section gives a brief overview of ENR X-ray optics on previous or 
operating missions. It then describes in more detail the X-ray mirror assemblies 
recently produced for impending satellite missions, expected to launch before the 
end of 2017. 

Table 1 lists the properties of ENR mirror assemblies for previous missions, as 
well as for those currently awaiting launch. Note that the shell length is the sum 
of the lengths of the primary and secondary mirrors (equal for typical designs), as 
each mirror pair is conventionally formed as a monolithic shell, which facilitates 
alignment and assembly. 

Table 1. Properties of ENR mirror assemblies for X-ray telescopes.

                                 Number of  Shells/     Focal   Shell   Diameter  Thickness
                         Launch  Modules    Module      Length  Length  Range     Range      HEW
Mission                  Year    (#)        (#)         (mm)    (mm)    (mm)      (mm)       (arcsec)
BeppoSAX LECS + MECS     1996    4          30          1850    300     68-162    0.2-0.4    60
Swift XRT (JET-X)        2004    1          12          3500    600     187-293   0.65-1.0   16
XMM-Newton               1999    3          58          7500    600     306-700   0.47-1.07  15
HERO-1 (balloon)         2001    2          3           3000    610     40-48     0.25       45
HERO/HEROES (balloon)    2013    8          14          6000    610     50-94     0.25       26
FOXSI-1 (rocket)         2012    7          7           2000    600     76-103    0.25       27
FOXSI-2 (rocket)         2014    7          7(5),10(2)  2000    600     63-103    0.25       27
SRG eROSITA              2019    7          54          1600    300     80-356    0.2-0.6    16
SRG ART                  2019    7          28          2700    580     50-150    0.25-0.33  25


Much of the original development of electroformed nickel replication for
grazing-incidence mirrors was performed by the Osservatorio Astronomico di Brera
(INAF-OAB). Brera played key roles in developing X-ray mirrors for the BeppoSAX,
Swift, and XMM-Newton satellite missions.

A mission of the Italian Space Agency (ASI), BeppoSAX operated for 6 years
and carried several instruments that collectively spanned the 0.1-300 keV band.17,18
The instrument complement included one Low-Energy Concentrator Spectrometer
(LECS) and three Medium-Energy Concentrator Spectrometers (MECS). Each had
mirror assemblies of the same design,19 using nested electroformed nickel shells.20
The angular resolution of these mirror assemblies was about 1 arcminute HEW,21
due largely to use of a double-cone approximation to a Wolter-I design for the mirror
prescriptions.

A primary objective of NASA’s Swift mission22 is to detect gamma-ray bursts
(GRBs) with its (coded-mask) Burst Alert Telescope (BAT) and to slew rapidly to
the BAT-determined error circle in order to detect and to position accurately the
GRB source using its X-Ray Telescope (XRT).23-25 The mirror assembly for the
Swift XRT is the spare from the Joint European X-ray Telescope (JET-X), which
was designed,26,27 fabricated, and tested28,29 for a USSR mission that terminated
prior to final integration and launch. The Swift XRT (JET-X) mirror assembly
uses nested Wolter-I ENR mirrors to achieve an angular resolution of about 16″
HEW.20,31

XMM-Newton32,33 is ESA’s flagship mission for X-ray astronomy. It employs
three large mirror assemblies containing 58 nested Wolter-I ENR mirror shells each,
achieving an angular resolution of about 15″ HEW.34,35 The development of the
XMM-Newton X-ray mirror assemblies was an impressive achievement, with many
lessons learned36 that could benefit future missions.

In the US, MSFC has developed and fabricated ENR mirrors37-39 primarily
for sub-orbital experiments to image high-energy (~30-100 keV) or medium-energy
(10-30 keV) X-radiation. (To maintain near-total external reflection at higher
energies, mirror graze angles are shallower than those for soft-X-ray mirrors.) In
2001, the High-Energy Replicated Optics (HERO) demonstration balloon flight
obtained the first focused hard-X-ray images of cosmic sources.40,41 Subsequently,
the HERO X-ray mirror system evolved, becoming the foundation for the High-Energy
Replicated Optics to Explore the Sun (HEROES), a joint MSFC-GSFC
balloon program to image the sun during the day and cosmic sources at night, in a
high-energy (20-70 keV) X-ray band.42-44
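
The link between telescope geometry and graze angle can be made concrete with the standard two-reflection relation tan(4α) = r/f, where r is the shell radius and f the focal length. The sketch below applies it to diameter ranges and focal lengths taken from Table 1; treating the tabulated diameters as the radius at the primary-secondary intersection is an approximation made for this illustration, and the function name is hypothetical.

```python
import math


def graze_angle_deg(radius_mm, focal_length_mm):
    """On-axis graze angle for a two-reflection telescope: tan(4*alpha) = r / f."""
    return math.degrees(math.atan2(radius_mm, focal_length_mm) / 4.0)


# (mission, diameter range in mm, focal length in mm), taken from Table 1.
designs = [
    ("XMM-Newton", (306.0, 700.0), 7500.0),
    ("HERO/HEROES", (50.0, 94.0), 6000.0),
    ("FOXSI-1", (76.0, 103.0), 2000.0),
]
for name, (d_min, d_max), focal in designs:
    inner = graze_angle_deg(d_min / 2.0, focal)
    outer = graze_angle_deg(d_max / 2.0, focal)
    print(f"{name}: {inner:.2f} to {outer:.2f} deg")
```

Under this approximation the hard-X-ray HERO/HEROES shells come out at roughly a tenth of a degree or less, several times shallower than the soft-X-ray XMM-Newton shells, with FOXSI in between.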

MSFC also fabricated the X-ray mirror assemblies for the Focusing Optics
X-ray Solar Imager (FOXSI) rocket program, led by the University of California
in Berkeley.45 In contrast with the HERO/HEROES nested double-cone mirrors,
the FOXSI mirrors use a Wolter-I prescription with somewhat less shallow graze
angles, for a slightly softer response. Thus far, the FOXSI program46 has had two
successful rocket flights47,48 to observe solar active regions and microflares, in a
medium-energy (4-15 keV) X-ray band.49

Spectrum-Röntgen-Gamma (SRG) is a Russian-German astrophysical mission
designed to carry out an all-sky survey over the soft to medium energy
X-ray range. Launched in July 2019, SRG will perform an all-sky survey for
the first four years of its mission life, with the remaining three years allocated for
pointed follow-on observations. On board SRG are two instruments: the extended
Röntgen Survey with an Imaging Telescope Array (eROSITA) and the Astronomical
Röntgen Telescope (ART-XC). Each features electroformed-nickel-replicated X-ray
optics.

eROSITA,50 a low-energy instrument operating over 0.5-10 keV, has seven
mirror modules, each with 54 nickel mirror shells ranging in diameter from 80 mm
to 356 mm. The mirror shells are significantly thinner than those for XMM-Newton:
only 0.2 mm for the inner shells up to 0.6 mm for the outer. Despite this, very good
imaging performance was achieved, with an average HEW of 16″ at 8 keV over all of
the flight modules. All seven mirror modules are co-aligned (see Fig. 3, left panel),
giving a total on-axis effective area of over 2000 cm² at 1 keV. The system focal
length is 1.6 m.51,52

The eROSITA instrument was derived from an earlier, small-satellite payload
called ABRIXAS,53 which had electroformed optics and for which 27 mandrels
already existed. To adapt for eROSITA, a further 27 outer mandrels were fabricated
to significantly enhance the effective area at low energies. The additional mandrels
were single-point diamond turned to a figure of approximately 10″ HEW, then
polished using a Zeeko robotic machine (see also Sec. 2.4) to achieve < 0.3 nm
surface finish and HEW < 6″.54 For shell replication, gold was first deposited on the
mandrel surface, and a pure nickel shell electroformed on top of it. The gold loosely
adheres to the clean nickel-phosphorus mandrel surface, providing the necessary
release layer without shell distortion. An optical monitoring system was used for
mirror shell integration (see Sec. 4) to ensure that each mirror’s native shape was
preserved during the bonding of the mirror to the supporting spider. Figure 3 (right
panel) shows a partially assembled eROSITA mirror module.

Fig. 3. eROSITA. Left panel shows a photo of the seven-module mirror assembly; right panel
shows a partially filled (31 of 54 shells) mirror module. (Photos courtesy of Dr. Vadim Burwitz,
MPE; used by permission.)

In certain Wolter-I optical designs it is possible for radiation from outside the 
field of view to reach the focal plane via a single reflection, typically from the 
hyperbola, effectively increasing the detector background. To suppress this it is 
necessary to add a series of baffles. For eROSITA a baffle system was constructed 
from 54 concentric Invar cylinders mounted on the spider wheel and aligned with 
each mirror shell, avoiding additional blockage.55 The use of this baffle, or
pre-collimator, reduces single reflections hitting the focal plane by about 90% with no
on-axis area impact, and only a small impact on off-axis area. 

ART-XC is a medium-energy instrument operating over the range
5-30 keV.56,57 As for eROSITA, ART-XC consists of seven co-aligned mirror modules,
each containing mirror shells fabricated via the electroformed nickel replication
process. The shells in this case are fabricated from a nickel/cobalt alloy that is
stronger than pure nickel, range in size from ~50 mm to 150 mm diameter and in
thickness from 0.25 to 0.33 mm, and are coated on the inside with ~10 nm of iridium.
There are 28 concentrically nested shells in each module (see Fig. 4) and their focal
length is 2.7 m.

Fig. 4. ART-XC. A completed 28-shell ART-XC mirror module.


The fabrication process for the ART-XC optics differs slightly from that of 
eROSITA, apart from the use of a nickel/cobalt alloy, in that it utilizes a chemical 
passivation, in this case via a potassium dichromate solution, as a release layer to 
allow the electroformed shell to separate from the mandrel. The shell is then coated 
on the inside with a small amount of iridium to boost high-energy reflectivity. As 
with eROSITA an alignment system was used to position each shell before gluing 
(see Sec. 4). 

The effective area for ART-XC is approximately 65 cm² per module at 8 keV,
for an on-axis effective instrument area of 455 cm². eROSITA and ART-XC have
approximately the same effective area at 5 keV. The angular resolution of the ART-XC
optics modules is approximately 25″ HEW on axis at 8 keV.58 However, because
SRG is a survey mission that operates in a scanning mode for the first 4 years, it is
advantageous to improve the angular resolution off axis, where sources spend most
of their time. One way of flattening the angular resolution response across the field
of view is to defocus the telescope slightly, degrading the resolution on axis while
improving it further out. For ART-XC, the optics will be defocused by 7 mm along the
optical axis. This degrades the on-axis resolution to approximately 30″ HEW.

The Micro Röntgen Satellite Instrument (µROSI) is the first X-ray
telescope to be flown aboard an “amateur” micro satellite.59 Designed to perform
an all-sky survey in the soft-X-ray band, µROSI contains 12 nested gold-coated
electroformed-nickel mirror shells ranging in diameter from 48 mm to 80 mm with
a focal length of just 0.25 m.60 To simplify the design, a conic approximation to
a Wolter-I geometry is used, wherein the mirror shell is figured as two straight
cones rather than a parabola and hyperbola. This degrades the angular resolution
(to about 20 arcminutes), but as the telescope functions as a light collector only
(the focal-plane detector is non-imaging), this has no impact on performance. The
microsatellite (Max Valier) was launched in 2017.


116 B. D. Ramsey & S. L. O’Dell


2.1.3. Laboratory Applications 


The electroformed-nickel-replication process lends itself to the fabrication of very- 
small-diameter optics, and this gives the possibility for spin-off applications outside 
of astrophysics. These include X-ray optics for radionuclide imaging and for plasma 
diagnostics, as well as for use as low-energy-neutron beam control and focusing 
elements. 

For medical use, biologically active molecules of interest can be labeled with
a radionuclide and injected for in-vivo assessment. Small-diameter (34-mm) X-ray
optics were developed61 to image the emission from the short-lived isotopes 99mTc
(18 keV) and 125I (27 keV) in mice and other small animals. A Wolter-I-type geometry
with a combination of ellipse and hyperbola gave a working distance of about
3 m and a magnification of 4. A spatial resolution of 185 µm was reported.

Low-energy neutrons, with energies in the cold to thermal region, have wavelengths
similar to those of X-rays and thus grazing-incidence optics work equally well with
these particles. For a natural nickel surface, which has high neutron reflectivity, the
critical angle for total external reflection is about 17 mrad/nm, so an optic with a
graze angle of 0.5 degree (~9 mrad) will efficiently reflect neutrons with wavelengths
longer than about 0.5 nm. This was demonstrated62 using a 1/10-scale Chandra
X-ray Observatory inner mirror shell (62-mm diameter and 1-m focal length) to
focus cold neutrons over the wavelength range 0.5-2 nm. Follow-on publications
detailed the benefits of such optics for flux enhancement in small-angle
neutron-scattering experiments,63 and their potential for use in a neutron microscope.64 In
the latter, a nested array of three confocal ellipsoid and hyperboloid mirror pairs
(Fig. 5) was used to demonstrate a spatial resolution of 75 µm and a magnification
of 4.
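
The wavelength cutoff quoted above follows from the linear scaling of the critical angle with neutron wavelength. A minimal sketch of that arithmetic, using only the 17 mrad/nm figure given in the text (the function and constant names are hypothetical):

```python
import math

NICKEL_CRITICAL_MRAD_PER_NM = 17.0  # value quoted in the text for natural nickel


def min_reflected_wavelength_nm(graze_angle_deg, slope=NICKEL_CRITICAL_MRAD_PER_NM):
    """Shortest neutron wavelength whose critical angle reaches the graze angle."""
    graze_mrad = math.radians(graze_angle_deg) * 1000.0
    return graze_mrad / slope


print(f"{min_reflected_wavelength_nm(0.5):.2f} nm")
```

For a 0.5-degree graze angle this gives a cutoff of about 0.51 nm, consistent with the statement that neutrons with wavelengths longer than about 0.5 nm are reflected efficiently.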


2.2. Plasma Spray 


The standard direct replication process involving nickel has seen widespread use for
X-ray optics. One drawback is nickel’s relatively high density (8.9 g/cm³), which
necessitates thin-walled mirrors to keep mass budgets manageable. As the stiffness
of a mirror shell scales with the thickness cubed, this presents challenges for handling
and mounting.

Attempts have been made to use lighter and stiffer materials for full-shell mirror
fabrication. Alumina, for example, has a density of just 3.4 g/cm³ and a net
stiffness an order of magnitude greater for the same mass, even though its modulus
of elasticity is only 60% that of nickel.65 This ceramic and others can be applied
via a thermal-spray process.
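
The order-of-magnitude stiffness advantage claimed for alumina can be checked from the numbers in the text. The sketch below assumes shells of equal mass and equal surface area, so that wall thickness scales inversely with density, and takes stiffness as proportional to modulus times thickness cubed; the function name is hypothetical.

```python
def stiffness_ratio_equal_mass(modulus_ratio, density_a, density_b):
    """Stiffness of material A relative to B for shells of equal mass and area.

    Equal mass means thickness scales as 1/density; stiffness scales as
    modulus times thickness cubed.
    """
    thickness_ratio = density_b / density_a
    return modulus_ratio * thickness_ratio ** 3


# Alumina (3.4 g/cm^3, 60% of nickel's modulus) versus nickel (8.9 g/cm^3).
ratio = stiffness_ratio_equal_mass(modulus_ratio=0.6, density_a=3.4, density_b=8.9)
print(f"alumina/nickel stiffness at equal mass = {ratio:.1f}")
```

The result is a factor of about 10.8: the cubic dependence on thickness outweighs the lower modulus.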


Thermal spray involves the deposition of molten material onto a suitable substrate.
In one variation, plasma heats ceramic particles to several thousand degrees
Celsius, at which point they are molten, and transports them to the substrate to
be coated. The particles, typically 10-40 microns in size, coalesce on the substrate’s
surface to form a continuous coating.

Fig. 5. Test neutron optic module with three nested shells.

In the past, several groups have attempted direct fabrication of ceramic shells 
onto suitable mandrels. A plasma spray was investigated for future use as a 
directly deposited shell with a thin buffer layer to prevent print-through of the 
porous ceramic onto the reflecting surface.66 Later,67 a laminate approach was
used, wherein a thin layer of electroformed nickel was first deposited on the mandrel, 
followed by a stiffening layer of plasma-sprayed ceramic and then a final thin layer 
of electroformed nickel. The ceramic of choice was mullite, an aluminum silicate, 
in the form of small hollow micron-sized spheres coated with metal. Under plasma 
deposition, the melted metal sticks the spheres together, forming a metal ceramic 
composite layer that has significant strength. Also considered were hollow spheres 
containing a layer of silicon carbide. Finally, other researchers investigated a wide 
variety of plasma-sprayed materials for free-standing optical components, including
alumina, mullite, steatite, silicon and AlSi alloys.68

Currently at least one group is pursuing the thermal-spray approach for light- 
weight X-ray optics. These researchers utilize advances in stress control, pro- 
cess diagnostics, and temperature monitoring to fabricate laminate mirror shells 
consisting of an inner layer of nickel/cobalt and a stiffening layer of alumina. The 
use of in-situ stress monitoring permits a claimed near neutral-stress deposit for 
improved figure control, and the use of thermal imaging permits control of the 
mandrel temperature to reduce distortions introduced by differences in the thermal 
expansion coefficient between the aluminum mandrel, the nickel/cobalt coating, and 
the alumina backing material. Test mirror shells made in this program are shown 
in Fig. 6. 
Fig. 6. Alumina-coated mirror shells made by thermal spray.


2.3. Epoxy Replication 


The fabrication of full-shell mirrors using epoxy replication typically requires two 
mandrels for each monolithic mirror shell (or twice that for separate primary and 
secondary mirrors). A precision mandrel provides a polished and figured surface for 
replicating the optical surface of a full-shell X-ray mirror. A cruder mandrel provides 
an unpolished, near-net-shape form for generating a carrier with the approximate 
(slightly oversized) dimensions and cone angles of the desired shell. Subsequent 
to depositing an optical coating (e.g. gold) over the outer surface of the precision 
mandrel, the process applies epoxy over the gold and/or inner surface of the carrier 
and carefully positions the carrier around the precision mandrel, leaving a thin 
epoxy-filled gap between the two. After allowing the epoxy to cure, the final step is 
to remove the carrier with its bonded optical surface from the precision mandrel. 

One advantage of epoxy replication is that there are many candidate materials 
for the carrier, which may be selected to optimize a combination of density, elastic 
modulus, and strength. A second advantage is that once the carrier has been fab- 
ricated — perhaps at high temperature or pressure — and possibly annealed, it is 
likely to be relatively dimensionally stable throughout subsequent processing. Dis- 
advantages are that epoxy shrinkage and differential thermal expansion may distort 
the mirror with age, and that the surface texture of the carrier may print through 
onto the optical surface of the replicated mirror. As the former effect is worse for 
a thick epoxy layer and the latter worse for a thin one, optimization of the surface 
requires careful control of the gap distance between the replication mandrel and 
carrier. 

The ESA mission EXOSAT (launched 1983) carried two identical low-energy
X-ray telescopes, each mirror assembly containing two nested Wolter-I X-ray mirror
pairs fabricated by epoxy replication onto beryllium carriers.71 Each mirror
assembly provided approximately 90-cm² geometric aperture with 18″ HEW angular
resolution.72 The manufacturing process73 utilized three separate procedures:
precision production of the replication mandrels, machining of the carrier, and epoxy
replication. The replication mandrels were precision machined and polished Schott
BK7 borosilicate glass. The carriers were machined of isopressed beryllium HP21
to 3.5-mm wall thickness, heat treated to relieve residual internal stress, then
final-figured to within 2 µm of the prescribed surface shape using a precision lathe. The
actual epoxy replication required positioning the carrier with minimal distortion
around the gold-coated replication mandrel, injecting epoxy into the gap between
the mandrel and the carrier, minimizing stresses on the carrier while the epoxy
cured, and separating the carrier from the mandrel using thermal shock and the
differing expansion coefficients and heat capacities of glass and beryllium.

Prior to the adoption of electroformed-nickel replication for the XMM mirrors,
ESA funded research toward developing thin-shell grazing-incidence mirrors
using epoxy replication onto carbon-fiber-reinforced plastic (CFRP) carriers.74 The
goal was to produce precision thin-walled shells (0.6-1.2 mm) of the dimensions
ultimately used for the XMM flight mirror shells (Table 1). The mirror fabrication
required two full-length (primary + secondary) mandrels for each shell size:
(1) a near-net-shape double-cone mandrel for laying up the CFRP carrier and (2) a
Wolter-I precision figured and superpolished mandrel for epoxy replication of a
gold-coated surface onto the carrier to generate the mirror shell. The carrier mandrel was
metal (steel or electroless-nickel-coated aluminum); the precision epoxy-replication
mandrel was electroless-nickel-coated aluminum, which accommodates either epoxy
or electroformed-nickel replication and provides a large CTE mismatch to facilitate
separation of the shell from the mandrel after replication.75

The CFRP carrier was laid up as a laminate stack of unidirectional wound
carbon fiber in an epoxy matrix, with plies having different carbon-fiber
orientations.76 After curing and separation from the carrier mandrel, the carrier was
precisely lowered over the gold-coated precision mandrel for epoxy replication of the
surface, curing, and separation to produce the precision figured mirror shell. Unfortunately,
dimensional instability due to temporal variations in moisture content of the CFRP
resulted in print-through and large-scale deformation.77,78


2.4. Direct Fabrication 


The high-resolution mirror assembly of the Chandra X-ray Observatory demonstrated
that modern optical fabrication techniques can produce sub-arcsecond-resolution
grazing-incidence optics if thick full-shell mirror substrates are used.79
The challenge for this direct fabrication approach is to transition to much thinner
shells that would satisfy the effective-area and angular-resolution needs of future
X-ray astronomy missions. To meet these demands, the technology must be pushed
to fabricate shells that are an order of magnitude lighter than Chandra’s. Satisfying
these goals requires: selection of appropriate lightweight, stiff mirror substrate
materials; design of appropriate optics fabrication support structures to conserve
the integrity and the optical performance of the mirror shells during the manufacturing
(and metrology) processes; and the development of figuring and polishing
techniques for rapid convergence to the desired figure and finish without introducing
unwanted mid-spatial-frequency surface figure errors. The latter (rapid convergence)
is also key to controlling the overall costs of flight mirror production. Currently, such
direct fabrication technology is being developed for lightweight grazing-incidence
optics by at least two institutions: the Osservatorio Astronomico di Brera (INAF-OAB)
and, more recently, Marshall Space Flight Center (NASA MSFC).

Direct fabrication can potentially be implemented with a wide variety of materials.
An ideal substrate for the mirror shell would have low density, low coefficient
of thermal expansion (CTE), high modulus of elasticity, and high yield strength. It should
also be commercially available in the form of large-diameter tubes and be amenable
to figuring and polishing. Some candidate materials are shown in Table 2. Beryllium
and beryllium-aluminum coated with nickel/phosphorus have excellent mechanical
properties for an optic substrate and can be figured and polished, but because these
materials are quite expensive, direct fabrication technology development efforts are
focused on fused silica⁸⁰,⁸¹ (OAB) and aluminum coated with nickel/phosphorus
as a surrogate for beryllium⁸²,⁸³ (MSFC).
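
One simple way to compare the candidates is by specific stiffness (elastic modulus divided by density). This figure of merit is not used in the text above and is offered only as an illustration. The sketch below uses the values listed in Table 2; the densities of Al (6061) and BeAl-162Met are assumed here to be 2.7 and 2.1 g/cm³, and the function names are hypothetical.

```python
# Values as listed in Table 2: density (g/cm^3), CTE (1e-6/K),
# elastic modulus (GPa), yield strength (MPa).
MATERIALS = {
    "Fused Silica": (2.2, 0.5, 72, 48),
    "Beryllium": (1.8, 12, 318, 240),
    "Al (6061)": (2.7, 24, 69, 276),        # density is an assumed value
    "AlSi": (2.8, 17, 90, 235),
    "BeAl-162Met": (2.1, 13.9, 193, 314),   # density is an assumed value
    "Duralcan F3S.30S": (2.8, 14.6, 120, 210),
}


def specific_stiffness(density, modulus):
    """Return elastic modulus per unit density, in GPa per (g/cm^3)."""
    return modulus / density


def rank_by_specific_stiffness(materials=MATERIALS):
    """Return (name, specific stiffness) pairs, stiffest-per-mass first."""
    scored = [
        (name, round(specific_stiffness(props[0], props[2]), 1))
        for name, props in materials.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, value in rank_by_specific_stiffness():
        print(f"{name:18s} {value:6.1f}")
```

Specific stiffness captures only two of the four properties listed; CTE, yield strength, cost, and polishability still have to be weighed separately, as the paragraph above notes.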

Thin-shell direct fabrication technology development has been underway for 
some time at OAB. In their process, the fused-silica shells are first coarsely ground 
on the inside and outside surfaces to a double-cone profile with a shell thickness of
a few millimeters. After that, because the stresses due to the polishing tool pressure
can be enough to distort or break the thin shell substrates,⁸⁴ the shell needs to be
mounted into a substrate support structure in order to maintain the integrity of the 
shell during mirror fabrication and metrology. The prototype shell mirror installed 
into such a structure, consisting of a series of mechanical flexures to support it 
gently, is shown in Fig. 7. After coarse grinding, out-of-roundness errors of the 
inner shell surface are removed by using fine grinding. After this, the final axial 
profiles are figured and polished using computer-controlled small-tool polishing,
such as the polishing machine shown in Fig. 8, and finished off with ion-beam
figuring.⁸⁵ This computer-controlled deterministic polishing process leads to a high
convergence rate.


Table 2. Mechanical properties of potential mirror substrate materials.

                                       Density   CTE        Elastic Modulus   Yield Strength
Material                               (g/cm³)   (10⁻⁶/K)   (GPa)             (MPa)
Fused Silica                           2.2       0.5        72                48*
Beryllium                              1.8       12         318               240
Al (6061)                              2.7       24         69                276
AlSi                                   2.8       17         90                235
BeAl-162Met                            2.1       13.9       193               314
Duralcan F3S.30S AlSi+SiC (30% vol)    2.8       14.6       120               210

*Maximal achievable value. The “working” value is typically much less and depends upon the
surface/subsurface condition.


Fig. 7. A 2-mm thick shell made from fused silica mounted into the substrate support structure,
with a series of mechanical flexures around the circumference of the shell at each end. The diameter
at the intersection plane is 487 mm. (Photo courtesy of Osservatorio Astronomico di Brera, used
by permission.)

Fig. 8. Zeeko IRP600X computer-controlled polishing machine installed at MSFC (left). The
polishing arm of the machine placed inside a plastic shell (right).

X-ray testing of a prototype shell has been performed at the Panter facility.
The figure of the mirror shell was optimized for a Wide-Field X-ray Telescope⁸⁶
using a polynomial prescription designed to give uniform angular resolution over
a broad field of view.⁸⁷ The measured angular resolution of the shell, which was
still under development, was found to vary from 17″ HEW on axis to 22″ HEW at
30 arcminutes off axis.


3. Post-fabrication Correction 


The fabrication processes for thin-shell optics impart residual figure errors that 
degrade the ultimate imaging performance. For instance, in replication processes, 
stress imparted during electroforming or plasma spraying can cause the resulting 
shells to deviate from the mandrel figure (and the mandrels in turn have figure errors). The
thinner the shell, the greater these figure distortions will be. Likewise, for epoxy
replication, glue shrinkage or carrier instability (in the case of composites) can
limit performance. For full-shell direct fabrication, shell distortions from the mere
application of the polishing pad may limit performance.

To address these residual figure errors, several techniques, both static and active,
are being developed.⁸⁸,⁸⁹ Some of these aim to provide active figure control via
piezoelectric⁹⁰ or magnetic actuators⁹¹ that can apply a local strain to the mirror
shell to compensate for residual figure errors. These techniques are being investigated
primarily for segmented X-ray optics correction, as the open form of these segments
makes figure control easier. While some active figure adjustment is feasible
for full-shell grazing-incidence mirrors,⁹² most of the research focuses on static figure
correction: differential deposition (Sec. 3.1); ion-beam figuring (Sec. 3.2); and stress 
manipulation (Sec. 3.3). 


3.1. Differential Deposition 


Differential deposition utilizes physical vapor deposition to deposit material selectively
onto the mirror surface to smooth out figure imperfections. The technique
can in principle be used to correct a wide spatial range of figure errors, and can
be applied to full-shell and segmented optics. It has heritage from the synchrotron-optics
world, where it has been used to achieve sub-μradian (~0.1″) slope errors.⁹³

Figure 9 shows how differential deposition can be implemented. A sputtering 
target provides the necessary “filler” material and a mask positioned in front of 
the target defines a precise beam profile of an appropriate width to correct the 
dominant figure errors. Translating the mirror (or target + mask) at a controlled 
variable velocity applies a profile correction of the needed amplitude. 

In practice, metrology would first be performed on the shell to be corrected, 
and by comparing the desired and measured figures, an error map is then generated 
showing the profile of correcting material to be deposited. Knowing precise deposi- 
tion rates and beam profiles, computer simulations can then be used to derive the 
necessary velocity profile. 
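
The derivation of a velocity profile from an error map can be sketched in one dimension as follows. This is a minimal illustration under stated assumptions, not the simulation code used by the researchers: the beam footprint is assumed Gaussian, the motion is treated as a dwell at each grid point, and the dwell times are found by a simple projected-gradient iteration that keeps them non-negative. All function names and parameters are hypothetical.

```python
import numpy as np


def gaussian_beam(offset_mm, fwhm_mm):
    """Unit-peak Gaussian footprint of the masked beam (an assumed shape)."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * (offset_mm / sigma) ** 2)


def fill_depth(measured_nm, desired_nm):
    """Material to add at each point, shifted so it is never negative."""
    error = np.asarray(measured_nm, float) - np.asarray(desired_nm, float)
    return error.max() - error


def dwell_times(x_mm, fill_nm, fwhm_mm, peak_rate_nm_s, n_iter=2000):
    """Solve deposit = A @ dwell for dwell >= 0 by projected gradient descent.

    x_mm must be uniformly spaced. A[i, j] is the thickness deposited at
    x[i] per second of dwell with the beam centred on x[j].
    """
    x = np.asarray(x_mm, float)
    fill = np.asarray(fill_nm, float)
    A = peak_rate_nm_s * gaussian_beam(x[:, None] - x[None, :], fwhm_mm)
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / (largest singular value)^2
    dwell = np.zeros_like(x)
    for _ in range(n_iter):
        gradient = A.T @ (A @ dwell - fill)
        dwell = np.clip(dwell - step * gradient, 0.0, None)
    return dwell


def velocity_profile(x_mm, dwell_s):
    """Convert dwell time per grid step into a translation velocity (mm/s)."""
    x = np.asarray(x_mm, float)
    dwell = np.asarray(dwell_s, float)
    dx = x[1] - x[0]
    with np.errstate(divide="ignore"):
        return np.where(dwell > 0, dx / dwell, np.inf)
```

Because deconvolving a smooth beam is ill-conditioned, the sketch judges success by how well the predicted deposit matches the requested fill, not by the uniqueness of the dwell profile. Features much narrower than the beam cannot be corrected, which is consistent with the staged coarse-to-fine approach described in this section.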

Fig. 9. Schematic of the differential-deposition figure-correction process.

It was shown that for nickel shells, a filler material of nickel gave the best
overall performance in terms of good adhesion, high deposition rate, low surface
roughness, and low stress.⁹⁴ The latter is of key importance, as stress in the
correcting deposit, if high enough, can introduce additional figure distortions, negating
the intended figure correction. These researchers also showed that a sequence of
figure corrections was most efficient, starting with a relatively coarse mask to fill
in broad, large-amplitude features and then transitioning to finer masks to correct
higher frequencies (Fig. 10). In follow-on publications⁹⁵,⁹⁶ it was shown that metrology
uncertainty, when multiple correction stages are implemented, can be the most
challenging factor in correcting thin-shell optics to the arcsecond level.

Actual X-ray testing also indicates the promise of differential deposition for
correcting full-shell X-ray optics. Figure 11 displays intrafocal X-ray images produced
by an electroformed-nickel full-cylinder (two-reflection) shell before (left)
and after (right) single-pass differential-deposition correction at selected azimuths.
These images, obtained at MSFC’s 100-m X-ray test facility, exhibit a factor-of-two
improvement in the HEW, from 17″ to 8″, for a single pass of sputtered
nickel through a 5-mm-wide slit.⁹⁷


3.2. Ion-beam Figuring


Ion-beam figuring is essentially the inverse process to differential deposition, in 
that mirror-surface material is selectively removed to improve the mirror’s figure. 
Typically, argon ions are accelerated under vacuum to impinge upon the mirror 
surface, removing material by transfer of kinetic energy. The beam profile and beam 
energy are controlled by electrodes, so that the beam can be tailored for the spatial 
wavelengths and amplitudes under correction.⁹⁸,⁹⁹

In practice a series of removal functions can be derived that give removal
rate and beam shape for different electrode potentials. These can then be used
in simulations to obtain the appropriate velocity “map” to correct the mirror surface
(typically the power is kept fixed and the beam moved with varying velocity
to remove different amounts of material). As with differential deposition, broad
features can be tackled first with wide beams that permit faster removal. The process
can then transition to finer features as necessary.

Fig. 10. Simulated correction sequence showing parabolic axial figure profile before (top left)
and after each of three stages of correction using a beam with FWHM of 14 mm, 5.2 mm, and
1.7 mm, respectively. The dotted line denotes the desired figure and the solid line, the predicted
figure based upon the simulation. Overall resolution improved from 7.8″ to 0.9″ HEW (2-reflection
equivalent).

Fig. 11. Intrafocal X-ray image before (left) and after (right) single-pass differential-deposition
correction at selected azimuths. In the right image, the blue arc denotes the range of azimuths
for the differential deposition; the red arc, the range over which the HEW is measured. For both
quadrants measured, the angular resolution improves by a factor of two, to about 8″ HEW. NB:
The triangular blank area at the bottom of the ring is the shadow of a post that supports a mask
blocking direct (non-imaged) illumination of the detector.

Ion-beam figuring is a well-developed tool that has been used in general precision
optics production for over 25 years, where it is often the final step in optical fabrication.
It has been used to figure X-ray optics for synchrotron applications,¹⁰⁰,¹⁰¹
where slope errors were reduced by an order of magnitude to microradian levels.
As a non-contact figuring method, it is well suited to applications involving thin
substrates and is currently being used to provide final figuring of directly fabricated
full-shell optics for future X-ray astronomy missions.⁸⁵


3.3. Stress Manipulation 


Most X-ray mirrors utilize an optical coating on the figured substrate in order 
to enhance the X-ray reflectance. As mentioned in the description of differential 
deposition, any stress in the coating will distort a thin mirror due to a bimorph-like
effect. Thus, obtaining low-stress coatings (by optimization of deposition
parameters or by annealing) is an important objective in depositing films onto
thin X-ray mirrors.¹⁰²⁻¹⁰⁶ Achieving low microroughness and good adhesion are
also important objectives in depositing optical coatings. Of course, microroughness, 
adhesion, and coating stress all depend upon the deposition parameters and surface 
properties of the substrate. 

On the other hand, precisely controlled coating stress can be used to correct
long-spatial-wavelength errors (e.g. cone angle and curvature) using a static
bimorph-like effect.¹⁰⁷⁻¹⁰⁹ These errors may be intrinsic to the fabricated substrate
or introduced by other films¹¹⁰⁻¹¹¹ deposited onto the substrate. Correcting
shorter-spatial-wavelength errors would require a precision translating slit, similar to that
used for differential deposition (Sec. 3.1).
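
As a first-order guide to the size of this bimorph-like effect, Stoney's equation for a thin stressed film on a flat plate can be used. It is a standard thin-film result that does not appear in the text above, and it only approximates the behavior of a curved shell. The symbols are introduced here for illustration: $\sigma_f$ and $t_f$ are the film stress and thickness, and $E_s$, $\nu_s$, and $t_s$ are the substrate's elastic modulus, Poisson's ratio, and thickness.

```latex
\Delta\kappa \;=\; \frac{6\,(1-\nu_s)\,\sigma_f\,t_f}{E_s\,t_s^{2}}
```

The induced curvature change $\Delta\kappa$ scales with the integrated stress $\sigma_f t_f$ and inversely with $t_s^{2}$, which is why a coating stress that is negligible on a thick substrate can visibly distort, or deliberately correct, a thin one.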

Instead of utilizing the stress in a coating to change the figure of a thin sub- 
strate, controlled ion implantation into the substrate can produce the requisite 
stress. Rather than using ions to correct the surface through erosive ion-beam fig- 
uring (Sec. 3.2), differential ion implantation introduces compressive stress near a 
substrate’s surface. By controlling exposure of the surface to ions as a function of 
location,¹¹² differential induced stress can in principle correct small figure errors
through a bimorph-like effect. 

The Massachusetts Institute of Technology is developing differential ion implantation
at its Space Nanotechnology Laboratory for figure correction of thin X-ray
mirrors.¹¹³,¹¹⁴ Thus far, this research has characterized how the integrated stress
depends upon ion energy and fluence. The integrated stress is the relevant parameter for
determining the amplitude of correction achievable at a given spatial wavelength, for a
given substrate thickness and elastic modulus. Importantly, the research has
also shown that the ion implantation does not significantly degrade the surface
microroughness.

A particularly interesting feature of ion implantation is that the resulting stress
near the surface can be anisotropic if the ion beam is not normally incident upon
the substrate. This offers the possibility of correcting the figure in one direction
with minimal change of figure in the orthogonal direction.¹¹⁵


4. Alignment and Assembly 


Mounting-induced and gravity-induced distortions of thin mirrors are potentially 
severe at long spatial wavelengths, so the mounting and assembly scheme needs 
to prevent or correct low-spatial-frequency deviations. As mounting is unlikely to 
correct figure errors at mid and short spatial wavelengths, the mirrors should be 
inherently of adequate quality at least at mid and higher spatial frequencies. 

Typical schemes for mirror shell alignment and assembly strive to maintain 
the natural shape of a mirror during the assembly process. Small contact forces 
can substantially distort the ends of thin shells, degrading the angular resolution, 
and so the shell is typically held so that it “floats” in a groove into which a small 
amount of epoxy can be injected. This epoxy must be low shrinkage, with equal 
amounts on either side of the shell to prevent any distortions on curing. In some 
cases the shell can be staked with a small amount of epoxy at the bottom of the 
groove, which is then cured before the rest of the joint is filled. This reduces any 
epoxy-shrinkage-induced distortions. 

Many full-shell optics, such as those for eROSITA and ART (Sec. 3.2), are 
mounted at one end only. This avoids over-constraining the figure of the shell and 
leads to better imaging performance. The optimum position to hold a shell is near its 
intersection point,°® but for the nested configuration this is impractical, so typically 
the large end is used. 

To align each shell and position it for mounting, various schemes are used. One 
approach, which is used at the Marshall Space Flight Center, is to support the shells 
on three points that can be adjusted radially and in translation (Fig. 12). A system 
of displacement sensors is then used to monitor the concentricity and the shape of 
the shell. The mirror assembly is mounted on a computer-controlled rotary stage 
and the signals from the sensors, which are fixed and monitor each end and the 
intersection of the rotating shell, are used both to align the shell concentrically and 
to adjust the circularity of the shell to minimize overall slope errors. When the shell 
is aligned and its shape optimized, it is glued into the spider groove in which it is 
suspended, and the next shell is brought into place. 
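
The text does not describe how the sensor signals are analyzed. One plausible approach, sketched here purely as an assumption, is a harmonic decomposition of the radial trace recorded over one full rotation: the first harmonic measures decenter (to first order in decenter/radius), while the second and higher harmonics measure out-of-roundness. The function name and return values are hypothetical.

```python
import numpy as np


def decenter_and_roundness(radius_samples):
    """Split a radial trace over one full turn into decenter and shape terms.

    radius_samples: sensor readings at uniformly spaced rotation angles
    covering exactly one revolution.
    Returns (decenter amplitude, azimuth of maximum radius in radians,
    peak-to-valley out-of-roundness after removing mean and decenter).
    """
    r = np.asarray(radius_samples, dtype=float)
    coeffs = np.fft.rfft(r) / r.size

    # First harmonic: r ~ R + e*cos(phi - phi0) for a slightly decentred circle.
    decenter = 2.0 * np.abs(coeffs[1])
    azimuth = -np.angle(coeffs[1])

    # Remove the mean radius and the decenter term; what remains is shape error.
    shape_only = coeffs.copy()
    shape_only[:2] = 0.0
    residual = np.fft.irfft(shape_only * r.size, n=r.size)
    return decenter, azimuth, residual.max() - residual.min()


if __name__ == "__main__":
    phi = 2.0 * np.pi * np.arange(360) / 360
    trace = 100.0 + 0.02 * np.cos(phi - 0.5) + 0.005 * np.cos(2.0 * phi)
    print(decenter_and_roundness(trace))
```

In such a scheme the decenter term would drive the radial adjusters, and the remaining harmonics would indicate how far the shell departs from circular at that axial station.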

A different approach, utilized for eROSITA¹¹⁶,¹¹⁷ (and for XMM-Newton before
it), suspends the shell from a series of wires attached at the exit (hyperbolic) end
of the mirror (Fig. 13). This system permits control of the mirror shape by tuning
the axial load distribution on each of 16 support wires. During the integration,
full-illumination optical metrology is used to monitor and optimize the performance of
the mirror shell. At this point, the shell is glued into the (16-spoke) spider.

Fig. 12. Alignment and assembly system used at NASA/Marshall Space Flight Center. Proximity
sensors are used to monitor a rotating shell in multiple positions. Analysis of the resulting waveform 
permits optimization of shell performance through adjustment of the shell mounts. 


Fig. 13. eROSITA mirror shell suspended on adjustable-tension wires for integration into spider. 
Real-time optical metrology permits figure optimization. (Photo courtesy of Media Lario Tech- 
nologies, used by permission.) 

For the future, a form of active mount could be envisioned. This could address
issues with epoxy shrinkage, permitting the mirror to be periodically adjusted if
necessary. Active control of X-ray mirrors is currently under investigation¹¹⁸ for
segmented optics, but such technology could also be applied to full-shell mounting 
schemes. 


5. Summary 


Full-shell X-ray mirrors have achieved the highest resolution to date (sub-arcsecond) 
for astronomical X-ray optics. However, this necessitates thick mirror structures 
and, hence, large mass per unit collecting area, as well as meticulous and expensive 
figuring and polishing. As an alternative, lighter weight mirror shells have been 
developed that trade resolution for ease of fabrication and greatly reduced mass 
and cost. Efforts are underway to refine these processes and improve the angular 
resolution. 

To date the electroformed-nickel-replication (ENR) process, in which mirror
shells are replicated from figured and superpolished mandrels, is the most widely
used process for producing thin-walled full-shell optics for X-ray astronomy. Current
programs include the eROSITA and ART-XC telescopes aboard the Russian
Spectrum-Röntgen-Gamma mission. These make use of advances in electroforming
to fabricate thin shells with moderate (15″–25″) angular resolution. Spin-off applications
of the ENR process include optics for medical imaging and cold-neutron
beamlines.

Other techniques for producing lightweight full-shell optics include plasma 
spray, which is a variation on the ENR technique but with a ceramic coating applied 
to a very thin electroformed layer, to provide a full-shell low-density rigid backing. 
Additional techniques include direct fabrication, wherein computer controlled pol- 
ishing machines are used directly to figure and polish very thin metal and ceramic 
mirrors, and epoxy replication, wherein a pre-made lightweight carrier is held in 
close proximity to a coated mandrel and epoxy is injected in between. 

Once mirror shells have been fabricated, techniques are under development 
to provide post-fabrication figure correction. These include differential deposition, 
wherein physical vapor deposition is used to selectively coat the active mirror surface 
to improve axial figure profile, and ion figuring, which is a similar process but uses 
ion beams to selectively remove material. Additional techniques also under devel- 
opment include stress manipulation in which selectively deposited stressed material 
or ion implantation is used to adjust a mirror’s figure. 

Mounting and alignment present a significant challenge for thin-shell optics and 
can be the largest source of image error. Techniques to date have concentrated on 
holding the mirror shell in a low-stress support and manipulating the shell while 
monitoring the figure. The shell floats in attachment grooves into which epoxy is 
injected to secure the shell in its optimum shape. Future refinements could include 
active mounts to compensate for figure-distorting epoxy shrinkage. 


This chapter describes the use and development of reflective multilayer coatings
for hard X-ray astronomy. These coatings are designed to have broad spectral 
response at X-ray energies above ~10 keV, and are deposited onto mirror substrates
used to construct grazing-incidence X-ray telescopes. The principles of
operation, methods for the design, fabrication, and characterization of these coat- 
ings, and future research directions are described. 


1. Introduction 


X-ray multilayer coatings that efficiently reflect short-wavelength radiation, devel- 
oped over the past three decades for a variety of scientific and technological appli- 
cations, have had a particularly significant impact on instrumentation for space 
science. Multilayer coatings that operate near normal incidence in the extreme 
ultraviolet (EUV) spectral region are of great importance for solar physics: having 
been used successfully on numerous sounding rocket and satellite instruments for 
both solar imaging and spectroscopy since the 1980s, these coatings by now consti- 
tute an essential, mission-enabling technology. Indeed, almost all EUV observations 
of the Sun are currently made with multilayer-coated optics (see, e.g. Refs. 1, 2, 
and references therein). Multilayer coatings that operate in the soft X-ray band 
(i.e. E < ~0.5 keV) at the Brewster angle near 45° are also being developed for
astronomical polarimetry (see, e.g. Refs. 3, 4). 

This chapter is focused on a different application of multilayer technology to 
space science, however: specifically, grazing-incidence multilayer coatings for hard 
X-ray astronomy. The multilayer coatings for this application typically have a broad 
energy response and work efficiently at small graze angles that are nevertheless sev- 
eral times larger than the critical angle for total external reflection from front-surface 
coatings such as gold or iridium. The performance attributes of such multilayer 
coatings enable the construction of highly-nested Wolter-type X-ray telescopes
having focal lengths that are sufficiently short for use in balloon and satellite
instruments, and that operate up to much higher energies than would be possible
otherwise. For example, the NuSTAR X-ray telescopes were constructed using
broadband X-ray multilayer coatings⁵ deposited onto segmented, cylindrical, thin-glass
substrates, have a focal length of 10 meters, and operate up to 79 keV, far
beyond the 10 keV sensitivity limit of both the XMM-Newton and the Chandra 
telescopes. 

The sections that follow explain how X-ray multilayers operate, how they are 
designed, fabricated, and tested, the importance of stability and film stress, and 
some prospects for future advancements. 


2. Principle of Operation 


X-ray multilayer coatings use optical interference to efficiently reflect X-rays at graze 
angles that are several times larger than those used for single-layer, front-surface 
coatings, which rely on total external reflection for high efficiency.⁶ Multilayers
comprise a layer stack of optically dissimilar materials designed so that the small 
reflections that occur at each interface in the stack add coherently, in phase, over 
some range of graze angles and photon energies. 

Fig. 1. (a) Schematic diagram of a periodic multilayer. (b) Measured (symbols) and calculated
(solid) reflectance of a periodic W/Si multilayer having N = 100 repetitions of a bilayer that is
4.2 nm thick, measured at graze angles in the range θ = 0.1° to θ = 0.5°. (See electronic edition
for a color version of this figure.)

Fig. 2. (a) Schematic diagram of a non-periodic multilayer coating (depth-graded, in this case)
containing a range of bilayer thicknesses. (b) Measured reflectance (red) of a NuSTAR depth-graded
multilayer having N = 291 W/Si bilayers, along with a fit (green) to the measured data.
(See electronic edition for a color version of this figure.)

A so-called “periodic” multilayer, illustrated conceptually in Fig. 1(a), is a film
stack containing a number of repeating, identical bilayers (i.e. pairs of layers of
two different materials). Just as Bragg’s law describes the condition for constructive
interference of X-rays in a crystal, the same law describes the condition for
constructive interference in a periodic multilayer film (albeit with small, yet important
corrections for refraction within the layers,⁷ which we can safely ignore for
this discussion): nλ = 2d sin θ, where n = 1, 2, ... is the Bragg order, θ is the
grazing-incidence angle, λ is the photon wavelength, and d is the bilayer thickness,
or multilayer “period”. At a given incidence angle, a periodic multilayer film will
reflect X-rays over a relatively narrow range of energies, i.e. those that satisfy the
condition for constructive interference; this is illustrated in Fig. 1(b), which shows
the measured and calculated reflectance of a periodic multilayer film containing
N = 100 repetitions of a 4.2-nm-thick W/Si bilayer, measured at various graze
angles below θ = 0.5°. (For clarity, only the n = 1 Bragg orders are shown.)
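
As a quick numerical check of the Bragg condition, the sketch below converts a bilayer period and graze angle into the photon energy of the Bragg peak. It ignores the refraction correction, as the discussion above does, so the values are approximate. The function name is hypothetical; the 4.2-nm period and the graze angles are those quoted for Fig. 1(b).

```python
import math

HC_KEV_NM = 1.239841984  # Planck constant times speed of light, in keV*nm


def bragg_energy_kev(period_nm, graze_deg, order=1):
    """Photon energy satisfying n*lambda = 2*d*sin(theta), refraction ignored."""
    wavelength_nm = 2.0 * period_nm * math.sin(math.radians(graze_deg)) / order
    return HC_KEV_NM / wavelength_nm


if __name__ == "__main__":
    for graze in (0.1, 0.2, 0.3, 0.4, 0.5):
        energy = bragg_energy_kev(4.2, graze)
        print(f"theta = {graze:.1f} deg -> first-order peak near {energy:.1f} keV")
```

For the 4.2-nm period, the first-order peak moves from roughly 85 keV at θ = 0.1° down to roughly 17 keV at θ = 0.5°, which shows why a single period covers only a narrow energy band at any one graze angle.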

Non-periodic multilayer coating designs are necessary to achieve broad spectral 
response. A non-periodic, broadband X-ray multilayer comprises a stack of bilayers 
having a range of periods. Several types of broadband multilayer coatings have 
been investigated over the years for hard X-ray applications, including designs 
containing multiple “blocks” of periodic bilayer stacks having specific periods,⁸
so-called “depth-graded” designs in which the distribution of bilayer thicknesses
is described analytically⁹ (Fig. 2(a), for example), and fully aperiodic structures
that are designed numerically.¹⁰ The coatings used for the NuSTAR instrument are
depth-graded designs, comprising either W/Si or Pt/C bilayers. Shown in Fig. 2(b) 
are the measured and calculated reflectance-vs.-energy curves of a depth-graded 
W/Si multilayer for NuSTAR designed to operate efficiently up to the W K-edge 
near 69.5 keV at an angle of θ = 0.15°; the Pt/C multilayers used on NuSTAR’s
inner mirror shells operate with similarly high efficiency, albeit at smaller graze 
angles, up to the Pt K-edge near 78.4 keV. 
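
The analytic form of the depth grading is not given in the text. One form commonly used in the literature is a power law, d_i = a / (b + i)^c for bilayer index i = 1 ... N counted from the top of the stack. The sketch below assumes that form and solves for a and b from a chosen maximum period, minimum period, and exponent c. The numerical values in the example are illustrative placeholders, not the NuSTAR recipe; only N = 291 is taken from the Fig. 2 caption.

```python
def power_law_periods(n_bilayers, d_min_nm, d_max_nm, c):
    """Bilayer periods d_i = a / (b + i)**c, i = 1..N, thickest bilayer on top.

    The constants a and b are fixed by requiring d_1 = d_max and d_N = d_min.
    """
    if n_bilayers < 2 or not (0.0 < d_min_nm < d_max_nm) or c <= 0.0:
        raise ValueError("need N >= 2, 0 < d_min < d_max, and c > 0")
    k = (d_max_nm / d_min_nm) ** (1.0 / c)   # equals (b + N) / (b + 1)
    b = (n_bilayers - k) / (k - 1.0)
    a = d_max_nm * (b + 1.0) ** c
    return [a / (b + i) ** c for i in range(1, n_bilayers + 1)]


if __name__ == "__main__":
    # Illustrative (assumed) design values: 2.5 nm to 10 nm, exponent 0.25.
    periods = power_law_periods(291, d_min_nm=2.5, d_max_nm=10.0, c=0.25)
    print(f"top period {periods[0]:.2f} nm, bottom period {periods[-1]:.2f} nm")
```

Placing the thick bilayers on top lets the low-energy photons, which are absorbed most strongly, reflect near the surface, while higher-energy photons penetrate to the thin bilayers deeper in the stack.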


3. Multilayer Design


The X-ray performance of a multilayer coating is determined by the optical 
properties of the layer materials, the quality of the interfaces between layers, 
and the layer thicknesses. High reflectance is achieved with multilayer material 
combinations that have both large differences in their complex indices of refraction, 
so that the Fresnel reflection coefficient at each interface is maximal,¹¹ and low
absorption at the energies of interest, so that the incident radiation penetrates 
deeply into the layer stack and reflects from as many interfaces as possible. Addi- 
tionally, for maximum reflectance it is necessary to grow the multilayer stack such 
that all the layers are spatially uniform, continuous, and have the correct thickness, 
and all the interfaces in the stack are sufficiently sharp (i.e. chemically abrupt) and 
smooth: interfacial roughness results in non-specular scattering,¹² while
diffuseness at an interface (due, e.g. to chemical reaction or atomic diffusion)
results in increased transmission at that interface; in any case, both types of
interface imperfections reduce the net reflectance of the coating, and therefore must 
be minimized. Furthermore, the high-energy cutoff of a non-periodic coating is con- 
strained by the minimum bilayer thickness that can be fabricated in practice for a 
given material pair, which is in turn limited by interfacial diffusion and roughness 
as well, and is highly dependent on both the materials properties of the constituents 
and on the specific deposition conditions used for film growth. While identifying
material pairs that meet the optical requirements is straightforward (this
can be determined from published optical constants or atomic scattering
factors⁷), the ability to actually grow nanometer-scale bilayers having sufficiently
small periods, and sufficiently small interface imperfections, must be investigated 
experimentally. Multilayer material pairs that have been demonstrated thus far to 
work well in broadband, non-periodic structures for hard X-ray astronomy include 
a variety of tungsten-based structures such as W/Si and W/SiC, and platinum- 
based structures such as Pt/C and Pt/SiC. The search for new material pairs that 
will provide even better performance for future instruments is on-going (see, e.g. 
Refs. 13 and 14). 

The computation of reflectance for an ideal multilayer (i.e. one having perfectly
smooth, sharp interfaces) is a straightforward problem in electromagnetism
that can be solved exactly, following various analytical methods;¹⁵ however, the
introduction of realistic interface imperfections (which are always present to some
extent in practice) greatly complicates the problem, and approximation methods
must be used. The details underlying such methods are beyond the scope of this
chapter; the interested reader is referred to Ref. 16. In any case, accurate multilayer
reflectance can now be computed using widely available software, including the
Lawrence Berkeley National Laboratory’s Center for X-ray Optics’ web-based program
(http://henke.lbl.gov/optical_constants/multi2.html), as well as IMD¹⁷ and
other free or commercial software packages. Reference 18 discusses multilayer design
considerations and optimization techniques specific to astronomical hard X-ray
telescopes, taking into account both the desired energy bandpass and the telescope 
field of view in order to maximize performance. 
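
For the ideal case of perfectly sharp, smooth interfaces, one exact method is the recursive (Parratt-type) calculation sketched below. This is an illustration of that textbook recursion, not the algorithm of any particular package named above. It treats s-polarization only, which is a good approximation at small graze angles. The optical constants in the example are rough placeholder values of the right order of magnitude for W and Si near 8 keV and are assumptions; real designs should use tabulated optical constants.

```python
import numpy as np


def parratt_reflectance(graze_deg, wavelength_nm, layer_indices,
                        thicknesses_nm, substrate_index):
    """Specular reflectance of an ideal layer stack at one graze angle.

    layer_indices: complex refractive indices n = 1 - delta + i*beta,
    listed from the vacuum side down; thicknesses_nm matches that order.
    """
    theta = np.radians(graze_deg)
    k0 = 2.0 * np.pi / wavelength_nm
    n = np.concatenate(([1.0 + 0.0j],
                        np.asarray(layer_indices, dtype=complex),
                        [complex(substrate_index)]))
    # Component of the wavevector normal to the surface in each medium.
    kz = k0 * np.sqrt(n ** 2 - np.cos(theta) ** 2 + 0.0j)
    # Fresnel amplitude coefficient at each interface (medium j over j+1).
    r = (kz[:-1] - kz[1:]) / (kz[:-1] + kz[1:])

    amplitude = r[-1]  # start at the bottom layer/substrate interface
    for j in range(len(thicknesses_nm) - 1, -1, -1):
        # Round-trip phase (and absorption) through layer j+1.
        phase = np.exp(2.0j * kz[j + 1] * thicknesses_nm[j])
        amplitude = (r[j] + amplitude * phase) / (1.0 + r[j] * amplitude * phase)
    return float(abs(amplitude) ** 2)


if __name__ == "__main__":
    n_w = 1 - 4.6e-5 + 3.9e-6j    # placeholder optical constants (assumed)
    n_si = 1 - 7.6e-6 + 1.7e-7j   # placeholder optical constants (assumed)
    stack = [n_w, n_si] * 100     # 100 bilayers, W on top of each pair
    thick = [1.7, 2.5] * 100      # assumed split of a 4.2-nm period
    for angle in (0.8, 1.0, 1.12, 1.3):
        refl = parratt_reflectance(angle, 0.154, stack, thick, n_si)
        print(f"{angle:.2f} deg: R = {refl:.4f}")
```

Interface roughness and diffuseness are deliberately left out; including them requires the approximation methods referred to above.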


4. Multilayer Fabrication 


X-ray multilayer coatings have been successfully fabricated using a variety of physical
vapor deposition techniques; however, magnetron sputtering¹⁹ has been the most
widely used technique thus far for the deposition of broadband multilayer coatings 
for X-ray astronomy. Magnetron sputtering can be highly stable and repeatable, can 
be easily scaled to coat large areas for mass production, and can produce (under 
the right deposition conditions) high-quality multilayer films comprising dense layers 
with few voids and small interface imperfections. 

The magnetron sputtering source consists of a solid target of the material to 
be deposited, fixed in place in front of an array of strong, permanent magnets, as 
shown conceptually in Fig. 3. The source is mounted in a vacuum chamber that 
is evacuated to a low base pressure (typically reaching the low 10~" Torr range or 
lower) and then back-filled with the chosen working gas (typically argon) which is 
generally maintained at a constant, precisely-controlled pressure. High voltage is 
applied to the target relative to the nearby grounded shielding, thereby creating a 
crossed electric and magnetic field arrangement that confines the resultant plasma²⁰
in a region close to the target surface, thereby maximizing the deposition rate.
Ionized working-gas atoms (e.g. Ar⁺) in the plasma are accelerated (by the voltage
gradient) to the target surface, where they can collide with and eject target atoms; 
the ejected target atoms then migrate to the substrate surface where they condense 
to form the growing film. 

Fig. 3. Schematic diagram of a planar, rectangular magnetron sputtering source. (See electronic
edition for a color version of this figure.)

Real-time, in situ deposition-rate sensors, such as quartz crystal monitors, generally
do not provide sufficiently high precision for the deposition of nanometer-scale
layers with sub-Å accuracy, as is required for good X-ray multilayer performance,
and so closed-loop deposition-rate control is ordinarily not used for the growth of
X-ray multilayer films. Instead, in magnetron sputtering, precise control of both the
source power and the working-gas pressure is used to ensure high deposition-rate 
stability, both during the coating run and from day to day. The power applied to 
the source is typically controlled with power supplies that are specially designed for 
sputtering, and can be operated in constant power, constant voltage, or constant 
current mode, as desired, and can automatically handle plasma stability problems 
due to, e.g. arcing. The working-gas flow rate is typically adjusted continuously dur- 
ing the run via closed-loop control so as to maintain constant working-gas pressure 
with high precision. Working-gas pressure is typically measured using a capacitance 
manometer, while a mass flow controller is typically used to adjust gas flow rate. 
With sufficient stability and repeatability, the deposition rates of the materials 
being deposited can be calibrated a priori, by measuring the thickness of test films 
(typically using X-ray reflectometry, described below) that are deposited in advance. 

For the growth of two-component multilayer films, two magnetron sources are 
required. In one common approach, the two sources are fixed in place in the vacuum 
chamber while the substrate is moved past each source in turn, thus building up the 
multilayer stack one layer at a time. Cylindrical deposition geometries (e.g. Fig. 4) 
have been used for the mass-production of multilayer-coated thin-shell X-ray mirrors 
used to construct the astronomical X-ray telescopes that have been developed thus 
far (e.g. Refs. 5 and 13). In this geometry the two magnetron sources are positioned 
vertically, facing outward, while a number of thin-shell substrates are mounted on 
the inside of a rotating cylindrical platen, facing inward. The rotational velocity of 

Fig. 4. (a) Schematic diagram of a magnetron sputtering system using a cylindrical deposition geometry 
for mass-production of multilayer-coated, thin-shell cylindrical X-ray mirrors, as viewed from 
the top. (b) Isometric view of a 3D model of one of the sputtering systems used for production of 
some of the NuSTAR X-ray mirror coatings, which uses the cylindrical deposition geometry. (See 
electronic edition for a color version of this figure.) 

the platen determines the deposited layer thickness: that is, for a given deposition 
rate, a slower velocity produces a thicker film and vice versa. The rotational velocity 
of the substrate platen is therefore adjusted (via computer control) for each layer 
being deposited, in accord with the pre-calibrated deposition rate and the target 
thickness of that specific layer. Depending on the total coating thickness and the 
number of substrates being coated, many hours of actual deposition time (i.e. the 
amount of time that the magnetron sources are in operation) may be required, 
as high-quality films usually necessitate relatively low deposition rates in order to 
achieve minimal interface imperfections. 
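
As a rough illustration of the velocity-thickness relation described above, the following sketch assumes that the thickness deposited per pass scales inversely with the platen's rotational velocity, so that a single calibration film fixes the proportionality constant. The function names and numerical values are hypothetical; a real coating system would also have to account for source geometry and any drift in deposition rate.

```python
def calibration_constant(test_thickness_nm, test_velocity_rpm):
    """Thickness x velocity product from a test film (assumed constant for a stable source)."""
    return test_thickness_nm * test_velocity_rpm


def platen_velocity_rpm(target_thickness_nm, k_nm_rpm):
    """Rotational velocity at which one pass deposits the target thickness."""
    if target_thickness_nm <= 0:
        raise ValueError("target thickness must be positive")
    return k_nm_rpm / target_thickness_nm


# Hypothetical calibration: a test film showed 2.0 nm per pass at 1.5 rpm.
k = calibration_constant(2.0, 1.5)
velocities = [platen_velocity_rpm(d, k) for d in (1.0, 2.0, 4.0)]
print(velocities)  # thicker layers call for slower rotation
```

In a depth-graded design, a list of this kind would be generated once per layer from the pre-calibrated rate and the target thickness of that layer.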

Other deposition geometries, for example one in which the substrate moves 
linearly past the magnetron sources, have been used for other applications,?! and 
may be adopted in the future for high-throughput production of multilayer-coated, 
thin-shell X-ray mirrors. One advantage is that a substrate load-lock system could 
be more easily incorporated in a linear-motion system, thereby eliminating the need 
for a long pump-down time (which typically lasts for several hours or more). 

Magnetron sputtering also provides a variety of user-adjustable deposition 
parameters that can be exploited to control the microstructure, interface qual- 
ity, and other properties of thin films and multilayers; this attribute makes mag- 
netron sputtering particularly well-suited for optimizing multilayer performance. 
As explained by Refs. 22 and 23, the microstructure of films grown by magnetron 
sputtering is highly dependent on such parameters as working-gas pressure, sub- 
strate temperature, substrate bias voltage, target-to-substrate distance, and oth- 
ers. These deposition conditions can have a profound impact on the energy and 
momentum of particles impinging on the growing film, thereby affecting the film’s 
growth and microstructure. Film microstructure can in turn affect interface quality, 
due to crystallite formation, for example. In any case, optimal deposition condi- 
tions are required to realize maximum multilayer performance; the identification 
of the optimal deposition conditions for a given material pair generally must be 
determined experimentally. The best results have typically been obtained using the 
lowest possible working-gas pressure that can sustain a stable magnetron plasma 
(typically of order 1.5 mTorr, depending on the magnetron design and the coating 
system geometry), and with a typical target-to-substrate distance of order 10 cm. 
Reactive sputtering using working-gas mixtures (e.g. Ar + N2) and other deposition 
variants can also have a strong effect on microstructure, interface quality, and other 
properties, but are often highly material-specific, so generalizations cannot be made 
easily (see, e.g. Ref. 24). 


5. Multilayer Characterization 


A variety of thin-film and surface-science characterization techniques have been 
used over the years to investigate multilayer structure and properties. While 
many of these techniques have proven useful for the elucidation of film growth 
mechanisms, interface formation, degradation mechanisms, and so forth, arguably 

Fig. 5. Schematic diagram of X-ray reflectometry in the θ–2θ geometry. (See electronic edition 
for a color version of this figure.) 


the most important (and prevalent) technique used in the development of X-ray 
multilayer films is X-ray reflectometry, typically using Cu Kα (~8 keV) radiation 
from an electron-impact, Cu-anode X-ray tube, measured in the so-called θ–2θ 
geometry illustrated in Fig. 5. In this geometry, a pencil-beam of monochromatic 
radiation (typically produced using a crystal or a multilayer monochromator in 
conjunction with a series of narrow slits) is incident on the film surface at a graze 
angle θ, and the detector is positioned at an angle equal to 2θ in order to capture 
the reflected beam. Using precisely co-aligned rotation stages, the intensity of the 
reflected beam is measured over some range of incidence angles, by scanning both 
the mirror and the detector in synchronization (i.e. maintaining the θ–2θ relation), 
thereby producing a reflectance-vs.-graze-angle curve such as that shown in Fig. 6, 
for example. By fitting the measured reflectance curve, using the modeling formal- 
ism outlined in Sec. 2, it is possible to derive with high precision layer thicknesses, 
layer densities, interface widths, and other parameters (e.g. surface oxide thickness 
and composition). A number of commercially available X-ray diffractometer systems 
can be used for such measurements. (Large angle X-ray diffraction and non-specular 
scattering measurements can often be made with the same instrument as well, to 
investigate crystallinity, interface morphology, and other film properties.) 
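
Before a full fit is attempted, the period of a periodic multilayer can be estimated from the positions of the Bragg peaks in the θ–2θ scan. The sketch below uses the simple Bragg condition mλ = 2d sin θ and deliberately neglects refraction, which shifts the peaks slightly at small graze angles. The 5 nm period is a hypothetical value, and the function name is illustrative only.

```python
import math

CU_KA_WAVELENGTH_NM = 0.15418  # weighted-average Cu K-alpha wavelength


def bragg_graze_angles_deg(period_nm, max_order, wavelength_nm=CU_KA_WAVELENGTH_NM):
    """Graze angles (degrees) of Bragg orders 1..max_order, refraction neglected."""
    angles = []
    for m in range(1, max_order + 1):
        sin_theta = m * wavelength_nm / (2.0 * period_nm)
        if sin_theta > 1.0:
            break  # this order and higher ones cannot be reached
        angles.append(math.degrees(math.asin(sin_theta)))
    return angles


# Hypothetical multilayer with a 5 nm period measured at Cu K-alpha.
angles = bragg_graze_angles_deg(5.0, 3)
print([round(a, 3) for a in angles])
```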

Other notable techniques that have proven useful in the development of 
X-ray multilayer films include high-resolution, cross-sectional, transmission electron 
microscopy (e.g. Ref. 25), and Auger or X-ray photo-electron spectroscopy, with or 
without depth profiling (e.g. Ref. 26). 

The X-ray reflectance of a multilayer coating at the X-ray energies and graze 
angles for which the coating was designed is of course the most important property 
to be determined in the context of the development of astronomical X-ray telescopes. 
So-called “at-wavelength” measurements have been made in the hard X-ray region 
using both synchrotron radiation (e.g. Ref. 27), and using electron-impact X-ray 
sources (e.g. Ref. 28). A recently developed laboratory-based hard X-ray reflectome- 
ter? is shown in Fig. 7: it uses a W-anode X-ray tube that operates up to 160 keV 
and tungsten slits to form a low-divergence pencil beam of radiation, typically 
20–40 µm in width at the sample, allowing for measurement of reflectance at very 

Fig. 6. Example reflectance-vs.-graze-angle measurement and fit of a periodic X-ray multilayer. 

Fig. 7. A hard X-ray reflectometer system for multilayer characterization? that operates up to 
160 keV. Reflectance-vs.-energy is measured by dividing the spectrum of X-rays reflected from the 
mirror (bottom) by the incident beam spectrum (top). (See electronic edition for a color version 
of this figure.) 


small graze angles. A CdTe detector system is used to measure the incident and 
reflected beam intensities as a function of energy, with the reflectance-vs.-energy 
curve for the film under study computed from their ratio. (The data shown in 
Fig. 2(b) were obtained using the instrument just described.) 
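
The ratio measurement just described can be sketched as follows. This is a minimal illustration that assumes the incident and reflected spectra share the same energy binning and have already been normalized to the same exposure; the function name, the `min_incident` cut, and the count values are hypothetical.

```python
def reflectance_vs_energy(incident_counts, reflected_counts, min_incident=1):
    """Channel-by-channel ratio; channels with too few incident counts yield None."""
    if len(incident_counts) != len(reflected_counts):
        raise ValueError("spectra must share the same energy binning")
    curve = []
    for inc, ref in zip(incident_counts, reflected_counts):
        curve.append(ref / inc if inc >= min_incident else None)
    return curve


# Hypothetical four-channel spectra (counts per channel).
incident = [1000, 2000, 0, 500]
reflected = [400, 500, 0, 50]
curve = reflectance_vs_energy(incident, reflected)
print(curve)
```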


6. Coating Stability and Film Stress 


In addition to its optical performance, the temporal and thermal stability of an 
X-ray multilayer coating, and its residual stress, are crucial properties that also 
must be determined experimentally. Residual stress in the film can affect its adhe- 
sion to the substrate, and can also distort substrate figure, particularly in the case 
of thin-shell substrates. Additionally, the mechanical and chemical properties of the 
constituent materials can affect the film’s temporal and thermal stability (as can 
film stress, if it is sufficiently high); high coating stability is generally required 
for use in space-flight instrumentation. Stability can be assessed by measuring 
X-ray reflectance or other film properties as a function of time or temperature, as 
appropriate. 

The net stress in an X-ray multilayer film is equal to the sum of the stresses in each 
layer in the film stack, weighted by the corresponding layer thicknesses, plus the 
interfacial stresses present between each pair of layers.°° The various contributions 
to the net film stress are sensitive to both the coating design and the deposition 
conditions. Consequently, in some cases the deposition conditions can be adjusted 
to minimize film stress*! without substantially degrading X-ray performance. 
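
One way to read the statement above is as a balance of force per unit width: each layer contributes its stress times its thickness, each interface contributes an interfacial stress (a force per unit length), and the net film stress is the total divided by the total thickness. The sketch below implements that reading; the normalization by total thickness, the function name, and all numerical values are assumptions made for illustration.

```python
def net_film_stress_mpa(layer_stresses_mpa, layer_thicknesses_nm,
                        interface_stresses_n_per_m=()):
    """Thickness-weighted layer stresses plus interface terms, in MPa."""
    if len(layer_stresses_mpa) != len(layer_thicknesses_nm):
        raise ValueError("one thickness is needed per layer")
    # Force per unit width (N/m) carried by the layers themselves.
    force_per_width = sum(s * 1.0e6 * t * 1.0e-9
                          for s, t in zip(layer_stresses_mpa, layer_thicknesses_nm))
    # Interfacial stresses are already forces per unit length (N/m).
    force_per_width += sum(interface_stresses_n_per_m)
    total_thickness_m = sum(layer_thicknesses_nm) * 1.0e-9
    return force_per_width / total_thickness_m / 1.0e6


# Hypothetical bilayer: compressive 2 nm layer, tensile 3 nm layer.
layers_only = net_film_stress_mpa([-1000.0, 200.0], [2.0, 3.0])
with_interface = net_film_stress_mpa([-1000.0, 200.0], [2.0, 3.0], [0.5])
print(layers_only, with_interface)
```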

Film stress in X-ray multilayer coatings can be measured either in situ (i.e. 
during film growth) or ex situ, typically using X-ray diffraction methods* or by 
measuring substrate curvature.°? The widely-used wafer curvature technique relies 
on the Stoney equation to determine film stress from the measured change in the 
radius of curvature of a thin substrate after film deposition, given the known film 
and substrate thicknesses, and the known biaxial elastic modulus of the substrate 
(thin wafers made of well-characterized, single-crystal silicon are commonly used for 
this purpose). Commercial instruments are available for measuring film stress based 
on the wafer curvature technique. Such instruments typically measure the deflection 
of a laser beam as it is scanned along the substrate surface to determine substrate 
curvature. In situ techniques can be used to elucidate film growth mechanisms 
and stress evolution, but they are generally impractical for use in high-throughput 
multilayer coating systems. Ex situ techniques are sufficient for characterizing 
“as-deposited” residual film stress, and for measuring changes in film stress with 
time or temperature. 
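
For reference, the Stoney equation relates the film stress to the change in substrate curvature as σ = M t_s² (1/R_after − 1/R_before) / (6 t_f), where M is the biaxial modulus of the substrate, t_s the substrate thickness, and t_f the film thickness. The sketch below evaluates it; the sign convention (positive for tensile stress), the biaxial modulus adopted for Si(100), and the wafer and film dimensions are assumptions chosen for illustration.

```python
import math

SI_100_BIAXIAL_MODULUS_PA = 180.5e9  # commonly quoted value for Si(100); assumed here


def stoney_stress_pa(substrate_thickness_m, film_thickness_m,
                     radius_after_m, radius_before_m=math.inf,
                     biaxial_modulus_pa=SI_100_BIAXIAL_MODULUS_PA):
    """Film stress from the change in substrate curvature (Stoney equation)."""
    curvature_change = 1.0 / radius_after_m - 1.0 / radius_before_m
    return (biaxial_modulus_pa * substrate_thickness_m ** 2
            * curvature_change / (6.0 * film_thickness_m))


# Hypothetical case: 0.5 mm wafer, 500 nm film, initially flat, 100 m radius afterwards.
stress_pa = stoney_stress_pa(0.5e-3, 500e-9, 100.0)
print(f"{stress_pa / 1e6:.1f} MPa")
```

Because only the change in curvature enters, the same function applies to measurements repeated over time or after thermal cycling.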


7. Future Research Directions 


Future hard X-ray astronomy missions utilizing X-ray multilayer coatings for high- 
energy response will likely require improved multilayer performance relative to what 
has been achieved thus far. Four main avenues of potential improvement are dis- 
cussed here. First, the development of aperiodic coatings for hard X-ray telescopes 
promises higher reflectance, and thus higher telescope effective area, with thinner 
films. These coatings may therefore prove to have both better performance and 
reduced production costs (due to shorter coating times). Second, it may be possible 
to increase telescope effective area by improving the spatial uniformity of coating 
thickness on segmented, cylindrical thin-shell substrates that are mass produced. 
That is, coating non-uniformity over the mirror surface, due largely to coating geom- 
etry considerations, degrades the coating response relative to the target design; 
achieving perfect coating uniformity would eliminate that degradation. However, 
controlling coating uniformity on multiple cylindrical segments simultaneously, to 
an extent that goes beyond what has been achieved thus far, will require new deposi- 
tion strategies. Third, response beyond the 79 keV cutoff of the NuSTAR instrument 
(corresponding to the Pt K-edge) will allow for all-new astronomical observations. 
New multilayers comprising light elements such as Ni or Co in place of W or Pt 
have been investigated for relatively smooth response to higher energies, but so 
far these coatings offer no clear benefit.1?-'4 However, W-based coatings (including 
the recently-developed WC/SiC system**) have been shown to operate efficiently 
at energies that are much higher than the W K-edge, and so a combination of 
Pt- and W-based coatings may offer the best route to hard X-ray telescopes having 
a relatively smooth response to higher energies. Finally, future hard X-ray telescopes 
will likely require higher angular resolution. Assuming that high-resolution, thin- 
shell substrates can be eventually mass produced, multilayer film stress will cause 
figure errors that may degrade the substrate’s angular resolution, unless techniques 
for mitigating coating-stress-induced deformations can be developed. As hard X-ray 
telescope angular resolution approaches 1 arc-second, multilayer film stress may 
become an increasingly important problem. 

Charge-coupled devices (CCDs) have been the detector of choice for all X-ray 
astronomy missions of the past 20 years. I discuss the operational principles of 
CCDs used in the X-ray energy band and review CCD instruments on a variety 
of astrophysics and heliophysics missions. 


1. Introduction 


The field of X-ray astronomy started in 1962, more than 50 years ago. Since then it 
has advanced tremendously through observations by many X-ray astronomy satel- 
lites that included many technological developments. One example of such a techno- 
logical development is the X-ray charge-coupled device (CCD). When CCDs were 
first used for X-ray detection in the laboratory in the late 1970s, several X-ray 
astronomy satellites were already successfully launched. The performance charac- 
teristics desired for X-ray detectors were excellent effective area, quantum efficiency, 
position resolution, energy resolution and time resolution. Since no X-ray detectors 
satisfied all of them, each satellite employed a combination of a few types of X-ray 
detectors. The first X-ray astronomy satellite, UHURU, employed gas proportional 
counters. The HEAO-1 satellite, in the late 1970s, carried a large area gas proportional 
counter array. The Einstein satellite, launched in 1978, was the first X-ray 
mission to use focusing optics with imaging detectors with an angular resolution of 
a few arcseconds. It employed a position sensitive gas proportional counter and a 
channel-plate. Both of them had poor energy resolution. It also employed a solid- 
state spectrometer (SSS) that had ten times better energy resolution than that of 
a gas proportional counter. The SSS consisted of a cryogenically cooled lithium- 
drifted Si(Li) detector at the focus of an X-ray imaging telescope. It had no spatial 
resolution and its working temperature was 100 K, which was achieved by using a 
cryogen. In the 1980s, the X-ray astronomy satellites EXOSAT and Tenma employed gas 
scintillation proportional counters that had two times better energy resolution than 
gas proportional counters. Around this period, an X-ray CCD designed for photon 
counting showed better energy resolution! than that of a solid-state spectrometer. In 
1993, the ASCA satellite employed an X-ray CCD (SIS) in photon-counting mode. 
The SIS showed good performance in spatial resolution and in energy resolution. 
Its working temperature was 210 K, which was achieved by using a thermo-electric 
cooler. Subsequent satellites in the 21st century carried X-ray CCDs as imaging 
spectrometers. Here, we will briefly explain the performance of X-ray CCDs and 
review them on various satellites. 


2. General Explanation 


The CCD has been widely used as an imaging device since its invention by Boyle 
and Smith,? who were honored with the Nobel Prize in Physics in 2009 “for the 
invention of an imaging semiconductor circuit — the CCD”. CCDs are the basis 
for digital imagery in everything from pocket cameras to the Hubble Space Tele- 
scope. The application of the CCD in optical astronomy is described in Volume 2 
of this work, as well as in many other reference works. While the X-ray CCD has a 
spectrometric imaging capability, there are two types of applications: one functions 
as an integrating X-ray imager for a very bright source like the Sun, and the other 
functions as a photon-counting imaging spectrometer for weak celestial sources. The 
Yohkoh satellite, launched in 1991, employed an X-ray CCD for solar observations 
for the first time.? Subsequent solar X-ray observatories have also employed X-ray 
CCDs. The techniques and performance of those instruments are basically the same 
as CCDs in optical applications with the exception that they detect X-ray photons 
rather than optical photons. Therefore, we will review here the CCD application 
for X-ray photon counting devices. 


2.1. Detection of X-ray Photons 


The basic concept of photon detection in CCDs is identical to that of the solid- 
state detector (SSD). CCDs are arrays of depletion layers created either by MOS 
capacitors or pn junctions. The depletion layer is sandwiched between a series of 
electrodes (called “gates”) and a thin plate, forming a pixel array. Each pixel con- 
tains multiple (usually 2 or 3) gates. Corresponding gates on different pixels are 
electrically connected so that we can transfer charge by clocking the gate voltages. 
The depletion layer is the region where photons are absorbed, generating signal 
charges that can be detected. It is formed from the gate side and expands towards 
the other side of the wafer. Its depth is determined both by the applied voltage and 
by the resistivity of the silicon wafer. If the wafer is not fully depleted, a neutral 
“field-free” region is left behind. 
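
The dependence of the depletion depth on bias voltage and wafer resistivity can be illustrated with the textbook abrupt one-sided junction approximation, d = sqrt(2 ε μ ρ V), where ε is the permittivity of silicon, μ the majority-carrier mobility, ρ the resistivity, and V the applied voltage. This is a simplified sketch, not a device model: it ignores the built-in potential and the gate structure, and the relative permittivity (11.7) and default electron mobility (1350 cm²/V/s) are assumed typical values; hole mobility is roughly a factor of three lower.

```python
import math

EPS_SI_F_PER_CM = 11.7 * 8.854e-14  # permittivity of silicon (relative value 11.7 assumed)


def depletion_depth_um(resistivity_ohm_cm, bias_volts, mobility_cm2_per_vs=1350.0):
    """Depth d = sqrt(2 * eps * mu * rho * V) for an abrupt one-sided junction."""
    depth_cm = math.sqrt(2.0 * EPS_SI_F_PER_CM * mobility_cm2_per_vs
                         * resistivity_ohm_cm * bias_volts)
    return depth_cm * 1.0e4  # convert cm to micrometres


# Hypothetical high-resistivity wafer: 5000 ohm cm at 10 V bias.
depth = depletion_depth_um(5000.0, 10.0)
print(f"{depth:.0f} um")
```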

The gate side is called the “front” side while the other side is the “back” side. 
If the CCD is designed such that the X-ray enters from the front side, it is called 
a front-illuminated (FI) CCD; otherwise it is called a back-illuminated (BI) CCD. 
Either the front side or the back side is an entrance “window” where the X-ray may 
be absorbed and lost. X-rays passing through the entrance window can be detected 
in the depletion layer. In the BI CCD, the wafer is fully depleted so that photons 
are not absorbed in the neutral region. 

The sensitivity for X-rays is determined by the product of the entrance window 
transmission and the depletion layer absorption. Therefore, there are two ways to 
improve sensitivity: one is to reduce the thickness of the entrance window (effective 
at low energies, <1 keV), and the other is to expand the thickness of the depletion 
layer by using high resistivity silicon (effective at high energies, >1 keV). In the BI 
CCD, the entrance window is intrinsically very thin. In the FI CCD, virtual phase 
gates* or thinned transfer gates are employed to improve sensitivity to low energy 
X-rays. In the case of a p-channel type wafer, a typical thickness of the depletion 
layer is up to 100 µm for an FI CCD and less for a BI CCD. By contrast, a thicker 
depletion layer is available for an n-channel type wafer. 
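
The product of window transmission and depletion-layer absorption can be written as exp(−t_w/λ_w) × [1 − exp(−d/λ_Si)], where t_w is the window thickness, d the depletion depth, and λ_w and λ_Si the attenuation lengths at the photon energy of interest. The sketch below evaluates this expression. The attenuation lengths and thicknesses in the example are illustrative placeholders only and should be replaced with tabulated values for the actual materials and energies.

```python
import math


def quantum_efficiency(window_thickness_um, window_atten_len_um,
                       depletion_depth_um, silicon_atten_len_um):
    """Window transmission multiplied by depletion-layer absorption."""
    transmission = math.exp(-window_thickness_um / window_atten_len_um)
    absorption = 1.0 - math.exp(-depletion_depth_um / silicon_atten_len_um)
    return transmission * absorption


# Placeholder numbers (not tabulated data): 0.5 um window, 70 um depletion layer,
# and an attenuation length of 29 um in both, roughly representing a ~6 keV photon.
qe_thin = quantum_efficiency(0.5, 29.0, 70.0, 29.0)
qe_thick = quantum_efficiency(0.5, 29.0, 200.0, 29.0)
print(round(qe_thin, 3), round(qe_thick, 3))
```

Evaluating the same function with a short attenuation length shows the opposite regime, in which the window term dominates and thinning the entrance window is the effective remedy.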

Since the energy gap, E_g, in silicon between the conduction band and the valence 
band is 1.14 eV, a photon with energy higher than E_g will generate an electron-hole 
(e-h) pair. Therefore, the CCD is intrinsically sensitive to infrared (IR), visible, 
ultra-violet (UV) and X-ray photons. Photons below 10 keV are mainly photo-absorbed 
in silicon: a photo-electron is generated that is preferentially injected in the 
direction perpendicular to the electric vector of the incident photon. The trajectory 
of a photo-electron in silicon, along which additional e-h pairs are generated, is 
less than 1 µm in length for electron energies below 10 keV. Taking into account 
the CCD pixel size, we can consider that the e-h pairs are clustered at the point 
of interaction. The photo-electron in silicon initially generates more e-h pairs than 
the number of ion pairs generated by an electron of the same energy in the gas of 
a proportional counter, which improves the inherent energy resolution compared to 
proportional counters. Furthermore, no avalanche occurs inside the CCD, so we can 
obtain good energy resolution if we can reduce the relevant noise sources. 


2.2. Data from X-ray Photon-Counting CCDs 


The CCD usually has a huge number of pixels, each of which individually functions 
as an SSD. The charge generated in each pixel is transferred pixel by pixel and 
reaches the output node, through which we obtain the full image data, one pixel at 
a time. It usually takes a few seconds to read out the entire image; this is called 
the “frame time”. The biggest difference between the CCD and the SSD is that the 
CCD is an integration type device while the SSD is a differential type device. The 
CCD output of each pixel is the sum of the charge accumulated in that pixel during 
the frame time. 

In the X-ray energy band, the CCD has good energy resolution, similar to 
or better than that of an SSD, as long as individual photons can be detected. 
However, it has a pile-up problem: multiple photons can enter into one pixel during 
the frame time because it is an integrating device. We have no way to confirm 
how many photons generate the charge in one pixel. If we assume that the photon 
flux is uniform in some area containing many pixels, we can estimate the pile-up 
fraction. Practically, we always restrict the number of photons in a frame time to 
avoid pile-up. The frame time for an X-ray CCD is therefore relatively short, a few 
seconds, which is in stark contrast to the long exposures used for optical CCDs in 
astronomical instruments. Therefore, in X-ray astronomy, the average number of 
photons detected in a frame time is much less than the number of pixels. In other 
words, almost all the pixel data contain no X-ray photon signal. 
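
Under the uniform-flux assumption mentioned above, the pile-up fraction can be estimated from Poisson statistics. The sketch below treats each pixel independently, with a mean of `mu` photons per pixel per frame, and returns the fraction of occupied pixels that contain two or more photons. Treating single pixels rather than multi-pixel charge islands is a simplification, and the function name is hypothetical.

```python
import math


def pileup_fraction(mu):
    """Fraction of occupied pixels holding >= 2 photons, for Poisson mean mu per pixel per frame."""
    if mu <= 0:
        raise ValueError("mean photon number must be positive")
    p_at_least_one = -math.expm1(-mu)            # 1 - exp(-mu)
    p_exactly_one = mu * math.exp(-mu)
    return (p_at_least_one - p_exactly_one) / p_at_least_one


# For small mu the fraction approaches mu / 2.
for mu in (0.001, 0.01, 0.1):
    print(mu, pileup_fraction(mu))
```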

Taking into account the CCD pixel size, X-ray photons below 10 keV generate 
e-h pairs at the point of interaction of the photo-absorption. One X-ray photon with 
energy E generates E/(3.65 eV) e-h pairs in a single pixel. Either electrons or holes 
become signal charge, depending on the silicon substrate: p-type (boron-doped) or 
n-type (phosphorus-doped). This signal charge is generated in the depletion layer 
and moves towards the gate because of the applied electric field. As it moves, the 
signal charge cloud spreads by diffusion according to the travel distance,° until it is 
collected in a “buried” charge transfer channel a few µm below the gate. The travel 
distance depends on the absorption depth of the X-ray and is limited to the depth 
of the depletion layer. In most cases, the charge spreading is smaller than the pixel 
size. If the point of interaction is close enough to the pixel boundary, however, the 
signal charge can split into a few adjacent pixels. 

By contrast, high energy particles passing through the silicon wafer generate 
many e-h pairs along their trajectories. Since the trajectory length of a high-energy 
charged particle is longer than the CCD pixel size, the charge track generated can 
contain many pixels. We can therefore distinguish the signal of an X-ray photon 
from that of a charged particle (background) on the basis of the shape of the charge 
deposition within the CCD. 


2.2.1. Event Grades 


When the flux is low enough to avoid photon pile-up, most of the pixel outputs 
contain no signal charge. By checking the pixel output spectrum, we can measure 
the pixel output level corresponding to no signal charge, which determines the 
zero level. In the data analysis, we employ two threshold levels to find pixels that 
do contain signal charge: an “event threshold” and a “split threshold”. The event 
threshold is equal to or greater than the split threshold. When we detect a pixel whose 
output exceeds the event threshold and is bigger than its adjacent pixels, forming 
a local maximum, it is called an “event pixel”. Then we check pixels surrounding 
the event pixel. If they exceed the split threshold, they are called “split pixels”. The 
event output is basically the sum of the zero-subtracted data values of the event 
pixel and its adjacent split pixels. 

There are 256 possible event patterns depending on which of the surrounding 
eight pixels exceed the split threshold. These are assigned “grades” that classify the 
quality of the event and the likelihood that it represents the absorption of a single 
X-ray. We will distinguish single X-ray photon events from charged particle events 
or piled-up X-ray photon events by referring to the grade. 
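
The event search described above can be sketched as follows, operating on a zero-subtracted frame. Each event pixel is a local maximum above the event threshold; the eight neighbours are compared with the split threshold to build an 8-bit pattern (hence 256 possibilities), and the split pixels are added to the event sum. The bit ordering used here is arbitrary, the mapping from pattern to grade is not reproduced, and real grading schemes may exclude some split pixels from the sum, so this is only an illustration.

```python
# Offsets of the eight neighbours; the index in this list is the pattern bit (arbitrary order).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def find_events(frame, event_threshold, split_threshold):
    """Return events found in a zero-subtracted frame (list of rows); edge pixels are skipped."""
    events = []
    for r in range(1, len(frame) - 1):
        for c in range(1, len(frame[0]) - 1):
            centre = frame[r][c]
            if centre <= event_threshold:
                continue
            neighbours = [frame[r + dr][c + dc] for dr, dc in NEIGHBOURS]
            if any(v >= centre for v in neighbours):
                continue  # not a strict local maximum
            pattern, total = 0, centre
            for bit, value in enumerate(neighbours):
                if value > split_threshold:
                    pattern |= 1 << bit
                    total += value
            events.append({"row": r, "col": c, "pattern": pattern, "sum": total})
    return events


# Hypothetical 5x5 frame: one event split vertically into the pixel above it.
frame = [
    [0, 1, 0, 2, 0],
    [1, 0, 30, 1, 0],
    [0, 2, 100, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 2],
]
events = find_events(frame, event_threshold=50, split_threshold=20)
print(events)
```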

A variety of grading schemes have been adopted for different X-ray instruments. 
Figure 1 shows the grade definitions introduced for data from the ASCA satellite.’ 
In the ASCA case, the X-ray events are sorted from g0 to g7. Grade g0 (single pixel) 
events are nominally the best events in terms of energy resolution. Grades g2, g3 


[Fig. 1 panel labels: g0: Single; g2: Vertical; g3: Left; g4: Right; g5: Single Sided+; g7: Other. Legend: pixel above SPT and summed in final PH; pixel above SPT but not summed in final PH; pixel not above SPT or event threshold.] 

Fig. 1. X-ray event grades employed in ASCA data analysis.° Each row shows the 3×3 pixels 
centered on the event pixel and indicates what type of pixel pattern appears in the “grade” 
associated with that row ranging from Grade g0 in the top row (Single) to Grade g7 in the bottom 
row (Other). The row labeled “g6: L+Q” contains events shaped like the letters L and Q. 

and g4 (two-pixel split events) are the next best. Grade g6 events are almost always 
due to X-rays, but the large amount of splitting results in somewhat worse energy 
resolution. Grades g1 and g5 are probably due to the pile-up of X-ray events. Grade 
g7 is typically due to a charged particle leaving a long trail of signals. 

We obtain a frame image in every frame time (usually a few seconds), resulting 
in a huge amount of accumulated data. In a laboratory system, we can collect all the 
image data and analyze them later. By contrast, in X-ray satellites it is impractical 
to send all the data to the ground station. Furthermore, the number of X-ray events 
recorded in a frame time is much less than the number of CCD pixels. Therefore, we 
usually select X-ray events on-board and send them back to the ground so that we 
can reduce the telemetry. In some cases, particularly for bright sources, the grade 
selection is also done on-board. 


2.2.2. Noise 


In an X-ray photon-counting CCD, the frame time is much shorter than that 
employed in the optical. Since almost all the pixels contain no X-ray signal, we 
can use the pixel data to determine the no-signal level, which is called the “zero 
level” or “bias level”, and which must be subtracted from the value of any pixels 
included in an X-ray event to measure the total charge in the event. We define 
a “dark threshold” below which pixels are assumed to have no X-ray generated 
photo-charge. If the pixel output is below the dark threshold (practically, within 
the dark range), it is accumulated into the zero level (bias) map.* The width of the 
distribution of zero level pixels (the “zero width”) is a measure of the system noise. 
Therefore, the energy resolving power of the CCD depends on the accuracy of the 
zero level map. Since the zero level of each pixel shows temporal variations due to 
the surrounding conditions, it is very important to regularly update the zero level 
map. 
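
A regularly updated zero level map can be maintained in several ways; the text does not specify one. The sketch below uses an exponential running average, which is an assumption, and updates each pixel only when its output lies within the dark range around the current zero level. The function name and the `weight` parameter are hypothetical.

```python
def update_zero_level(zero_map, frame, dark_range, weight=0.1):
    """Return a new zero level map; pixels outside the dark range are left unchanged."""
    new_map = []
    for zero_row, frame_row in zip(zero_map, frame):
        new_row = []
        for zero, value in zip(zero_row, frame_row):
            if abs(value - zero) <= dark_range:
                new_row.append(zero + weight * (value - zero))  # no X-ray charge assumed
            else:
                new_row.append(zero)  # pixel probably contains signal charge
        new_map.append(new_row)
    return new_map


# Hypothetical two-pixel example: the second pixel holds an X-ray event.
zero_map = [[100.0, 100.0]]
frame = [[104.0, 900.0]]
updated = update_zero_level(zero_map, frame, dark_range=10.0, weight=0.5)
print(updated)
```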

Usually, the energy resolving power of an X-ray CCD is expressed by the full 
width at half maximum (FWHM) for 5.9 keV X-ray photons. Since the generation of 
e-h pairs is a stochastic process, the spread (not FWHM, but standard deviation) in 
the number of signal charges produced by monochromatic X-rays is √(FN), where N 
is the expected number of signal charges and F is the Fano factor, which is about 0.12 
for silicon. This is called “Fano-limited resolution” and is about 120 eV (FWHM) at 
5.9 keV. The actual resolution is the quadrature sum (root sum of squares) of the 
Fano-limited value and the zero width. We can only reduce the zero width by reducing 
the noise. 
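
The Fano-limited figure quoted above can be checked numerically using N = E/(3.65 eV) and F = 0.12 from the text, together with the Gaussian conversion FWHM = 2√(2 ln 2) σ. The sketch below also combines the Fano term with the zero width in quadrature; expressing the zero width in electrons (rms) is an assumption made here for illustration, as is the example value of 5 electrons.

```python
import math

W_SI_EV = 3.65   # mean energy per e-h pair in silicon (from the text)
FANO_SI = 0.12   # Fano factor for silicon (from the text)
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fano_fwhm_ev(energy_ev):
    """Fano-limited energy resolution (FWHM, eV) for a photon of the given energy."""
    n_pairs = energy_ev / W_SI_EV
    return FWHM_PER_SIGMA * W_SI_EV * math.sqrt(FANO_SI * n_pairs)


def total_fwhm_ev(energy_ev, zero_width_electrons_rms):
    """Fano term and zero width (assumed in electrons rms) added in quadrature."""
    noise_fwhm = FWHM_PER_SIGMA * W_SI_EV * zero_width_electrons_rms
    return math.hypot(fano_fwhm_ev(energy_ev), noise_fwhm)


fano = fano_fwhm_ev(5900.0)
total = total_fwhm_ev(5900.0, 5.0)
print(f"Fano limit: {fano:.1f} eV, with 5 e- rms noise: {total:.1f} eV")
```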

There are several noise sources for photon-counting CCDs: dark current, spuri- 
ous charge, electronic noise, charged particles, cosmetic pixel noise, etc. The origins 
of these are given in Ref. 8. When we use an X-ray photon-counting CCD, the major 
objectives are to measure both the incident position and the energy of each X-ray photon. 
From the practical point of view for the observer, the noise can be divided into two types. 


*This is the equivalent of obtaining dark frames in the optical. 

One is a noise event whose noise charge appears in some pixels. We see them when 
we open the door of the CCD camera. They are mainly charged particles, and can be 
rejected by event grade selection. Although γ-rays interacting inside the depletion 
layer may generate a high energy electron within the detector, they usually interact 
with materials around the CCD and generate secondary charged particles. If these 
produce characteristic X-rays in the materials surrounding the CCD, we have to 
carefully study them to distinguish them from celestial X-rays. 

The other is a noise charge that is always added to all pixels. The fluctuation of 
this noise increases the zero width. It consists of two components, distinguished by 
whether or not they depend on the working temperature. Non-thermal noise originates 
from the electronics that surround the CCD. It can be reduced to a few electrons, 
depending on the working frequency; generally speaking, it is smaller when the CCD 
is read out more slowly. Thermal noise is intrinsic to SSDs and occurs through the 
thermal generation of minority carriers. This is called dark noise because it is 
produced even in complete darkness. 

The amount of the thermal noise is proportional both to the detector volume 
sensitive to X-rays and to the integration time. It can be reduced to 0 by reducing 
the integration time to 0. Practically, we employ horizontal over-clocked data to 
measure the non-thermal noise. Therefore, we can separately measure the non- 
thermal noise and the thermal noise. Figure 2 shows an example of the measurement 
of the thermal noise for the MAXI mission. For the integration time of about 6 s, 
the thermal noise is unmeasurable below −60 °C. When the thermal noise increases 
because of radiation damage, one of the counter-measures is to reduce the working 
temperature. 
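The separation of the two noise components can be sketched numerically. This is a
minimal illustration, assuming that the thermal and non-thermal components are
independent and therefore add in quadrature, and that the horizontal over-clocked
pixels carry only the non-thermal noise. The function name and the sample values are
hypothetical and are not taken from the MAXI analysis.

```python
import math
import statistics

def thermal_noise(image_pixels, overclock_pixels):
    """Estimate the thermal (dark) noise in electrons rms.

    Assumption: image pixels carry non-thermal + thermal noise, over-clocked
    pixels carry non-thermal noise only, and the two add in quadrature.
    """
    total_var = statistics.pvariance(image_pixels)
    nonthermal_var = statistics.pvariance(overclock_pixels)
    # Clip at zero: with few samples the difference can come out negative.
    return math.sqrt(max(total_var - nonthermal_var, 0.0))

# Hypothetical zero-level samples (electrons) from one frame.
overclock = [-3.0, 3.0, -3.0, 3.0]   # non-thermal noise only: 3 e- rms
image = [-5.0, 5.0, -5.0, 5.0]       # non-thermal + thermal: 5 e- rms
print(thermal_noise(image, overclock))  # 4.0
```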


2.2.3. X-ray CCD Spectral Response 


Fig. 2. Example of the thermal noise measurement as a function of the working
temperature. Integration time is 6 s.

The response function for mono-energetic X-ray photons has a photo-peak, a shoulder
and a tail. Figure 3 shows an example obtained by illuminating an ACIS FI CCD® with
X-rays emitted by the radioisotope ⁵⁵Fe (its decay mode is electron capture), which
is widely used as a standard calibration source for X-ray detectors. In this case,
the data are the sum of responses to several different lines: Mn-Kα at 5.89 keV,
Mn-Kβ at 6.49 keV and Mn-L at 0.6 keV. (Mn-L is usually unavailable due to absorption
in the calibration source structure.) Silicon-related peaks are due to the Si
substrate of the CCD. X-rays from other materials may also be present, depending on
the calibration configuration.

Fig. 3. CCD spectral response for X-rays from ⁵⁵Fe that generate Mn-Kα, Mn-Kβ and
Mn-L through the inner shell electron capture process.¹⁰

The photo-peaks come from X-rays whose signal charge is completely collected. 
The shoulder and the tail come from X-rays absorbed near the silicon oxide either 
near the channel stop or the insulator just below the gate. Reference 10 discusses 
why the response function appears as it does. 


3. Mesh Experiment 


A CCD consists of a two-dimensional (2D) array of small pixels. The pixel size 
usually determines the spatial resolution and is typically a few tens of µm. The
electrode structure within a pixel is complicated, resulting in non-uniformity in 
detection efficiency within a pixel. Furthermore, the event grade depends on the 
point of interaction of an X-ray photon within a pixel. Therefore, it is important to 
know the detection efficiency as well as the event grade as a function of the point 
of interaction within a pixel. 

A new technique, the mesh experiment,¹¹ was introduced to directly measure the CCD
response within a pixel. In this experiment, we placed a 2D mesh in front of the
detector. The mesh contains small holes (much smaller than the CCD pixel size), which
are periodically spaced. Figure 4 shows a schematic configuration of the mesh
experiment. A small hole can specify the point of interaction of the X-ray photon
within a pixel. If the hole spacing is very close (or equal in design) to a multiple
of the pixel size, as is always the case in practice, we will obtain a Moiré pattern
that determines the precise alignment between the mesh and the CCD. Assuming that
individual pixels of the CCD are identical to each other, we can obtain the X-ray
response as a function of the point of interaction of the X-ray photon within a
pixel. The spatial resolution of this map depends on the beam divergence, the X-ray
wavelength, the distance between the mesh and the CCD, and the hole size. Figure 5
shows detection efficiency maps¹² of a MOS CCD for O-K and Y-L lines for four
different grades of events, where each map is duplicated four times, in a 2×2 array,
for clarity. This CCD is identical to those employed in the XMM-Newton and Swift
satellites, and we can clearly see the high detection efficiency of its open gate
structure.

Fig. 4. Schematic view of the mesh experiment.¹²

Fig. 5. Detection efficiency map of the CCD pixel for O-K (0.52 keV) and Y-L (1.9 keV)
X-rays for: (a) single pixel events, (b) vertically-split events, (c)
horizontally-split events, and (d) all X-ray events.¹²
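The way a slightly mismatched hole spacing samples different positions inside the
pixel can be illustrated in one dimension. This is only a sketch: the helper name and
the numbers (12 µm pixels, a hole pitch 1% larger than four pixels) are hypothetical
and do not describe the actual mesh, and the rotation between mesh and CCD is
ignored.

```python
def subpixel_offsets(pixel_um, pitch_um, n_holes, x0_um=0.0):
    """Return the position of each mesh hole inside its CCD pixel (1D).

    Hypothetical helper: holes sit at x0 + i*pitch, and the offset within
    the pixel is that position modulo the pixel size.
    """
    return [(x0_um + i * pitch_um) % pixel_um for i in range(n_holes)]

# Illustrative numbers only.
pixel = 12.0
pitch = 4 * pixel * 1.01
offsets = subpixel_offsets(pixel, pitch, n_holes=25)

# Successive holes land 0.48 um further into the pixel, so 25 holes
# sweep almost the full 12 um pixel width once (one Moire period).
print([round(o, 2) for o in offsets[:4]])
```

Folding all detected events back by these offsets, under the assumption that every
pixel is identical, is what builds up the response map within a single pixel.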

The mesh experiment confirms that split pixel events are generated when the
incident X-ray is absorbed near the pixel boundary. Furthermore, we can measure
the split fraction of the initial charge as a function of the distance to the pixel
boundary. This enables us to measure the charge spread size as a function of the
incident X-ray energy.¹³ The measured charge spreads are consistent with the
diffusion process through the depletion layer. Finally, we can determine the
absorption location of the incident X-ray to better than the pixel size: a split
event (g2, g3 and g4 in Fig. 1) enters near the pixel boundary, while a single
event (g0) enters in an inner region of the pixel and a corner event (g6) can be
restricted to the pixel corner. This technique has been used to achieve sub-pixel
resolution for the ACIS CCD¹⁴ and the pnCCD.¹⁵
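The relation between split fraction and distance to the pixel boundary can be
sketched with a simple model. We assume here, purely for illustration, that the
charge cloud is a one-dimensional Gaussian of known width sigma; the function names
are hypothetical and this is not the method of the cited references.

```python
import math

def split_fraction(d_um, sigma_um):
    """Fraction of a 1D Gaussian charge cloud that crosses a boundary d away."""
    return 0.5 * math.erfc(d_um / (math.sqrt(2.0) * sigma_um))

def distance_from_split(fraction, sigma_um, d_max_um=50.0):
    """Invert split_fraction by bisection (fraction must lie in (0, 0.5])."""
    lo, hi = 0.0, d_max_um
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if split_fraction(mid, sigma_um) > fraction:
            lo = mid   # too much charge leaks: the event is farther away
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative values: a 2 um cloud absorbed 1.5 um from the boundary.
f = split_fraction(1.5, 2.0)
print(round(f, 3), round(distance_from_split(f, 2.0), 3))
```

An event absorbed exactly on the boundary shares its charge equally (fraction 0.5),
and the fraction falls quickly once the distance exceeds the cloud size, which is why
single-pixel events map to the inner region of the pixel.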


4. Low Energy Proton Beams and Micrometeorites 


As described in Sec. 5.2, the ACIS CCDs were damaged by low energy protons 
scattered from the X-ray mirror on Chandra. The charge transfer channel inside 
the CCD is the most sensitive region to such damage. By taking into account the 
Bragg curve of a proton in silicon, we can estimate the proton energy loss rate to 
be 6 eV Å⁻¹. When the proton energy is a few hundred keV, it will leave enough
energy to displace Si atoms in a shallow region a few µm below the entrance window,
where the FI CCD has a buried channel. By contrast, the transfer channel of a BI 
CCD is below the depletion layer, which becomes a good shield and prevents similar 
damage from low-energy protons. 

Figure 6 shows an example of stacking plots for proton damage. The stacking plot
shows the signal charge from 5.9 keV and 6.5 keV X-rays as a function of the number
of transfers required to read out the X-ray event. It clearly shows the charge
transfer inefficiency (CTI) of the CCD, which is defined as the fractional charge
loss per transfer. The CCD employed is an FI CCD whose buried channel is about
3 µm below the CCD surface, where a proton of 300 keV shows the maximum energy
deposit in silicon. We can see that there is no charge loss before the proton
irradiation, but the charge loss increases as the proton fluence increases. We
should note that the energy resolving power also decreases with increasing dose
(reflected in the broadening of the lines). The pulse height loss can be compensated
in the data analysis, but the energy resolving power cannot be recovered by analysis
software.

Fig. 6. Stacking plot of ⁵⁵Fe X-ray events as a function of transfer number at −100°C
for a MAXI CCD during a test of proton radiation damage.¹⁶ The proton beam energy is
292 keV: (a) no radiation, (b) 1.04x10% protons cm⁻² dose, and (c) 1.11x10° protons
cm⁻² dose.
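How a CTI value follows from a stacking plot, and how the mean pulse height loss can
be compensated, can be sketched as follows. We assume the simplest model, in which a
fixed fraction of the charge is lost at every transfer, so that the pulse height
after n transfers is PH0 (1 − CTI)^n. The function names and the synthetic data are
hypothetical.

```python
import math

def fit_cti(transfers, pulse_heights):
    """Least-squares fit of ln(PH) = ln(PH0) + n * ln(1 - CTI)."""
    n = len(transfers)
    logs = [math.log(ph) for ph in pulse_heights]
    mean_x = sum(transfers) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in transfers)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(transfers, logs))
    slope = sxy / sxx
    ph0 = math.exp(mean_y - slope * mean_x)
    return ph0, 1.0 - math.exp(slope)

def correct_pulse_height(ph, n_transfers, cti):
    """Undo the mean charge loss for an event read out after n transfers."""
    return ph / (1.0 - cti) ** n_transfers

# Synthetic line centroids (arbitrary units) for an assumed CTI of 1e-4.
transfers = [0, 256, 512, 768, 1024]
heights = [1000.0 * (1.0 - 1e-4) ** n for n in transfers]
ph0, cti = fit_cti(transfers, heights)
print(round(ph0, 3), round(cti, 6))
```

The correction restores the line centroid only. The event-to-event scatter of the
charge loss, which broadens the line, is not removed by it.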

There are two types of damage in silicon: ionization damage and bulk damage. 
The ionization damage results from the accumulation of charge in the insulating 
oxide layer below the gates and leads to a flat-band shift that causes a shift of 
the operating voltage. This can be compensated to some extent by adjusting the 
electronics. The bulk damage results from collisions of high energy particles (mostly 
protons) with Si atoms in which the Si atom is knocked out of the crystal lattice, 
forming an interstitial Si atom and a vacancy. The knock-on atom may produce 
further vacancies. These lead to an increase of the CTI. If the proton energy is high
enough, the bulk damage becomes small and the proton leaves a constant density of
charge along its trajectory (minimum ionization loss ~2 MeV g⁻¹ cm²).

Once low energy protons damage the CCD lattice, we can recover much of the lost
energy resolution by charge injection into the top row of the CCD. This periodically
injected charge is swept across the CCD from the far side to the serial register as
the CCD is read out, and fills up the vacancies created by proton damage for a
de-trapping time. The effectiveness of the charge injection depends on the
de-trapping time and the readout speed. Figure 7 demonstrates the recovery of signal
level in a damaged CCD. Frequent charge injection recovers the CCD energy resolution
as well as the CCD gain. This technique has been applied very successfully on the
Suzaku mission.¹⁷

Fig. 7. Left: Stacking plot of ⁵⁵Fe X-ray events as a function of number of
transfers. Four stacking plots correspond to (a) no charge injection, (b) charge
injection every 100 rows, (c) charge injection every 64 rows and (d) charge injection
every 25 rows. The vertical axis corresponds to the energy scale at the readout node.
Right: frame image with charge injection every 32 rows.

The X-ray mirror can also act as a collector of micrometeorites that may leave
physical damage on the CCD surface. On October 15, 2000, a micrometeorite struck
the pnCCD on XMM-Newton and left 35 pixels damaged.¹⁸ Since then, similar events
have been reported in the MOS CCDs on XMM-Newton, Swift and Suzaku. Almost all of
them damaged some fraction of the CCD while the rest of the chip continued to
function properly. In the worst case, a micrometeorite hit XIS2 (one of the FI CCD
cameras on Suzaku) on November 9, 2006. XIS2 showed a large amount of charge leakage
and never recovered.

The charge transfer channel of the FI CCD is very close (a few µm) to the CCD
surface. By contrast, the charge transfer channel of the BI CCD is protected by the
depletion layer, which is at least several tens of µm thick. This makes the BI CCD
less susceptible than the FI CCD to surface damage effects. Although the statistics 
accumulated so far are not enough for a definite conclusion, we think that the BI 
CCD has a great advantage over the FI CCD both in the case of low energy proton 
damage and of micrometeorite bombardment. 


5. Space-borne CCDs for X-ray Photon Counting 


So far, several X-ray astronomy satellites have carried X-ray CCDs for photon
counting; they are summarized in Table 1 in order of their launch date. In general,
the background level of the X-ray detectors depends on the orbit and inclination. In
low Earth orbit (LEO), the particle background rapidly increases above 600 km.
Even at low altitude, high background rates occur during passages of the South 
Atlantic Anomaly. The satellite orbit also affects the design of the thermal system. 
All X-ray CCDs for photon counting require a low working temperature to reduce 
the thermal noise. In LEO, a simple radiation cooling system is not enough due to 
the IR heat flow from the Earth. Therefore, missions in LEO employ an extra cooling 
device (usually a Peltier cooler based on the thermoelectric effect) in addition to the 
radiator. By contrast, in high Earth orbit a radiation cooling system is enough to 
cool the CCD. In this way, the X-ray CCD is controlled at the working temperature, 
depending on the satellite, either by using a heater or a Peltier cooler. 

We will review the X-ray CCDs for each satellite, mainly from the technical 
point of view. The scientific achievements can be easily seen from their publications. 


5.1. ASCA 


The ASCA satellite was launched in February 1993 and carried an X-ray photon-counting
CCD into orbit for the first time.¹⁹ There were two identical CCD cameras (SIS); each
had four X-ray CCDs assembled in a mosaic style. Figure 8 shows the detailed
structure of the SIS focal plane.® The CCD was a CCID7?® that was manufactured at the
MIT Lincoln Laboratory. It was a frame-transfer type CCD that could be closely
abutted to other imagers on three sides of the imaging array. The device had 420×420
27 µm square pixels in the imaging region and was constructed using a three-phase
triple-polysilicon process. Each SIS was thermally connected to a radiator through a
heat pipe. This connection cooled the baseplate of each SIS to −40°C under normal
operation. A Peltier cooler was located directly above the baseplate. The practical
working temperature of the SIS was about −62°C. The SIS had its own mechanical door
that automatically opened in orbit and latched. Since the CCD was sensitive to
optical light, an optical blocking filter, made of a 100 nm unsupported film of Lexan
sandwiched between two layers of 40 nm aluminum, was placed just below the door and
kept in vacuum during the launch operation.

Fig. 8. Top view of the ASCA SIS CCD plane. Four chips cover one inch square.®

Table 1. Major satellites carrying X-ray CCDs for photon counting.

Mission           Launch  Orbit           Inclination  CCD         Chip Size  No. of  Pixel Size  Working     Type   Ref.
                  Year    (km)            (degrees)    Instrument  (mm×mm)    Chips   (µm)        Temp. (°C)
ASCA              1993    550             31           SIS         12.7×12.7  4       27          −62         FI     19
Chandra           1999    140000ᵃ–10000   28.5         ACIS        24.6×24.6  10      24          −120        FI/BI  20
XMM-Newton        1999    114000ᵃ–7000    40           pnCCD       60×60                          −90         BI     21
                                                       MOS CCD     24×24                          >−133       FI     22
Swift             2004    600                          XRT         24×24                          <−50ᵇ       FI     23
Suzaku            2005    550                          XIS         24.6×24.6                      −90         FI/BI
MAXI              2009    400                          SSC         25.4×25.4  32                  −60         FI
Astrosatᵈ         2015    650                          SXT         24×24      1                   −80         FI
Hitomi (ASTRO-H)  2016    550                          SXI         30×30      4                   −110        BI
Spektr-RG         2019ᶜ   L2                           eROSITA     28.8×28.8                      −95         BI

ᵃEccentric orbit; ᵇUnregulated due to cooler power supply failure; ᶜplanned;
ᵈhttp://astrosat.iucaa.in

The SIS was placed at the focal plane of a thin foil mirror telescope that had 
high sensitivity up to 10keV. The performance of the SIS opened a new era in 
X-ray astronomy. Figure 9 shows an example of an optically thin thermal spectrum 
from the SNR W49B,²⁹ exhibiting many emission lines from various highly ionized
elements. In this way, the SIS revealed detailed X-ray spectra from various targets. 

The SIS performance gradually degraded due to bulk (or displacement) damage in
orbit. Yamashita et al.!? measured the dark current increase and the decrease in
charge-transfer efficiency (CTE) from 5 years of in-orbit data. They found that the
damage level differed from pixel to pixel. In the SIS, the zero level was not
measured for each pixel because of the limited memory size. The relatively high
working temperature and the limited memory size made it difficult to recover the SIS
performance from the effects of radiation damage.

Fig. 9. X-ray spectrum of the SNR W49B,²⁹ showing many emission lines for the first
time.


5.2. Chandra 


The Chandra X-ray Observatory (CXO) was launched into a highly eccentric orbit 
in July 1999. The fraction of the sky occulted by Earth is small, as is the fraction 
of the time spent in the Earth’s radiation belts, where the detector backgrounds 
are high. It carries a superb X-ray mirror with a spatial resolution of 0.5
arcseconds, which corresponds to 25 µm at the focal plane. There are two focal plane
science instruments: the High Resolution Camera (HRC), a micro-channel plate
detector, and the Advanced CCD Imaging Spectrometer (ACIS).²⁰ The ACIS, shown in
Fig. 10, consists of two parts, ACIS-I and
ACIS-S. The ACIS-I is a 2x2 array of FI CCDs for high-resolution spectrometric 
imaging over a 17-arcminute square field of view (FOV). The ACIS-S is a 6x1 array 
of four FI CCDs and two BI CCDs mounted along the grating dispersion direction. 
One BI CCD can be placed at the aim point of the telescope, also providing high- 
resolution spectrometric imaging (8 arcminutes square). Both ACIS detector arrays 
are covered with aluminized polyimide optical blocking filters. The CCDs employed 
in the ACIS were produced by MIT Lincoln Laboratory? and consist of 1024x1024 
24 µm square pixels. The ACIS CCDs are cooled to −120°C in normal operation
using a radiator to provide cooling and heaters to control the temperature precisely.
Figure 11 shows an ACIS image of the supernova remnant Cassiopeia A,³⁰ which
is the best X-ray image obtained so far.

Fig. 10. ACIS on board Chandra. Ten CCDs are mounted on the movable stage. See
electronic edition for a color version of this figure.

Fig. 11. X-ray image of the Cassiopeia A SNR.³⁰ Image quality of 0.5 arcseconds
reveals a compact source in its center. See electronic edition for a color version of
this figure. Image credit: NASA/CXC/SAO.

The image quality of the mirror is so good that the non-uniformity within a CCD
pixel affects the data. Chandra has a built-in dither capability in its pointing
position that allows observations to average over the non-uniformity among individual
pixels, as well as within each pixel. The dither pattern for observations with ACIS
is a Lissajous figure spanning 16 arcseconds peak-to-peak, with a temporal period
of about 1000 s. Section 4 of Ref. 31 shows a dither-on image on the CCD and a
corresponding sky image for a point source.
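The dither pattern can be visualized with a short sketch. The 8 arcsecond amplitude
(16 arcseconds peak-to-peak) and the period of about 1000 s come from the text above.
The second period of 707 s is an assumption made here so that the two axes are
incommensurate and trace a Lissajous figure; the function name is hypothetical.

```python
import math

def dither_offset(t_s, amp_arcsec=8.0, period_x_s=1000.0, period_y_s=707.0):
    """Pointing offset (arcsec) at time t for a sinusoidal dither on each axis."""
    x = amp_arcsec * math.sin(2.0 * math.pi * t_s / period_x_s)
    y = amp_arcsec * math.sin(2.0 * math.pi * t_s / period_y_s)
    return x, y

# Sample the pattern every 10 s over 20 ks.
track = [dither_offset(t) for t in range(0, 20000, 10)]
span_x = max(x for x, _ in track) - min(x for x, _ in track)

# With 0.5 arcsec corresponding to 25 um and 24 um pixels, one pixel
# subtends about 0.48 arcsec, so the dither spans roughly 33 pixels.
arcsec_per_pixel = 0.5 * 24.0 / 25.0
print(round(span_x, 1), round(span_x / arcsec_per_pixel))
```

Because the image of a point source wanders over tens of pixels during an
observation, the response is averaged over many pixels and over all positions within
a pixel.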

At the beginning of Chandra operations,³² the ACIS detectors received a heavy
bombardment of low energy protons during passage through the Earth’s radiation 
belts, due to the unanticipated forward scattering of charged particles by the Chan- 
dra mirrors. This resulted in substantial CTE degradation of the FI CCDs, causing 
the gain, quantum efficiency, energy resolution, and grades to exhibit row-dependent 
effects. The damage analysis and its cause are given in Ref. 33. The BI CCDs show 
modest degradation since the protection afforded by the depletion region in front of 
the buried channel prevented the on-orbit radiation damage seen in the FI CCDs. 
Since this problem was found, the ACIS has been moved out of the focal point of 
the mirror whenever Chandra passes through the radiation belts to avoid further 
proton damage. The detailed story and the countermeasures are given in Ref. 34. 


5.3. XMM-Newton 


The XMM-Newton satellite³⁵ was launched in December 1999. It has three identical
X-ray telescopes with imaging cameras. Two of the telescopes include a reflection
grating spectrometer in the optical path. Together, these cameras compose the
European Photon Imaging Camera (EPIC). They utilize two different detector types: the
novel pnCCD²¹ and MOS-type CCDs.²²


5.3.1. pnCCD 


X-ray CCDs are generally metal-oxide-silicon (MOS) CCDs optimized for X-ray 
detection. For high energy X-rays, we have to expand the depletion layer by using 
high resistivity silicon. For low energy X-rays, we have to improve the entrance 
window, either by using thinned transfer gates or by thinning the detectors to 
enable back-illumination. In any case, the effective depth of the depletion layer is 
limited to about 100 µm.

By contrast, the pnCCD is based on reverse-biased pn-diodes and is always 
fully depleted. Since all components of the sensor are built up with pn-junctions 
instead of MOS structures, the device is denoted pnCCD.²¹ It was developed by the
Max-Planck-Institute for Extraterrestrial Physics (MPE) for use in X-ray astron- 
omy on the XMM-Newton observatory. It is working reliably and stably in orbit; 
the maximum peak shift uncertainty at 6keV has been only 2eV over the past 
14 years. 

The pnCCD camera was designed to fit the performance of the X-ray telescope on the
XMM-Newton satellite. The pixel size of 150 µm square (corresponding to 4.1
arcseconds on the sky) was chosen³⁶ taking into account the X-ray mirror point spread
function of 15 arcseconds half-energy width (HEW) and 6.6 arcseconds FWHM. Figure 12
shows a picture of the XMM-Newton flight detector. The pnCCD was fabricated on a
4 inch wafer in 1997 with a format of 384×384 pixels covering a 6 cm square. The
sensor thickness is 280 µm with full depletion. It has a detection efficiency higher
than 90% at 10 keV. The low-energy response is given by the very shallow implant of
the p⁺ back contact; the effective dead layer is of the order of 30 nm.

Fig. 12. Photograph of the XMM-Newton pnCCD fabricated on a 4 inch wafer.ᵇ See
electronic edition for a color version of this figure.
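The correspondence between pixel size and angle on the sky is a small-angle
conversion through the telescope focal length. The sketch below assumes focal lengths
of 7.5 m for XMM-Newton and about 10 m for Chandra; these two values are not given in
the text and are stated here as assumptions, and the function names are hypothetical.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def pixel_scale_arcsec(pixel_um, focal_length_m):
    """Angle on the sky (arcsec) subtended by one pixel."""
    return pixel_um * 1e-6 / focal_length_m * ARCSEC_PER_RAD

def focal_plane_um(angle_arcsec, focal_length_m):
    """Length at the focal plane (um) corresponding to an angle on the sky."""
    return angle_arcsec / ARCSEC_PER_RAD * focal_length_m * 1e6

# pnCCD: 150 um pixels at an assumed focal length of 7.5 m.
scale = pixel_scale_arcsec(150.0, 7.5)
print(round(scale, 1))               # 4.1
# The 15 arcsec HEW then covers between three and four pixels.
print(round(15.0 / scale, 1))
# Chandra: 0.5 arcsec at an assumed focal length of 10 m, in um.
print(round(focal_plane_um(0.5, 10.0), 1))
```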

The pnCCD has 12 subunits, each with parallel readout of 64 channels, for a 
total of 768 channels for the entire camera. High radiation hardness is built in by 
avoiding active MOS structures and by the fast transfer of the charge in a depth of 
more than 10 µm below the surface. The working temperature of −90°C was chosen
to reduce dark current to less than 0.1 e⁻ per pixel per readout cycle of 73 ms. The
working temperature was also selected to be consistent with the requirement to 
avoid the accumulation of contaminants (mainly of ice) on the radiation entrance 
window at a low residual pressure inside the camera. 

Figure 13 shows the image and spectrum of the Tycho SNR obtained by EPIC 
pn-CCDs on XMM-Newton.³⁷ Many emission lines are clearly resolved.


5.3.2. MOS CCD 


Each EPIC-MOS CCD camera employs seven EEV (now e2v) type CCD-22 CCDs (Fig. 14) so
that it can cover the focal plane of 62 mm in diameter, equivalent to 28.4
arcminutes. Each CCD has 600×600 40 µm square pixels. It is a conventional MOS type
CCD having two readout nodes. The full CCD image can be read out using either node,
or read out using both nodes simultaneously to halve the readout time. It is a
three-phase CCD. One of the three electrodes has been enlarged to occupy a greater
fraction of each pixel, and holes have been etched through this enlarged electrode to
the gate oxide. The open fraction of the total pixel area is 40%, providing a high
transmission for very soft X-rays, which is directly measured by the mesh experiment
shown in Fig. 5. The actual mean depletion depth is 35–40 µm.²²

Fig. 13. X-ray image and its entire X-ray spectrum of the Tycho SNR obtained by
XMM-Newton. Major emission lines are labeled.³⁷

Fig. 14. Photograph of the XMM-Newton MOS-CCDs with their flexible connectors mounted
in the cryostat.²² See electronic edition for a color version of this figure. Image
credit: ESA.

ᵇhttp://www.pnsensor.de/Welcome/Detectors/pn-CCD/index.php
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5.4. Swift 


The Swift Gamma Ray Burst Explorer was launched in November 2004, carrying 
the X-Ray Telescope (XRT) and two other instruments. The XRT²³ contains one
CCD that is identical to those employed in the MOS cameras on XMM-Newton. The
cooling system contains both a radiator/heat-pipe and a Peltier cooler. Due to a
malfunction of the Peltier power supply, the CCD temperature ranges from about
−50°C to about −70°C, depending on the satellite orientation.

The CCD system has various working modes: a Photo-Diode mode (no position
information, 0.14 ms time resolution; disabled following a micrometeoroid impact in
May 2005), a Windowed Timing mode (1D position with 1.8 ms time resolution) and a
Photon-Counting mode (2D imaging with 2.5 s time resolution). This enables the
XRT to cover a wide range of intensity up to 60 times the flux of the Crab nebula, 
which has never been achieved by other X-ray CCDs on satellites. The Photo-Diode 
mode is a fast timing mode designed to produce accurate timing information for 
extremely bright sources. This mode alternately clocks the parallel and serial clocks 
by one pixel each. Charge is accumulated such that each digitized pixel contains 
charge integrated from the entire FOV. 


5.5. Suzaku 


Astro-E was launched in 2000 but could not reach orbit due to a rocket failure.
The re-flight mission, Astro-EII, was successfully put into LEO in July 2005
and named Suzaku. For both satellites, an X-ray CCD camera (XIS²⁴) was prepared
by a collaboration between Japan and MIT. The CCDs on Astro-E were identical
to those employed in Chandra ACIS and in HETE2, while those on Suzaku are
an advanced version of a similar CCD. The XIS CCD was re-designed for better
performance than those of Chandra ACIS, with a charge injection (CI) capability
to mitigate the effects of radiation damage and a chemisorption surface treatment³⁸
on the BI CCD for improved charge collection and spectral resolution. There are
three FI CCDs and one BI CCD. The ground calibration indicates that the FI detectors
have a depletion thickness of 65 µm and a dead layer thickness of 0.28 µm Si plus
0.44 µm SiO₂, while the BI detector has a depletion thickness of 42 µm and a very
thin dead layer, consisting of 5 nm HfO₂, 1 nm Ag, and 3 nm SiO₂.

Figure 15 shows an image of the XIS.²⁴ It is equipped with a vacuum-tight
door that is opened in orbit and latched. The optical blocking filter is a 140 nm
unsupported film of polyimide sandwiched between two layers of aluminum of 80 nm
and 40 nm thickness.³⁹ Figure 16 is a close up of the front end assembly. The CCD
is cooled to −90°C by using a three-stage Peltier cooler.

Fig. 15. Suzaku XIS body. Four XISs are employed. See electronic edition for a color
version of this figure.

Fig. 16. Front end assembly of the XIS on board Suzaku. A multi-stage Peltier cooler
is below the CCD. See electronic edition for a color version of this figure.

In orbit, we found a rapid degradation of the CCD performance, increasing the
FWHM of the FI CCD from 130 eV to 200 eV in a year. Therefore, a Spaced-row
Charge Injection (SCI⁴⁰) scheme was applied to the XIS. In the SCI technique, a
charge is injected into CCD rows periodically. The injected charge fills the
radiation-induced traps as a “sacrificial charge”.© Then subsequent charge packets
generated by X-ray photons will not be trapped if the clocking time is shorter than
the de-trapping time. The Suzaku XIS is the first X-ray CCD whose performance was
improved in orbit.⁴¹ The SCI improved the FWHM of the calibration source line from
200 eV to 140 eV. The SCI was the normal observation mode from 2006 to 2015,
the end of the mission.? Figure 17 shows a history of the peak apparent energy
and the width at 5.9 keV. The SCI was tested in orbit around mission day 450. We
see that implementation of the SCI generated a dramatic improvement of the CCD
performance.

Fig. 17. XIS history of the peak apparent energy for 5.9 keV X-rays. The width of the
scatter indicates the energy resolution. The monotonic degradation of the CCD gain is
due to the proton damage. The SCI technique was introduced around mission day 450 and
improves the CCD performance, particularly for the FI devices XIS0 and XIS3. See
electronic edition for a color version of this figure.

ᶜSuzaku operated successfully from 2005-2015.
ᵈhttp://heasarc.gsfc.nasa.gov/docs/astroe/prop_tools/suzaku_td/node10.html.
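Why the spacing of the injected rows matters can be illustrated with a toy model. We
assume here that all traps are filled when an injected row passes, that they empty
exponentially with a single de-trapping time, and that the charge lost per transfer
is proportional to the fraction of empty traps. The function names and the numbers
are hypothetical and do not describe the XIS.

```python
import math

def effective_cti(cti_no_injection, rows_behind, row_time_s, detrap_time_s):
    """CTI seen by a packet that follows an injected row by rows_behind rows."""
    empty = 1.0 - math.exp(-rows_behind * row_time_s / detrap_time_s)
    return cti_no_injection * empty

def mean_cti(cti_no_injection, spacing_rows, row_time_s, detrap_time_s):
    """Average CTI over the rows between two injected rows (spacing >= 2)."""
    values = [effective_cti(cti_no_injection, k, row_time_s, detrap_time_s)
              for k in range(1, spacing_rows)]
    return sum(values) / len(values)

# Illustrative numbers: CTI of 1e-4 without injection, 1 ms per row
# transfer, 0.1 s de-trapping time.
for spacing in (100, 64, 25):
    print(spacing, mean_cti(1e-4, spacing, 1e-3, 0.1))
```

In this picture, closer spacing leaves the traps less time to empty, so the mean
charge loss decreases, at the cost of more rows being occupied by injected charge.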


5.6. MAXI 


The Monitor of All-sky X-ray Image (MAXI)⁴² was successfully launched in July
2009 by the Space Shuttle Endeavour and was installed on the International Space
Station (ISS). One of the instruments on MAXI is the Solid-state Slit Camera
(SSC),²⁵ which has two cameras: a forward-looking camera and an upward-looking
camera. Each has 16 CCDs, as shown in Fig. 18. It has no X-ray mirror but has
an entrance slit that restricts the FOV to 90° × 1.5°. It scans the sky according to
the ISS orbit. Each CCD, manufactured by Hamamatsu Photonics, is 25 mm square
with 24 µm square pixels. An aluminum coating of 200 nm thickness on the CCD
makes it possible to eliminate the optical blocking filter in front of the CCD,
which in turn makes it possible to eliminate a vacuum-tight body. The CCD is
fabricated on a high-resistivity wafer, providing a thick depletion layer of about
60 µm.

The cooling system consists of two parts: one is a single stage Peltier device
shown in Fig. 19, and the other is a radiator. The hot side of the Peltier is
thermally connected to the body of the SSC, while the cold side is directly connected
to the CCD silicon wafer, so that the Peltier also acts as a mechanical structure.
The body of the SSC is cooled down to around −20°C, depending on the thermal
condition of the radiator. The radiator consists of two panels, an upper panel and a
forward panel, each connected to the SSC through a loop heat pipe that functions as a
heat diode. The Peltier cools the CCD to around −60°C using about 1 W for each CCD.

Fig. 18. MAXI-SSC camera body containing 16 CCDs.²⁵ See electronic edition for a
color version of this figure.

Fig. 19. CCD employed for MAXI-SSC. The Si wafer and its substrate are mechanically
supported by Peltier semi-conductors.²⁵ See electronic edition for a color version of
this figure.

The SSC suffered a leak of IR radiation during the daytime, resulting in
heavy edge brightening. Therefore, it normally operates during the nighttime.
Among the X-ray satellites in LEO, the ISS has a large inclination of 51.6°,
resulting in passages at high latitude where the background level is quite high due
to charged particles. Figures 20 and 21 show frame images obtained near the
high background region [W108.6, N49.4, cut-off rigidity = 1.0 GeV c⁻¹] at the same
time (2010 April 21 08:34 UT). We can clearly see many particle events
in both the forward-looking camera and the upward-looking camera. The
forward-looking camera image shows many circular extended events that are generated
by particles entering the CCD along the normal to the CCD surface, while the
upward-looking camera image shows events with long trajectories that run almost
parallel to the CCD surface. Furthermore, we can see that the forward-looking
camera receives about 10 times more particle events than the upward-looking
camera. This can be understood if most of the charged particles are those trapped
by the geomagnetic field.²⁵

Fig. 20. MAXI-SSC frame image obtained by forward-looking camera.²⁵

Fig. 21. MAXI-SSC frame image obtained by upward-looking camera.²⁵


5.7. Hitomi (ASTRO-H) 


The ASTRO-H mission was launched on February 17, 2016 into LEO and was named
Hitomi. It carried an X-ray microcalorimeter (SXS), a hard X-ray imager (HXI),
a soft gamma-ray detector (SGD), X-ray telescopes and an X-ray CCD camera
(SXI).²⁶ The SXS had very good energy resolution below 10 keV (about 5 eV at
5.9keV) with a relatively small FOV (3’ square). The HXI had good efficiency at 
high X-ray energies (5-80 keV) with a FOV of 8’ square. The SXI had a large FOV 
(38’ square) with wide energy band (0.5-12 keV). 

After the launch, the SXS immediately started operations and demonstrated
its performance. The SXI operation started at the beginning of March 2016 and
also demonstrated its performance. At the end of March, the Hitomi satellite broke
up into several parts and was lost due to improper operation of the attitude
control system.

The CCD employed in the SXI is a p-channel CCD: the silicon wafer is
n-type and the signal charge is a hole instead of an electron. It is a fully
depleted BI CCD and has a depletion layer of 200 µm. The pixel size is 24 µm
square with a chip size of 3×6 cm² including a frame store region. This type of
device has been developed by Hamamatsu Photonics in collaboration with the
X-ray group in Japan and the National Astronomical Observatory of Japan.** It is
primarily used as a BI CCD so that there is no gate structure blocking the entrance
of X-rays. There is an anti-reflection coating for optical use or an Al coating for
X-ray use.

The SXI had four CCDs abutted to provide a focal plane area of 6cm square. 
It employed a mechanical cooler that reached a low temperature of −110°C. The
mechanical cooler was kept working at constant power while the SXI was kept at 
constant temperature by using a heater control. Because the CCD is a BI type, 
the signal charge spread is bigger for lower X-ray energies, which are absorbed 
close to the back side. The SXI summed 2 × 2 pixels on-chip so that conventional 
event recognition methods can be used. Like the Suzaku XIS instrument, the SXI 
also employed the SCI method to compensate for performance degradation due to 
radiation damage on-orbit. 


6. Summary and Future Prospect 


CCDs are now widely employed as standard imagers in both X-ray and optical 
astronomy. In the X-ray band, we speed up the readout frequency of the CCD so 
that we can run it in a photon-counting mode, which is in stark contrast to its use 
in optical astronomy. This enables us to measure the energy, the incident position, 
and the arrival time for each X-ray photon, as is done using other types of X-ray 
detectors. As the performance characteristics discussed above show, the effective 
area and imaging accuracy of the X-ray CCD make it well suited as a focal plane detector 
of an X-ray telescope. The energy range collected by a typical X-ray telescope is 
also a good match to that covered by an X-ray CCD. There are X-ray detectors 
that are superior to the X-ray CCD in specific characteristics. Furthermore, we can 
expect that novel types of X-ray detectors will continue to be introduced. However, 
taking into account the imaging system including electronics and thermal control, 
the X-ray CCD is quite a useful and well-balanced detector as the focal plane imager 
of an X-ray telescope. 

Several problems arise when X-ray CCDs are used in space. The first is the 
performance degradation of an SSD in the space environment. In particular, the CTE can 
be seriously degraded by low-energy protons trapped around the Earth. Several 
counter-measures can be applied, including a lower working temperature, 
incorporation of narrow charge transfer channels, and use of a mechanical shutter. 
Among the available counter-measures, the SCI method was used on the Suzaku mission 
for the first time and functioned properly for 10 years. 

The second problem is that a CCD is sensitive to IR, visible, and UV as well 
as X-ray photons. Since X-ray telescopes usually collect all of them, we need a 
mechanism to block IR, visible, and UV without reducing the X-ray transparency. 
A thin plastic film coated with aluminum is placed in front of the CCD or the 
aluminum is directly coated on the CCD surface, which must pass the vibration 
test and the environmental test for space use. 

The third problem is bombardment by micro-meteorites, which has been 
reported many times so far. Since X-ray telescopes also focus high-speed 
micro-meteorites, these bombard the focal plane imager. If the charge transfer channel 
is destroyed, the CCD does not work at all. Since an FI CCD has its charge transfer 
channel just below the surface, it is quite fragile. In a BI CCD, the charge 
transfer channel is under the depletion layer, which provides a relatively high 
tolerance. 

So far, a counter-measure has been found for each problem encountered with X-ray 
photon-counting CCDs in space. Entirely new problems may arise in the future, but 
we expect that they, too, will be overcome with new ideas. Given the mature 
technology and the heritage accumulated so far, we expect that photon-counting X-ray 
CCDs will continue to be standard imagers in X-ray astronomy. 
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Invented in the mid-eighties, DEPFET detector amplifier structures have been 
foreseen as radiation detectors for particle tracking, as X-ray imagers and for 
position-resolved spectroscopy in basic and applied science. ESA’s BepiColombo 
and Athena missions, to be launched in 2018 and 2028, respectively, will be 
equipped with focal plane detectors based on the DEPFET principle. The 
DEPFET technology allows for very low noise signal amplification at high speed 
due to its minimal input capacitance. As the DEPFET is made on high resis- 
tivity double-sided structured silicon, it exhibits high quantum efficiency and 
back illumination through an ultrathin radiation entrance window. X-ray detec- 
tion in single photon counting mode with spectroscopic performance has been 
demonstrated using X-rays from 277 eV to 10 keV. An electronic shutter has been 
integrated as well as a non-destructive repetitive readout scheme to obtain noise 
figures as low as 0.25 electrons (rms). We discuss the basic principles as well as 
astrophysical applications and perspectives for the future. 


Introduction 


The need for high speed, high resolution X-ray detectors originated from many 
experiments in basic and applied science. These include X-ray astronomy, planetary 
science, experiments at synchrotron sources, Free Electron Lasers and X-ray 
fluorescence measurements in material science. For several years dedicated active 
pixel sensors (APS) have been developed for this purpose, based on the DEPleted 
p-channel Field Effect Transistor (DEPFET). In addition, DEPFET detectors are 
foreseen for particle tracking in high energy physics or as electron imagers in 
transmission electron microscopes. In the visible range they can be used as single photon 
detectors within a wavelength bandwidth from 350 nm to 1100 nm. 


†Shortly after the final revisions of this manuscript, Gerhard Lutz passed away on April 28, 2017. 
Besides his other important contributions to science and instrumentation, he was one of the 
inventors of the DEPFET and developed various design variants to improve the device capabilities. 
For his lifetime work, he received several awards, including the Radiation Instrumentation 
Outstanding Achievement Award of the IEEE Nuclear and Plasma Society in 2011. We have lost a 
highly respected colleague, a dear friend, and mentor. 

In most X-ray astronomy missions, photons are imaged through telescope mirrors 
onto a focal surface with detectors that record position, energy and time. 
Depending on the type and quality of the X-ray optics, the half energy width of the 
image may vary from 0.5 arcsec (Chandra¹) to the arcmin range (Suzaku²). The focal 
length varies, e.g. from 1.6 m (eROSITA³) up to 12 m (Athena⁴). This corresponds to a 
required position resolution capability of the focal plane instrument ranging from 
10 µm up to 1 mm. The field of view can be as large as 50 arcmin with a physical 
focal plane size (for a 12 m focal length) of 25 × 25 cm². Depending on the collecting 
area of the X-ray telescopes, which ranges from a few tens of cm² up to 1.4 m² in 
the case of Athena, the count rate capability of the focal plane detectors must be 
tailored accordingly. In the case of Athena, the count rate for a bright source, e.g. 
the Crab nebula with its 33 ms pulsar, can be up to 75,000 X-rays per second; 
however, the energy measurement of the incident X-rays requires at most a single 
photon within a field of 3 × 3 pixels per readout frame. 
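
The single-photon condition can be made quantitative with Poisson statistics. The 
following is a minimal sketch only: the 75,000 X-rays per second are taken from the 
text, while the frame time and the fraction of the source counts that falls on one 
3 × 3 pixel field are hypothetical inputs, not Athena design values. 

```python
import math


def mean_photons_per_field(source_rate_hz, fraction_in_field, frame_time_s):
    """Mean number of photons landing in one 3 x 3 pixel field per frame."""
    return source_rate_hz * fraction_in_field * frame_time_s


def pileup_probability(mu):
    """Probability of two or more photons in a field during one frame,
    assuming Poisson arrivals with mean mu."""
    return 1.0 - math.exp(-mu) * (1.0 + mu)


def pileup_fraction(mu):
    """Fraction of the non-empty fields that hold more than one photon."""
    return pileup_probability(mu) / (1.0 - math.exp(-mu))


# 75,000 X-rays/s is quoted in the text; the other two values are
# illustrative assumptions chosen only to exercise the functions.
mu = mean_photons_per_field(75_000.0, fraction_in_field=0.05, frame_time_s=80e-6)
print(f"mean photons per field and frame: {mu:.3f}")
print(f"pile-up fraction of occupied fields: {pileup_fraction(mu):.1%}")
```

Shortening the frame time or spreading the image over more pixels lowers the mean 
occupancy and with it the pile-up fraction, which is why the readout speed has to be 
matched to the collecting area of the telescope. 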

All parameters together, the collecting area, the angular resolution of the optics, 
the pixel geometry, the readout speed, associated electronic noise and spectroscopic 
resolution, the radiation hardness and the quantum efficiency of the detectors, finally 
define the scientific value of the X-ray instrument, and it is obvious that a detector 
that is customized and optimized for the specific application is essential to achieve 
the ambitious scientific goals of a mission. The broad spectrum of applications 
is possible due to the unique working principle and versatility of the DEPFET 
technology. First of all we will introduce the basic DEPFET concept in Sec. 2. 
Section 3 treats the DEPFET as an element of a two-dimensional pixel detector 
and introduces the matrix operation of DEPFET detectors. Section 4 summarizes 
the additional capabilities that have been implemented in DEPFET detectors, like 
electronic shutter, integrated analog storage, repetitive non-destructive readout, 
nonlinear signal compression and the gated operation of DEPFETs. Finally, Sec. 5 
introduces two astrophysical projects where the above outlined properties have been 
implemented: BepiColombo and Athena. 


2. The DEPFET Structure 


Fig. 1. Schematic drawings of MOS type DEPFETs® with circular (left) and linear (right) 
geometry. Signal charges collected in the internal gate can be drained via the Clear-FET by 
applying voltage pulses to the Clear contact and/or the Cleargate. A deep p-well shields the 
Clear contact from the bulk side, so that signal charge cannot unintentionally reach it. 


The DEPFET structure was invented in 1984 by Josef Kemmer and Gerhard Lutz.° 
It is based on the combination of the well-known Field Effect Transistor (FET) and 
the sideward depletion method, as applied in the Silicon Drift Detector (SDD) 
invented by Emilio Gatti and Pavel Rehak® in 1983. While in conventional detector 
arrangements the signal charge is collected on an electrode that is connected to the 
gate of the input transistor, in a DEPFET detector the signal charge assembles in a 
potential minimum located in the depleted bulk below the gate of a FET.” Figure 1 
shows the functional principle of a DEPFET. A FET is located on one surface of a 
silicon wafer and a large area diode on the opposite surface. The n-type bulk is fully 
depleted. Suitable doping and choice of bias voltages creates a potential minimum 
for electrons right below the channel of the transistor, the so-called Internal Gate 
(IG). Electrons generated by radiation, as well as thermally generated electrons 
(dark current), are collected in the internal gate. They create mirror charges in 
the channel and thereby increase the channel conductivity; hence for fixed source, 
drain and external gate voltages the transistor current also increases (drain current 
readout mode). Alternatively, for fixed transistor current (source follower readout 
mode) with variable source node, the source voltage will change depending on the 
number of electrons collected in the internal gate. 

To remove the charge from the internal gate in order to reset the device, usually 
a second FET structure, the so-called Clear-FET, is implemented. Applying defined 
positive voltage pulses to the Clear-FET removes all charge from the internal gate. 
Therefore there is no statistical variation in the amount of leftover charge and the 
reset noise is zero. The charge can be measured by the current increase after charge 
collection or by the current (for source-follower readout, the voltage) difference 
before and after clearing of the IG. 

As a consequence of the charge collection in the IG instead of an electrode, this 
device has several intrinsic advantages: the DEPFET is simultaneously a detector, 
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amplifier, and analog signal storage (memory cell) device that provides excellent 
energy resolution close to the theoretical Fano limit;? it allows for readout on 
demand; and it is a natural cell for a pixelated detector suitable for spectroscopic 
imaging. The bulk of the device is fully depleted and therefore the backside diode 
can be used as a homogeneous unobstructed thin entrance window. In addition, the 
full depletion of the device extends the sensitivity to higher X-ray energies, depend- 
ing on the choice of thickness of the silicon bulk material. Finally, careful backside 
entrance window engineering allows optimization of the detector performance for 
low energy X-ray sensitivity or, using an anti-reflective coating, for high quantum 
efficiency in the optical regime. 
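
As a rough illustration of what "close to the theoretical Fano limit" means in 
numbers, the following sketch evaluates the usual expression for the energy resolution 
of a silicon detector. The pair creation energy of 3.65 eV and the Fano factor of 0.115 
are typical literature values assumed here, not values taken from this chapter, and 
the electronic noise is a free parameter. 

```python
import math

W_SI_EV = 3.65   # assumed mean energy per electron-hole pair in silicon (eV)
FANO_SI = 0.115  # assumed Fano factor of silicon


def fwhm_ev(energy_ev, enc_electrons=0.0, w_ev=W_SI_EV, fano=FANO_SI):
    """Energy resolution (FWHM, eV) from Fano statistics plus electronic noise.

    energy_ev     -- photon energy in eV
    enc_electrons -- equivalent noise charge of the readout in electrons rms
    """
    n_signal = energy_ev / w_ev                      # mean number of signal electrons
    variance = fano * n_signal + enc_electrons ** 2  # variance in electrons^2
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * w_ev * math.sqrt(variance)


# Fano limit (no electronic noise) and an example with a few electrons of noise.
print(f"Fano limit at 5.9 keV:       {fwhm_ev(5900.0):.1f} eV")
print(f"with 3 e- rms readout noise: {fwhm_ev(5900.0, 3.0):.1f} eV")
```

With these assumed constants a few electrons of readout noise add only a few eV to 
the Fano-limited width at 5.9 keV, which is why low-noise amplification matters most 
at low X-ray energies. 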

A variety of DEPFET structures have been designed, introducing various types 
of FETs (e.g. MOSFET, JFET, n- or p-channel, closed or linear geometry) and reset 
mechanisms. Figure 1 depicts two basic structures that are frequently implemented 
in DEPFET pixel detectors. The structure with cylindrical geometry (Fig. 1, left 
drawing) has been developed for X-ray astronomy applications,!? * the linear one 
(Fig. 1, right drawing) for particle physics applications.'% 

Many variations of the basic DEPFET have been invented that are adapted 
for specific applications in astronomy and other fields of science. Most recently 
DEPFET detectors have been implemented in the Mercury Imaging X-ray Spec- 
trometer (MIXS) on the BepiColombo mission, to be launched in 2018. MIXS will 
be the first space borne instrument flying DEPFET detectors. 

A summary of properties and capabilities of the basic DEPFET and its major 
design variants for astronomical imaging spectroscopy is given in Table 1. These 
DEPFET design variants provide specific new capabilities in addition to those of 
the basic DEPFET structure. Their functional principles will be described in Sec. 4. 


3. DEPFET Pixel Detectors 


The DEPFET is a natural building block for a pixel detector. It provides the intrinsic 
advantage that charge collection and storage in the internal gate does not require a 
current in the DEPFET, hence it takes place regardless of whether the pixel current 
is switched on or off via the gate voltage. This allows one to arrange DEPFET pixels 
into large arrays and to connect them in such a way that individual DEPFETs, or 
for example a row of DEPFETs, are activated, while all others are turned off by 
applying suitable voltages to the (external) gates. Although the row-wise readout in 
“rolling shutter mode” is the currently prevailing method, the DEPFET technology 
also allows for random access readout, so-called window modes, or high speed par- 
allel readout of the full matrix via 3D integration of a readout ASIC chip with an 
individual readout channel per pixel. Figure 2 depicts an example of a 256 x 256 pixel 
detector with its readout and control ASICs! that was developed as a demonstrator 
for the detector for the IXO (International X-ray Observatory) mission.'° The 
interconnection scheme (source-follower readout) is as follows: the drain is global, source 
contacts are connected column-wise and each column is bonded to a channel of the 
readout ASIC. Gate, Clear and Cleargate contacts are connected row-wise and are 
bonded to the control ASICs (for a detailed discussion of readout and control ASICs 
for DEPFETs see, e.g. Refs. 15 and 16). The readout ASIC provides 64 channels 
and the control ASIC also has 64 channels, with two individual output ports per 
channel; therefore a total of four readout and eight control ASICs are necessary to 
operate this device. The detector is read out in the standard rolling shutter mode 
making use of correlated double sampling. 


Table 1. Basic DEPFET and its design variants: properties and capabilities. 

Basic DEPFET 
    Signal charge stored in potential minimum below the transistor channel 
    Combined detector and amplifier properties 
    Low readout noise and absence of reset noise: excellent spectroscopic performance 
    Fully depleted, backside illumination: 100% fill factor, high quantum efficiency 
    Non-destructive readout of signal charge 
    Room temperature operation possible 
Arranged into a pixel matrix, a DEPFET detector has the advantage of 
    Low power consumption, as transistors are turned off during charge collection 
    Charge readout at place of generation, no charge shifting needed 
    Readout on demand and window-mode capability 
    High speed operation possible 
    Significantly improved radiation hardness compared to CCDs 
DEPFET combined with Silicon Drift Detector (Macropixel detector) 
    Pixel cell size can be varied from tens of µm to centimeters 
    The detector can be easily tailored to the telescope’s point spread function 
Repetitive Non-Destructive Readout (RNDR) DEPFET detectors 
    Repetitive non-destructive readout of the signal allows sub-electron measurement precision 
DSSC DEPFET (DEPFET Sensor with Signal Compression) 
    Intrinsic nonlinear amplification, tunable by doping and operation voltages 
    Combines high dynamic range with excellent resolution for small signals 
Gateable DEPFET detectors 
    Intrinsic electronic shutter: collection of signal charge only in selected time intervals 
    Suppression of “misfits” (due to arrival of signals during readout phase): improved 
    signal-to-noise ratio for high speed applications 
Gateable DEPFETs with intermediate charge storage 
    Allows nearly dead-time-free operation and/or gateability 
    Optimized for spectroscopy at high frame rates 

Fig. 2. A 256 × 256 active pixel sensor with control and readout ASICs. Measurements at a 
detector temperature of −45°C and for backside illumination demonstrated an energy resolution 
of ΔE (5.9 keV) = 127 eV (FWHM; single events). © 2011 IEEE. Reprinted with permission from 
Ref. 14 (figure adapted). 
[Figure labels: sensor with 256 × 256 DEPFET pixels of 75 × 75 µm² pixel size; left/right: 
four 2 × 64-channel SWITCHER IIb ASICs for row select (left: Clear, right: Gate and 
Cleargate); four 64-channel ASTEROID ASICs for column-parallel readout.] 


Fig. 3. Noise performance of a 256 × 256 pixel sensor as shown in Fig. 2. Left: noise-map, right: 
noise distribution. Average noise = 3.45 electrons rms.¹⁷ 


The readout sequence for one row is divided into five steps. First, a row is 
selected by activating the respective gate channel. Then the signal, composed of the 
baseline and any additional signal contribution from charge collected in the internal 
gate, is read out. In the next step, the charge is removed from the internal gate 
by applying adequate pulses to the Clear and Cleargate channels of the selected 
row. Then the baseline is read out, and finally the row is deactivated by switching 
the gate voltage off. The difference between the signal and the baseline is directly 
proportional to the number of electrons collected in the internal gate. The gain is 
evaluated individually for each pixel in order to be able to adjust for small 
pixel-to-pixel deviations; however, the variation in the gain is usually small. For the 
256 × 256 pixel detector shown in Fig. 2 the gain was measured to be 3.8 µV/e⁻ 
with a standard deviation of 2% over the full 256 × 256 pixel device (no defective 
pixels).¹⁴ Figure 3 shows an example of a noise plot for the device; the electronic 
noise is 3.45 e⁻ rms, with a remarkable homogeneity over the device. This is the 
basis of the excellent energy resolution, which was measured at a device temperature 
of −45°C, a frame rate of 300 frames/s and backside illumination with an ⁵⁵Fe 
source (Mn Kα peak at 5.9 keV). The FWHM (full width at half maximum) for 
single-pixel events, in which all charge from a photon is collected in one pixel, is 
ΔE (5.9 keV) = 127 eV; for all valid events (single-pixel and split events) 
the resolution is ΔE (5.9 keV) = 140 eV (FWHM), with a peak-to-background ratio 
of 3700:1.¹⁴ 
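
The correlated double sampling step and the per-pixel gain correction described 
above can be sketched as follows. This is an illustration under stated assumptions 
rather than the actual readout software: the function names and the sample voltages 
are hypothetical, and 3.65 eV per electron-hole pair in silicon is an assumed 
conversion factor. Only the gain of 3.8 µV/e⁻ is taken from the text. 

```python
W_SI_EV = 3.65  # assumed mean energy per electron-hole pair in silicon (eV)


def cds_electrons(signal_uv, baseline_uv, gain_uv_per_e):
    """Correlated double sampling for one row of pixels.

    signal_uv     -- samples taken before the clear (baseline plus signal), in uV
    baseline_uv   -- samples taken after the clear (baseline only), in uV
    gain_uv_per_e -- per-pixel gain in uV per electron
    Returns the collected charge per pixel in electrons.
    """
    return [(s - b) / g for s, b, g in zip(signal_uv, baseline_uv, gain_uv_per_e)]


def electrons_to_energy_ev(n_electrons, w_ev=W_SI_EV):
    """Convert a charge in electrons into deposited energy in eV."""
    return n_electrons * w_ev


# Hypothetical row of four pixels; only the second one holds an X-ray signal.
signal_uv = [1000.0, 7140.8, 1002.0, 999.0]
baseline_uv = [1000.0, 1000.0, 1000.0, 1000.0]
gain = [3.8, 3.8, 3.876, 3.724]  # nominal gain with a 2% spread on two pixels

for n_e in cds_electrons(signal_uv, baseline_uv, gain):
    print(f"{n_e:8.1f} e-  ->  {electrons_to_energy_ev(n_e):8.1f} eV")
```

Because the baseline is sampled immediately after the clear, offsets that are common 
to both samples cancel in the difference. 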

Applying the rolling shutter mode for the readout of the detector has intrinsic 
advantages, e.g. in terms of power dissipation, as only the pixels in one row are 
activated in order to be read out, while all other pixels are deactivated. Power con- 
sumption is therefore very low and as a result, depending on the application and the 
temperature at which the detector will be operated, only moderate cooling power 
is needed. It is even possible to operate a DEPFET pixel matrix with the detector 
temperature-stabilized at room temperature. These are important factors for detec- 
tors that are to be implemented in space applications where resources are limited. 
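
A short sketch of the rolling shutter timing makes the power argument concrete. It 
assumes that the rows are read one after the other without dead time; the 256 rows 
and the frame rate of 300 frames/s are the values quoted for the demonstrator, and 
the function names are hypothetical. 

```python
def row_time_s(frame_rate_hz, n_rows, rows_in_parallel=1):
    """Time available to process one row in rolling shutter mode."""
    return rows_in_parallel / (frame_rate_hz * n_rows)


def active_fraction(n_rows, rows_in_parallel=1):
    """Fraction of the pixels that carry current at any given time."""
    return rows_in_parallel / n_rows


# 256 rows read at 300 frames/s, one row at a time (assumed).
print(f"time per row:    {row_time_s(300.0, 256) * 1e6:.2f} us")
print(f"active fraction: {active_fraction(256):.2%}")
```

Only a fraction of 1/256 of the pixels is switched on at any moment in this 
configuration, which is the origin of the low power consumption. 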

In contrast to CCDs, charge is not transported over many pixels towards a 
readout node but is measured directly in the pixel in which the photon was absorbed. 
This makes the device largely insensitive to charge trapping and consequent transfer 
inefficiency, which are problems for CCDs. As a result a DEPFET detector provides 
a significantly higher radiation hardness compared to CCDs. 
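
The benefit of avoiding charge transfer can be illustrated with a simple estimate of 
the charge that survives a given number of transfers. The charge transfer 
inefficiency values below are hypothetical and serve only to show the scaling; a 
DEPFET pixel corresponds to zero transfers. 

```python
def surviving_charge_fraction(cti, n_transfers):
    """Fraction of a charge packet left after n_transfers pixel-to-pixel shifts,
    for a charge transfer inefficiency cti per transfer."""
    return (1.0 - cti) ** n_transfers


# Hypothetical values: an undamaged and a radiation-damaged CCD, 1000 transfers each.
for label, cti in [("undamaged CCD", 1e-6), ("damaged CCD", 1e-4)]:
    print(f"{label}: {surviving_charge_fraction(cti, 1000):.4f}")

# DEPFET: the charge is measured in the pixel in which it was generated.
print(f"DEPFET: {surviving_charge_fraction(1e-4, 0):.4f}")
```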

DEPFET detectors of this most basic configuration were originally developed 
for the planned European XEUS (X-ray Evolving Universe Spectroscopy)! and 
later IXO!° missions, which had as primary scientific aims the investigation of 
the universe at an early evolution stage by observation of early black holes, the 
evolution and clustering of galaxies and the evolution of element synthesis. These 
developments finally resulted in the DEPFET detectors planned for the Athena 
mission (see Sec. 5.2). 


4. DEPFET Options 


In the following we will discuss several design variants of the basic DEPFET pixel. 
These variants were developed in response to experimental needs and add specific 
functionalities to the basic DEPFET capabilities. 


4.1. DEPFET Macropixel Detectors 


In some applications pixel sizes of a large fraction of a millimeter are desired in 
order to match the granularity of the detector to the point spread function of the 
optics. This was the case in the proposed SIMBOL-X X-ray telescope mission* and 
again for the MIXS (Mercury Imaging X-ray Spectrometer)!® instrument on the 
BepiColombo mission, which will be launched in 2018 (see Sec. 5.1). 


*SIMBOL-X was designed with a large focal length of 20 m, realized by formation flight of two 
satellites (the first with the optics and the second with the focal plane instrumentation). To access 
a wide energy band ranging from 500 eV to 100 keV in order to study black holes and the centers 
of galaxies, a combination of a hard and a soft X-ray detector was to be positioned in “sandwich” 
configuration in the focal plane. For the hard X-ray detector, a CdTe sensor was chosen. For the 
soft X-ray detector, a 64 × 64 pixel DEPFET sensor with pixel size of 500 × 500 µm² was designed!” 
and prototype modules were successfully operated. The project was discontinued in 2009 during 
phase B study. 


Fig. 4. Microphotograph of two neighboring 300 × 300 µm² DEPFET Macropixel cells with 
standard DEPFET readout in the center of a three ring drift diode. The Macropixel cells are part of 
a 64 × 64 DEPFET detector matrix. 

DEPFET Macropixel detectors’? have been developed for this purpose; they 
allow one to tailor the pixel size to the experimental requirements. The DEPFET 
Macropixel is a synthesis of the DEPFET with the well-known SDD,?? which 
adds the property of easy scalability to the advantages of the DEPFET technology. 
A Macropixel consists of a DEPFET readout node that is surrounded by 
an SDD-like drift ring structure, as shown in Fig. 4. The size and shape of the 
Macropixel can be easily adjusted in the design by the number and shape of the drift 
rings. For example, for the MIXS project a detector design with three drift rings is 
used, resulting in a pixel size of 300 × 300 µm² (see Sec. 5.1). DEPFET Macropixel 
detectors provide excellent spectroscopic performance. For single events, 
spectroscopic resolution as good as ΔE (5.9 keV) = 126 eV (FWHM) was demonstrated 
with a SIMBOL-X prototype detector (read out at 410 µs/frame), while the MIXS 
flight detectors achieve spectroscopic resolution ΔE (5.9 keV) < 130 eV at a frame 
time of only 165 µs/frame²¹ (Fig. 5).ᵇ 

For a detector consisting of a single Macropixel of 10 mm² active area, a 
spectroscopic resolution as good as ΔE (5.9 keV) = 125 eV (FWHM) with a 
peak-to-background ratio of 15000:1 was demonstrated in a time-continuous readout 
mode.?? 
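
How a pixel size relates to the angular scale of the optics follows from the focal 
length. The sketch below uses the small-angle approximation; the 500 µm pixel size 
and the 20 m focal length are the SIMBOL-X values quoted in this section, whereas the 
half energy width and the number of pixels per half energy width in the second 
function call are hypothetical. 

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def pixel_angle_arcsec(pixel_size_m, focal_length_m):
    """Angle on the sky covered by one pixel (small-angle approximation)."""
    return pixel_size_m / focal_length_m * ARCSEC_PER_RAD


def pixel_size_for_psf(hew_arcsec, focal_length_m, pixels_per_hew):
    """Pixel size (m) that samples a given half energy width with
    pixels_per_hew pixels."""
    return hew_arcsec / ARCSEC_PER_RAD * focal_length_m / pixels_per_hew


# SIMBOL-X: 500 um pixels behind optics with 20 m focal length (values from the text).
print(f"SIMBOL-X pixel scale: {pixel_angle_arcsec(500e-6, 20.0):.2f} arcsec")

# Hypothetical requirement: 20 arcsec half energy width sampled by 4 pixels.
print(f"required pixel size: {pixel_size_for_psf(20.0, 20.0, 4.0) * 1e6:.0f} um")
```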


ᵇOn the MIXS instrument the detectors are operated at 170 µs/frame; however, for the 
qualification of the detector modules, the detectors were also operated somewhat faster at 
165 µs/frame. 


Fig. 5. Energy-resolved X-ray spectra measured with a Macropixel detector for MIXS (see 
Sec. 5.1) at a frame time of 165 µs/frame. Left: measurement with an uncollimated ⁵⁵Fe source. 
Besides the Mn Kα and Mn Kβ lines and their respective escape peaks, fluorescence lines from the 
setup (Ti, Al, Ag) are also visible. The energy resolution at 5.9 keV is ΔE = 129.5 eV (FWHM; 
single events) and ΔE = 134 eV (all valid events). Right: measurement with a small beam spot 
of 6.4 keV X-rays, which was scanned over the active area during the detector module calibration 
at the BESSY-II synchrotron. The energy resolution for the 6.4 keV radiation is ΔE = 132 eV 
(FWHM; single events) and ΔE = 137 eV (all valid events). Due to the small beam spot, no 
fluorescence lines from the setup were excited. © 2012 IEEE. Reprinted with permission from 
Ref. 21. 


4.2. RNDR DEPFETs 


RNDR (Repetitive Non-Destructive Readout) of DEPFETs was proposed already 
in the original publication® for experiments that require very high spectroscopic 
precision. The device is composed of two adjacent DEPFET structures and the first 
design variant of an RNDR-DEPFET is shown in Fig. 6. The collected signal charge 
can be shifted between the internal gates of the two neighboring DEPFETs, thanks to 
one (or more) transfer gate(s). In order to measure the collected charge, it is first 
shifted into the internal gate of the first DEPFET and the internal gate of the second 
DEPFET remains empty. After the signal is measured in the first DEPFET, it is 
shifted to the second DEPFET, where it can be measured while the internal gate of 
the first DEPFET is empty and allows for baseline evaluation (see Sec. 3). Moving 
the signal charge back and forth from one device to the other allows the signal to be 
measured arbitrarily often; since each measurement is independent, averaging over 
all measurements decreases the noise. For an ideal device, the noise would decrease 
with the square root of the number of measurements, not only for thermal noise, 
but, remarkably, also for low frequency (1/f) noise. However, for a real device, the 
influence of dark current electrons limits the achievable resolution and, depending on 
the operational conditions (e.g. device temperature), this determines the optimum 
number of measurement cycles.?? 
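
The trade-off between averaging and dark current can be sketched numerically. The 
ideal term below is the square-root law stated above. The dark current term is a 
deliberately crude assumption (every dark electron collected during the measurement 
sequence is counted as Poisson noise) and is not the analysis of the cited 
reference; the single-readout noise, dark rate and cycle time are hypothetical. 

```python
import math


def rndr_noise(sigma_single, n_reads, dark_rate_hz=0.0, cycle_time_s=0.0):
    """Effective noise (electrons rms) after averaging n_reads readouts.

    sigma_single -- noise of a single readout in electrons rms
    dark_rate_hz -- dark electrons per second reaching the internal gate (assumed)
    cycle_time_s -- duration of one transfer-and-read cycle (assumed)
    """
    ideal_var = sigma_single ** 2 / n_reads             # averaging of independent reads
    dark_var = dark_rate_hz * cycle_time_s * n_reads    # crude Poisson term (assumption)
    return math.sqrt(ideal_var + dark_var)


def reads_for_target(sigma_single, target):
    """Number of readouts an ideal device needs to reach a target noise."""
    return math.ceil((sigma_single / target) ** 2)


def best_n_reads(sigma_single, dark_rate_hz, cycle_time_s, n_max=100_000):
    """Number of readouts that minimizes the effective noise (brute-force search)."""
    return min(range(1, n_max + 1),
               key=lambda n: rndr_noise(sigma_single, n, dark_rate_hz, cycle_time_s))


# Hypothetical device: 4 e- rms per readout, 1 dark electron per second, 25 us cycle.
print("reads for 0.25 e- (ideal device):", reads_for_target(4.0, 0.25))
n_opt = best_n_reads(4.0, 1.0, 25e-6)
print("optimum number of reads:", n_opt)
print(f"noise at optimum: {rndr_noise(4.0, n_opt, 1.0, 25e-6):.3f} e- rms")
```

In this toy model the optimum shifts to fewer readouts as the dark current rises, for 
example at higher device temperature, in line with the qualitative statement above. 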

A charge measurement precision of a quarter of an elementary charge has been 
demonstrated,²³ as shown in Fig. 7. Thus it is possible, for example, to distinguish 
between 100 and 101 signal electrons as well as between one and two. 


Fig. 6. Schematic drawing of the first design variant of an RNDR double DEPFET device using 
two FETs with individual source (S1/S2), drain (D1/D2) and Gate (G1/G2) contacts. The charge 
is shifted between the FETs via the transfer gates TG1 and TG2.° 


Fig. 7. High precision measurement with an RNDR-DEPFET device, averaged from more than 
300 readouts. Left: Single electron pulse height spectrum obtained by irradiation with a weak laser 
source. Right: Spectrum with single electron resolution with a mean photoelectron injection of 
12 electrons. The underlying Poisson distribution is clearly visible. © 2007 IEEE. Reprinted with 
permission from Ref. 23. 


4.3. DSSC (DEPFET Sensor with Signal Compression) Pixel Detectors 


Although this development has been done for terrestrial applications, it may still 
be useful for special conditions in astronomy. 

The nonlinear DEPFET device has been inspired by applications at X-ray 
radiation sources, in particular at free electron lasers. There the aim was not to 
measure the energy of single photons with the highest possible precision, but the sum 
signal of many photons reaching the same pixel. For mono-energetic photons this is 
equivalent to measuring the number of photons in each cell. A specific problem was 
the required capability of measuring single photons at close distance from regions 
with high photon intensity (~10⁴ 1 keV photons). That could be accomplished by 
providing a strongly nonlinear amplification of the DEPFET.?* 76 

Figure 8 shows the principle and a comparison of the measured amplification of 
a spectroscopic DEPFET and a DSSC: while in a standard DEPFET the internal 
gate is located below the transistor channel only, in the DSSC it extends underneath 
the source and is doped in such a way, that small signal charge is collected directly 
below the channel while large charge spills over to the area below the source where 
it has small or no current steering capability. The result is a characteristic which 
is linear for a small number of photons and strongly non-linear for high photon 
counts. By suitable doping, the characteristic can be tuned to the desired form and 
in addition it can be fine-tuned by means of the applied voltages. 
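
The effect of the signal compression on the dynamic range can be illustrated with a 
toy model. The two-segment characteristic below is an assumption made purely for 
illustration: the real characteristic is a smooth curve set by doping and voltages, 
and the knee position and both gains are hypothetical numbers. 

```python
def dssc_response(n_electrons, knee_electrons, gain_small, gain_large):
    """Toy two-segment response of a signal-compressing DEPFET.

    Charge up to knee_electrons sits below the channel and is amplified with
    gain_small; charge beyond the knee spills below the source and only
    contributes with the much lower gain_large.
    """
    below_channel = min(n_electrons, knee_electrons)
    below_source = max(n_electrons - knee_electrons, 0.0)
    return gain_small * below_channel + gain_large * below_source


# Hypothetical parameters: knee at 1000 electrons, gains in arbitrary output units.
KNEE, G_SMALL, G_LARGE = 1000.0, 1.0, 0.02

for n_e in (100.0, 1000.0, 10_000.0, 1_000_000.0):
    out = dssc_response(n_e, KNEE, G_SMALL, G_LARGE)
    print(f"{n_e:>10.0f} e-  ->  output {out:>8.1f}")
```

In this model an input range of four orders of magnitude is squeezed into an output 
range of little more than two, while small signals keep the full gain. 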
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Fig. 8. Principle of a DSSC DEPFET (right insert) compared to a standard “spectroscopy” 
DEPFET (left insert) and corresponding response characteristics. While for a standard DEPFET 
all signal electrons stay below the channel, in the DSSC this is only the case for very small charge. 
Larger charges spread also below the source and have less influence on the transistor current. 
© 2011 IEEE. Reprinted with permission from Ref. 25 (figure slightly modified). 


4.4. Gateable DEPFET Pixel Detectors 


In all previous examples the sensors are continuously sensitive with the exception 
of the interruption of the short clear interval, which in rolling shutter mode readout 
is at a different time for every row of pixels. For some applications, however, one 
wants to collect signals only in predetermined time intervals, e.g. for the obser- 
vation of periodically changing astronomical objects such as rotating neutron or 
binary stars, as well as in adaptive optics. In order to widen the applicability of the 
DEPFET technology, devices with intrinsic electronic shutter, the so-called gateable 
DEPFETs, were developed. 

The first design of a gateable DEPFET was published in 2007?" and is shown in 
Fig. 9 (left). Although the functions of source (S) and drain (D) can be interchanged 
in the device, we describe its functioning for the configuration in which the center 
electrode is the source. It is enclosed by the gate (G) and the barrier-gate (B), which 
touches the drain electrodes. The outer region is covered by the DrainClearGate, 
which, depending on its potential, either creates an inversion layer (connected to 
the drain) or an accumulation layer (connected to Clear (C)). Thus this outer region 
may assume the function of a large area drain or a large area Clear electrode. The 
barrier-gate is used to control the flow of signal charge between the outer large 
area region and the internal gate (below the external gate). In the case of isolation 
(negative barrier-gate voltage), transfer of charge into and out of the internal gate 
is impossible and the underlying inversion layer extends the drain in a ring around 
the gate and source (independent of the DrainClearGate status). Therefore, the 
enclosed signal charge can also be read out for the case that the outer region is in 
the clear status. 

In operating the device we distinguish four different phases. First comes the collection 
or sensitive phase: the DEPFET current is turned off with the help of the external 
gate and charge created anywhere in the device is collected in the internal gate 
(below the external gate). This is followed by the blind or insensitive phase: collected 
charge is kept in the internal gate, while additional charge is drained towards the Clear 
contacts. Then the DEPFET current is turned on and the pixel is read out. Finally 
the charge is removed from the internal gate and the DEPFET current is read out 
again (baseline sampling). 


Fig. 9. Gateable DEPFET concepts: Barriergate (left) and Blindgate (right). 

Another variant of the gateable DEPFET, the Blindgate DEPFET, is shown in 
Fig. 9 (right). It is derived from the standard DEPFET (Fig. 1 (left)) by supplementing
the source with a MOS structure (“BlindGate”) that can be put into accumulation
or inversion state.?® One or more additional clear structures (BLIND) are embedded 
within this MOS structure. 

Compared to the Barriergate DEPFET, the Blindgate DEPFET requires an 
additional switched voltage. 

Correct functioning of both concepts has been verified by simulations and also 
experimentally.” ?° For example, the impact of the gated mode on the spectroscopic 
performance for high speed operation is shown in Fig. 10. As the signals generated 
by X-rays arrive randomly, they may enter the internal gate while the readout ASIC 
has already started the processing of the DEPFET output. Electrons arriving during 
the signal sampling phase of the readout will corrupt the output, depending on the 
time of arrival of the charge during the sampling process, leading to a so-called misfit 
background given by the ratio t_read/t_int (readout time divided by image integration
time). If the integration time is of the order of the readout time, the gated operation 
can reduce the background by an order of magnitude, which significantly increases 
the spectroscopic performance of the device. 
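
The size of the misfit background can be estimated from the ratio quoted above. The following is only a minimal sketch: the function names and the default suppression factor of 10 (standing in for the "order of magnitude" reduction) are assumptions made for illustration, not part of any detector software.

```python
def misfit_fraction(t_read: float, t_int: float) -> float:
    """Expected fraction of events arriving during readout, t_read / t_int."""
    if t_read < 0 or t_int <= 0:
        raise ValueError("t_read must be >= 0 and t_int must be > 0")
    # The fraction cannot exceed 1 (readout lasting the whole integration).
    return min(1.0, t_read / t_int)


def gated_misfit_fraction(t_read: float, t_int: float,
                          suppression: float = 10.0) -> float:
    """Misfit fraction with an ASSUMED gating suppression factor."""
    if suppression <= 0:
        raise ValueError("suppression must be > 0")
    return misfit_fraction(t_read, t_int) / suppression


if __name__ == "__main__":
    # Integration time about three times the readout time, as in Fig. 10.
    t_read = 1.0
    t_int = 3.0 * t_read
    print(f"ungated misfit fraction: {misfit_fraction(t_read, t_int):.3f}")
    print(f"gated misfit fraction:   {gated_misfit_fraction(t_read, t_int):.3f}")
```

With an integration time three times the readout time, about one third of the events would be misfits without gating, which is why the gated mode matters most when the two times are comparable.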

Fig. 10. Comparison of the spectroscopic performance of a Blindgate DEPFET Macropixel
(10 mm² active area) with and without the use of the gated option during readout for an integration
time only a factor of ~3 higher than the readout (signal processing) time. The peak-to-background 
is improved by one order of magnitude using the Blindgate.?? © SISSA Medialab Srl. Reproduced 
by permission of IOP Publishing. All rights reserved. 


4.5. Gateable DEPFETs with Parallel Charge Collection and Signal Readout


In light of the requirements of the latest generation of X-ray observatory mission 
proposals, like GRAVITAS or Athena (see Sec. 5.2), which demand excellent spec- 
troscopic performance at high frame rates, a new DEPFET sensor development 
was necessary that avoids the occurrence of misfits and in addition removes the 
dead time during the insensitive state of the device. The goal was a sensor that is 
continually sensitive, even during signal readout when arrival of new signal charge 
in the internal gate of the DEPFET must be prevented. 

Two quite different design solutions were proposed to realize a sensor that 
provides these capabilities: the GELPix and the INFINIPIX, which are depicted in 
Fig. 11. For the GELPix*® (Fig. 11, left), this task is solved by providing in each pixel
an intermediate charge storage region (“collection area” ) continuously collecting all 
signal charge while the internal gate of the DEPFET is shielded against incoming 
charge (note that no charge from the internal gate is lost to the collection area during 
this phase). The transfer gate (TG) allows a rapid (few tenths of nanoseconds) 
transfer of the charge accumulated in the collection area into the internal gate of 
the DEPFET, where it can be read out “slowly” and precisely while all new signal 
charge is again accumulated in the collection area. 

A somewhat more complicated device is the INFINIPIX (Fig. 11, right), in 
which each pixel is composed of two subpixels with a common source. While subpixel 
(1) is being read out, it is electrically shielded so that no additional charge can arrive 
at the internal gate during signal processing; at the same time, the neighboring 
subpixel (2) collects all arriving charges in the internal gate without processing 
them. Once the signal has been read out in subpixel (1), the potentials are adjusted 
such that subpixel (1) can start to collect charges and subpixel (2) is electrically 
shielded so that signal processing can start there. No additional signal charge can 
reach the internal gate of subpixel (2) during this period; instead, signal electrons 
are all guided to the non-processing internal gate of subpixel (1). 
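
The alternating operation of the two INFINIPIX subpixels can be illustrated with a toy model. This is a sketch under simplifying assumptions (instantaneous switching, fixed frame length, charge counted in whole electrons); the class and function names are hypothetical and do not describe the real control sequence.

```python
from dataclasses import dataclass


@dataclass
class Subpixel:
    name: str
    charge: int = 0  # electrons stored in the internal gate


def run_infinipix(arrivals, t_frame, n_frames):
    """Toy model of two subpixels alternating between collecting and readout.

    arrivals: list of (time, electrons) tuples.
    Returns a list of (frame, subpixel name, charge read out).
    """
    pixels = [Subpixel("1"), Subpixel("2")]
    collecting = 1  # subpixel (2) collects while subpixel (1) is read out first
    readouts = []
    for frame in range(n_frames):
        reading = 1 - collecting
        # The shielded subpixel is read out and cleared; it receives no charge.
        readouts.append((frame, pixels[reading].name, pixels[reading].charge))
        pixels[reading].charge = 0
        # Every charge arriving during this frame goes to the collecting subpixel.
        t0, t1 = frame * t_frame, (frame + 1) * t_frame
        for t, electrons in arrivals:
            if t0 <= t < t1:
                pixels[collecting].charge += electrons
        collecting = reading  # swap roles for the next frame
    return readouts


if __name__ == "__main__":
    events = [(0.5, 100), (1.2, 200), (1.7, 50), (2.1, 300)]
    for frame, name, charge in run_infinipix(events, t_frame=1.0, n_frames=4):
        print(f"frame {frame}: read subpixel ({name}), {charge} electrons")
```

In this model every arriving charge is read out exactly once, one frame after it was collected, and no charge ever enters a subpixel while it is being processed.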

Fig. 11. Two distinct DEPFET designs to realize the gateable DEPFET with intermediate
storage. Left: GELPix, right: INFINIPIX.?9

The advantage of GELPix is the simpler pixel structure, the need for only one
readout channel per column of pixels, drain or source readout, and the reduced num- 
ber of control lines and control signals during operation. Furthermore only half the 
number of DEPFETs have to be calibrated compared with the INFINIPIX design. 
Device simulations®” show a misfit suppression of more than 10° for both concepts. 
First test devices of both designs have been fabricated and proof-of-concept mea- 
surements have begun.*! Both pixels can be combined with drift rings to allow large 
pixel sizes (Macropixel option, Sec. 4.1) and additionally, as there might be ample 
time for RNDR readout (Sec. 4.2), a pixel design with RNDR can be included for 
either design if the experiment requires drastically increased charge measurement 
precision. 


5. Astrophysical DEPFET Applications 


In the past a variety of proposals for the use of DEPFETs as an imaging, tracking 
and spectroscopic detector have been submitted: TESLA, ILC and Belle2 as ver- 
tex detectors in particle physics; XEUS, IXO and SIMBOL-X in X-ray astronomy; 
and EuroCAMP for XFEL science. In this section we will discuss two approved 
astrophysical projects, where different types of DEPFET detectors are either imple- 
mented or foreseen for implementation. 


5.1. The BepiColombo Mission 


Mercury, the innermost planet of our solar system, is difficult to explore by ground- 
based telescopes as well as by satellite missions due to its proximity to the Sun. The 
European Space Agency’s (ESA) BepiColombo mission,?? which is being carried 
out in collaboration with the Japanese Aerospace Exploration Agency (JAXA), 
will be launched in 2018 and will arrive at Mercury in 2024 for a one year science 
mission phase with an option for an additional year of mission extension. Upon 
arrival, the spacecraft will decouple into two independent satellites: the Mercury 
Planetary Orbiter (MPO), with a payload of 11 instruments, and the Mercury 
Magnetospheric Orbiter (MMO), with a payload of five instruments. One of the 
instruments on the MPO is the Mercury Imaging X-ray Spectrometer (MIXS),'® 
which is designed to study the elemental abundance of the planet’s surface through 
spatially-resolved spectroscopy of X-ray fluorescence (XRF) emission lines from key 
rock-forming elements. The two-channel spectrometer MIXS will provide excellent 
spatial and spectroscopic resolution due to innovative optics and detector concepts. 
It targets a broad energy range from 0.5 to 7 keV with a spectroscopic resolution
below 200 eV (FWHM) at the reference energy of 1 keV after six years of travel and 
one year in Mercury orbit. The two channels are composed of a telescope channel
with an imaging microchannel plate mirror (see Chapter 5 of this volume), which
will provide a spatial resolution of 1 km on Mercury’s surface at periherm, and a
collimated channel with a large field of view of 10.4°.18 Figure 12 (left) shows a
photograph of the fully integrated MIXS flight instrument.ᶜ

Fig. 12. Left image: Photograph of the MIXS flight instrument. It is composed of a telescope
channel (left) and a collimator channel (right). Photo © University of Leicester Space Research
Centre. Right image: Photograph of a MIXS flight detector coupled to its thermal sink, which is
glued on top of the electronics side of the detector chip. The DEPFET detector itself is thermally
decoupled from the ceramic board, where the power consuming ASICs are located. All cooling
power is thereby concentrated on the detector chip. © 2012 IEEE. Reprinted with permission from
Ref. 21.

The detectors that provide X-ray imaging spectroscopy for both channels are 
identical DEPFET Macropixel detectors (see Sec. 4.1). Each detector is a matrix
of 64 × 64 pixels with a pixel size of 300 × 300 µm², resulting in a total active area
of 1.92 × 1.92 cm², on 450 µm thick fully depleted silicon. The readout time for one
full frame is only 170 µs. The pixel size was chosen to match the angular resolution
of MIXS’ telescope channel. The detector chips can, in principle, be operated at
room temperature with a moderate spectroscopic resolution (for single events), measured
at 5.9 keV, of ΔE (room temperature) = 173 eV (FWHM), compared to ΔE
(−40°C) < 130 eV. In order to mitigate the loss of spectroscopic resolution caused by
radiation damage in the hostile environment at Mercury, the MIXS detectors run 
at operation temperatures below —40°C. As cooling power is limited, a complex 
mounting and integration scheme?!° for the detector was necessary in order to 
thermally decouple the detector die from its surrounding readout and control elec- 
tronics and to provide all available cooling power to the detector (Fig. 12, right). 
The dependency of spectroscopic resolution on the device temperature after proton
irradiation is demonstrated in Table 2. Calibration measurements of the flight and
flight spare devices were done at the Physikalisch-Technische Bundesanstalt (PTB)
at the BESSY-II synchrotron facility in Berlin. Monochromatic X-rays from 500 eV
to 10 keV were used to characterize the detector modules. The energy resolution
was ~65 eV (FWHM) at 500 eV and 161 eV at 10 keV (see Fig. 13).34 The measured
energy resolution is close to the theoretical Fano limit and provides a comfortable
margin in order to meet the requirement of ΔE (1 keV) < 200 eV at mission end
even without using the option of a thermal anneal of the detector chip during the
mission. The key parameters for the DEPFET Macropixel detectors for MIXS are
summarized in Table 2.


Table 2. Performance of the DEPFET Macropixel detectors for the MIXS instruments
onboard BepiColombo.

Detector type                        DEPFET active pixel detector
Active area                          1.92 × 1.92 cm²
Pixel size                           300 × 300 µm²
Array size                           64 × 64 (organized in two halves)
Readout ASIC                         ASTEROID
Control ASIC                         SWITCHER-S
Detector temperature                 < −40°C
Readout time per frame               170 µs
Target energy range                  0.5 keV to 7 keV
Measured energy resolution           128 eV < ΔE < 131 eV at 5.9 keVᵈ
(FWHM)                               75 eV < ΔE < 77 eV at 1 keV
Required energy resolution           ΔE < 200 eV at 1 keV
(after one year in Mercury orbit)
Expected radiation doseᵉ             TID (calc) < 5.5 krad
                                     TNID (calc) < 7.2 · 10⁹ (10 MeV p⁺ cm⁻²)
Measured radiation hardness          Radiation dose 7.3 · 10⁹ (10 MeV p⁺ cm⁻²)
(without anneal)?°                   ΔE (5.9 keV) < 158 eV (T = −52°C)
                                     ΔE (5.9 keV) < 218 eV (T = −40°C)
                                     Calculated from this for 1.0 keV:
                                     ΔE (1.0 keV) < 189 eV (T = −40°C)


ᶜ MIXS was developed by a consortium of institutes and companies in the UK (University of
Leicester, The Open University, RAL, Magna Parva), Finland (University of Helsinki, FMI, Patria,
SSF, Oxford Instruments, Beneq), Spain (INTA, CAB, Lidax, Crisa), Germany (MPS, MPE, MPG-
HLL, PNSensor) and France (PHOTONIS).
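
The text does not say how the 1 keV resolution after irradiation was calculated from the 5.9 keV measurement. One plausible approach, sketched here, treats the measured width as the quadrature sum of an energy-dependent Fano term and an energy-independent noise term. The Fano factor (0.115) and pair creation energy (3.65 eV) are assumed typical silicon values, and the function names are hypothetical.

```python
import math

FANO_FACTOR = 0.115     # ASSUMED typical value for silicon
PAIR_ENERGY_EV = 3.65   # ASSUMED mean energy per electron-hole pair (eV)
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355


def fano_fwhm_ev(energy_ev: float) -> float:
    """Fano-limited energy resolution (FWHM, eV) at a given photon energy."""
    return FWHM_PER_SIGMA * math.sqrt(FANO_FACTOR * PAIR_ENERGY_EV * energy_ev)


def scale_fwhm_ev(fwhm_ref_ev: float, e_ref_ev: float, e_target_ev: float) -> float:
    """Scale a measured FWHM to another energy, keeping the noise term fixed."""
    noise_sq = fwhm_ref_ev ** 2 - fano_fwhm_ev(e_ref_ev) ** 2
    if noise_sq < 0:
        raise ValueError("reference FWHM is below the assumed Fano limit")
    return math.sqrt(noise_sq + fano_fwhm_ev(e_target_ev) ** 2)


if __name__ == "__main__":
    print(f"Fano limit at 5.9 keV: {fano_fwhm_ev(5900.0):.1f} eV")
    # 218 eV at 5.9 keV is the irradiated value at -40 degC from Table 2.
    print(f"Scaled to 1 keV:       {scale_fwhm_ev(218.0, 5900.0, 1000.0):.1f} eV")
```

Under these assumptions the Fano limit at 5.9 keV is about 117 eV, and the 218 eV measured after irradiation scales to roughly 190 eV at 1 keV, in line with the value quoted in Table 2 and below the 200 eV requirement.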


5.2. The Athena Mission 


From ESA’s perspective the Athena mission* (Fig. 14 (left)) is the scientific suc- 
cessor of the XMM-Newton telescope,*° launched in 1999. The Wide-Field Imager 
(WFI)*’ is one of the two scientific instruments proposed for the Athena X-ray obser- 
vatory, which will be launched in 2028. It will provide imaging in the 0.1-15 keV 
band over a wide field of view, with time-resolved X-ray imaging spectroscopy. The 
instrument is designed to make optimal use of the angular resolution and grasp (col- 
lecting area-solid angle product) provided by the optical design of the Athena mirror 
system by covering a large field of view of 40’ x 40’ with DEPFET detectors that 
properly sample the mirror angular resolution of 5” on-axis (half-energy width). In 
addition, if high time resolution is required, a second high-speed DEPFET detector
unit with a small field of view of 143” x 143” can be chosen. This synthesis makes the
WFI a very powerful survey instrument, significantly surpassing currently existing
capabilities.


ᵈ Measured over a wide range of detector temperatures (−80°C < T < −30°C).
ᵉ According to the MIXS Radiation Shielding Analysis document, S. Ibarmia et al.,
BC-MIX-INT-AN-001, Issue 2.0 (2012).


Fig. 13. Calibration measurements for the MIXS flight (F10, F11) and flight spare (F09) detectors
at the beamlines of the PTB in Berlin. Monochromatic X-rays of a known flux were recorded in
a single photon counting mode at a frame time of 165 µs/frame. The measured energy resolution
is close to the theoretical Fano limit. (a) Spectra of single events for all three detector modules
recorded at a beam energy of 3314 eV. (b) Energy resolution (FWHM, single events) vs. beam
energy. All three detectors show very similar characteristics with an energy resolution close to
the theoretical Fano limit. Reprinted with permission of Springer from Ref. 34.


Fig. 14. Left image: Artist view of the Athena satellite (Image: ESA). Athena will operate in a
L2 orbit. Right image: Focal plane layout of the WFI. The large module consists of four identical
DEPFET detectors and has a format of 1024 x 1024 pixels. The small high-speed module consists
of one DEPFET detector and has a format of 64 x 64 pixels.


The Athena observatory will carry a large X-ray mirror with 1.4 m² collecting
area at 1 keV, based on silicon pore optics technology,ᶠ and the focal length will
be 12 m. Two focal plane instruments will detect the imaged X-rays: a cryogenic
detector with 3800 pixels and 5 arcmin field of view and an energy resolution of
2.5 eV, and the silicon based WFI with more than 1 million pixels and an energy
resolution better than 150 eV (FWHM) at 5.9 keV.


ᶠ See Chapter 2 of this volume.

One challenge of the WFI instrument development will be to reduce the misfit 
background generated during high-speed sensor readout to an acceptable level (see 
Fig. 10). As discussed in Sec. 4.5, the INFINIPIX and GELPix structures have been
designed for this purpose. Device simulations?? show that both new pixel designs 
provide the required capability of charge collection in an intermediate storage area 
(“collection area” for the GELPix; the second subpixel for the INFINIPIX) while 
the signal is read out without being disturbed by newly arriving charge. 

The WFI on Athena will be a very flexible state-of-the-art spectroscopic 
imager,°° which will allow the use of observational modes ideally fitted to the 
nature of the astrophysical sources. The WFI focal plane will be composed of 
five DEPFET detectors (see Fig. 14 (right)) with 130 × 130 µm² pixel size and
two different pixel types. The large module will consist of four identical DEPFET
detectors, each with 512 × 512 standard DEPFET pixels, to provide a time resolution
of 1.3 ms in full frame mode. In order to compensate for the insensitive
regions between the four detectors, the observation will be done employing a dither 
pattern.2° The second module will consist of a high count-rate capable detector 
with 64 x 64 pixels of gateable DEPFETs with intermediate storage. This detector 
will provide a time resolution as fast as 80 µs in full frame mode. The focal plane
arrays will be intrinsically radiation hard: flatband voltage shifts will be negligible 
and the effects of non-ionizing radiation damage are controlled by temperature and 
high-speed readout, which was already successfully demonstrated for the MIXS 
Macropixel detectors (see Sec. 5.1). The key parameters for the Athena WFI focal 
plane are summarized in Table 3. 
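
The relation between pixel size, focal length and field of view can be checked with a short calculation. This is a sketch using the small-angle approximation; the function names are hypothetical, and the gaps between the four large detectors are ignored, so the large-module result need not reproduce the quoted 40 arcmin exactly.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def pixel_scale_arcsec(pixel_um: float, focal_length_m: float) -> float:
    """Angle on the sky covered by one pixel (small-angle approximation)."""
    return pixel_um * 1e-6 / focal_length_m * ARCSEC_PER_RAD


def field_of_view_arcsec(n_pixels: int, pixel_um: float, focal_length_m: float) -> float:
    """Field of view along one axis for a gapless row of n_pixels."""
    return n_pixels * pixel_scale_arcsec(pixel_um, focal_length_m)


if __name__ == "__main__":
    scale = pixel_scale_arcsec(130.0, 12.0)  # 130 um pixels, 12 m focal length
    print(f"pixel scale:        {scale:.2f} arcsec")
    print(f"small module (64):  {field_of_view_arcsec(64, 130.0, 12.0):.0f} arcsec")
    print(f"large module (1024): {field_of_view_arcsec(1024, 130.0, 12.0) / 60.0:.1f} arcmin")
    print(f"pixels per 5 arcsec half-energy width: {5.0 / scale:.1f}")
```

The result is about 2.2 arcsec per pixel, so the 64 pixel wide small module spans about 143 arcsec, and the 5 arcsec half-energy width of the mirror is sampled by a little more than two pixels.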


Table 3. Key parameters for the Athena WFI focal plane.*°


Detector type                         DEPFET active pixel detector
Energy range                          0.1 keV to 15 keV
Field of view                         40 × 40 arcmin for the large module
                                      143 × 143 arcsec for the small module
Array format large module             4 detectors, each with 512 × 512 pixels
Array format small module             64 × 64 pixels (organized in two halves)
Pixel size                            130 µm × 130 µm
Detector quantum efficiency           @ 277 eV: 20%
(incl. optical blocking filter)       @ 1 keV: 80%
                                      @ 10 keV: 90%
Spectral resolution                   <150 eV (FWHM) @ 6 keV
Time resolution (full frame)          1.3 ms (large module)
                                      80 µs (small module)
Count rate capability, small module   1 Crab: >90% throughput, <1% pile-up


6. Conclusion and Outlook


The DEPFET detector-amplifier technology provides versatile detector structures
suitable for all kinds of ionizing radiation. The low input capacitance enables low
noise performance, and the confinement of the signal charge in an internal gate allows
for a non-destructive repetitive readout with sub-electron noise performance, an
electronic shutter function and the intrinsic storage of signal charges. High dynamic
range through analog nonlinear signal compression has recently been demonstrated. 
The first DEPFET systems are being integrated in BepiColombo’s MIXS instru- 
ment. They will provide the heritage and basis of the detector development for the 
WFI of the Athena X-ray mission for imaging, spectroscopy and timing. 

As the functional principle is very flexible, new variations of the described 
DEPFET devices are expected to appear in many different fields of applications. 


X-ray Hybrid CMOS Detectors in Astronomy


X-ray hybrid CMOS detectors have multiple strengths for astronomical telescope
applications, with some of the most notable among those strengths being: (1) high
quantum efficiency across the 0.2-20 keV bandpass, (2) rapid readout capabilities,
(3) low power, (4) radiation hardness, and (5) adaptable readout schemes
making use of their ability to read out individual active pixels. While longer
wavelength hybrid CMOS detectors (HCDs) have been mature for decades and
used for space-flight applications on multiple occasions, the X-ray versions of
these detectors have been in the development stages for the past dozen years,
with a recent rocket flight having brought one version of these detectors to TRL
9 in April 2018. Ubiquitous industrial use of HCDs for commercial applications
(e.g. cell phone cameras) makes it possible to achieve affordable and relatively
fast developments for these devices, and the next generation of X-ray HCDs is
already showing remarkable features that will enhance X-ray astronomy. The
latest devices have already demonstrated event-driven readout with very rapid
effective frame rates, in-pixel correlated double sampling, and analog-to-digital
signal conversion on-chip. Future X-ray astronomy missions will benefit from their
use of X-ray HCDs.


1. Introduction


The field of astronomy was revolutionized through the use of semiconductor devices,
and the detectors that are based on this technology come in a variety of forms,
including charge-coupled devices (CCDs). Through the use of silicon as a detection
layer that interacts via the photoelectric effect, CCDs could perform observations
at optical, UV, and X-ray wavelengths with excellent sensitivity, and data could
be digitized rapidly during observations. The impact of CCDs on astronomical
observations in the modern era cannot be overstated, and the development of
modern devices with similar and enhanced properties offers the potential for even
more progress.

Prior to delving into these new devices, it is worthwhile to set the stage with a 
brief discussion of the origins of CCDs. These devices were developed in the early 
1970s as an electronic analog for magnetic bubble memory,!:? and the first reported 
use of CCDs for making an astronomical image was by James Janesick when he used 
a 100 x 100 pixel CCD to obtain an image of the Moon using an 8-inch telescope 
at his home.® In 1976, Fred Landauer, Larry Hoyland, Brad Smith, and James 
Janesick obtained the first professional quality image with a CCD camera.* They 
used the University of Arizona 61-inch telescope atop Mt. Lemmon to image Uranus 
at a wavelength of 8900 Å. CCDs were first flown in space as X-ray detectors on
sounding rocket observations of SN1987A.° Since that time, CCDs have become the 
standard detector choice for X-ray imaging, with launches on several X-ray missions, 
starting with ASCA in 1993, and continuing on to the more recent grand era with 
X-ray missions such as Chandra, XMM, Swift, and others. Each of these missions 
makes excellent use of X-ray CCD imagers. 

All of these CCDs use the traditional “bucket-brigade” method of readout. The 
CCDs consist of arrays of metal insulator semiconductor (MIS) capacitors. Charge 
is stored in a pixel of the array and then during readout, voltages are manipulated 
via clocking signals that are sequenced in order to move the charge from one pixel 
to another. The charge packets from each pixel are ultimately transferred to the 
ends of the rows and columns of pixels until they reach a charge-sensitive amplifier 
on the edge of the device. Passing through the amplifier, the charge is amplified and 
digitized before processing. This process of transferring the charge through many 
pixels of the detector leads to the propagation of effects from any defects that can 
be caused by radiation damage, which means that the CCD performance degrades 
significantly throughout the lifetime of a space mission as this radiation damage 
accrues. This charge transfer process is also a major consumer of both time and 
power. 
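
The cumulative effect of many transfers can be illustrated with the usual charge transfer inefficiency (CTI) picture, in which a fixed fraction of the charge packet is lost at each transfer. This is a minimal sketch; the CTI values and the number of transfers are hypothetical numbers chosen for illustration, not measurements from any particular device.

```python
def retained_fraction(cti: float, n_transfers: int) -> float:
    """Fraction of a charge packet that survives n_transfers pixel transfers."""
    if not 0.0 <= cti <= 1.0 or n_transfers < 0:
        raise ValueError("cti must be in [0, 1] and n_transfers >= 0")
    return (1.0 - cti) ** n_transfers


if __name__ == "__main__":
    # Far corner of a 1024 x 1024 device: about 1024 row plus 1024 column transfers.
    n = 2048
    for label, cti in [("undamaged (hypothetical)", 1e-6),
                       ("radiation damaged (hypothetical)", 1e-4)]:
        print(f"{label}: {retained_fraction(cti, n):.3f} of the charge retained")
    # A directly addressed pixel needs no pixel-to-pixel transfers at all.
    print(f"direct readout: {retained_fraction(1e-4, 0):.3f}")
```

With the hypothetical damaged value of 1e-4, only about 81% of the charge from the far corner arrives at the amplifier, whereas a pixel that is read out directly is unaffected by damage elsewhere in the array.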

While CCDs have had a major impact on X-ray astronomy, the continued 
advancement of the field requires continued improvement of the detectors. This is 
particularly true since future X-ray missions will have more stringent requirements 
on parameters such as frame rate, along with larger format and radiation hardness 
(e.g. see Refs. 6 and 7), and other mission concepts will benefit significantly from 
low power (e.g. see Refs. 8 and 9). 


2. CMOS Detector Technology 


Complementary metal-oxide-semiconductor (CMOS) technology is used for making 
integrated circuits (ICs) and has been the lowest noise and lowest power method for 
producing ICs, including CPUs, since the seventies. Using CMOS readout circuits 
with a silicon absorber array provides a detector in which each pixel has its own
readout circuitry, avoiding the problems associated with transferring charge from 
pixel to pixel (i.e. using the “bucket-brigade” manner) as is done for the CCD. 
The Read-Out Integrated Circuitry (ROIC) for each of these pixels can comprise 
structures implanted in a silicon substrate, which can be the same silicon as that 
used for the photon detection or it can be a different layer of silicon that is bonded 
to the detection layer material. Each pixel has its own circuitry, with multiple 
transistors for initial amplification and readout, which is why such a device is some- 
times referred to as an “Active Pixel Sensor”, or APS. This active pixel sensor 
characteristic is realized for both monolithic devices, in which the silicon used for
photon-to-charge conversion is the same silicon as that used for the embedded circuitry,
and hybrid devices, in which the photon-to-charge conversion silicon layer is
optimized independently of the ROIC electronics layer and then these layers are 
bonded together (i.e. hybridized). Continued improvements in CMOS technology, 
particularly the continuing reduction in feature size, allow a significant and increas- 
ing amount of readout functionality to be fit into the silicon real-estate provided 
by the pixel. A standard feature size used in current devices is 0.18 µm, which
enables a significant amount of circuitry to be placed within a pixel as small as
8 × 8 µm. Even more complex circuitry (e.g. multiple amplifiers and comparators)
is possible in larger pixels (e.g. see the 40 × 40 µm pixel device described in Ref. 10).

While it can vary from device to device, a basic CMOS architecture for an image 
sensor includes three transistors (3T), as shown in Fig. 1. The transistors are
metal-oxide-semiconductor field-effect transistors (MOSFETs). The reset MOSFET is a
switch that keeps the charge contained in the pixel until ready for readout and then 
clears the charge after readout has been performed. The source-follower MOSFET 
reads the charge from the pixel and converts it to a voltage signal. The source- 
follower MOSFETs are used in many CMOS detector designs, but other amplifier 
designs are also in use, such as the capacitive trans-impedance amplifier (CTIA), 
which offers major advantages including a reduction of interpixel crosstalk and 
reduction of several noise sources such as that associated with variations of the 
voltage at the reset node. In each of these cases, the row-select MOSFET opens and 
closes to allow the amplified charge from different rows to be read out to the device 
multiplexing system. 

However, specialized CMOS architecture can add significant functionality. 
Devices with as many as seven transistors (7T) have been achieved.!!:!? Examples 
of some useful capabilities that can be added into the pixel architecture include 
in-pixel correlated double sampling (CDS), digitization of the analog voltage pro- 
duced by the readout amplifier to reduce noise and overall power, and more creative 
designs like in-pixel thresholding (or sparse readout) allowing selective and rapid 
readout of only pixels with charge above a threshold. The further development of 
this pixel architecture should benefit from continued developments of technology 
and capabilities in the commercial sector. 
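
The effect of correlated double sampling and thresholded (sparse) readout can be mimicked in software. This is only an off-chip illustration of the idea: in the devices discussed here these operations are performed by circuitry inside each pixel, and the function names, frame layout and threshold value below are assumptions made for the example.

```python
import random


def cds(reset_sample: int, signal_sample: int) -> int:
    """Correlated double sampling: subtract the sample taken right after reset."""
    return signal_sample - reset_sample


def sparse_readout(reset_frame, signal_frame, threshold):
    """Return (row, col, value) only for pixels whose CDS value exceeds threshold."""
    hits = []
    for r, (reset_row, signal_row) in enumerate(zip(reset_frame, signal_frame)):
        for c, (rs, ss) in enumerate(zip(reset_row, signal_row)):
            value = cds(rs, ss)
            if value > threshold:
                hits.append((r, c, value))
    return hits


if __name__ == "__main__":
    rng = random.Random(0)
    size = 4
    # Each pixel has its own arbitrary offset after reset (in ADU).
    reset = [[rng.randint(900, 1100) for _ in range(size)] for _ in range(size)]
    # The signal frame repeats those offsets, plus one X-ray event of 500 ADU.
    signal = [row[:] for row in reset]
    signal[1][2] += 500
    print(sparse_readout(reset, signal, threshold=100))
```

Because the per-pixel reset offset appears in both samples, it cancels in the difference, and only the single pixel that received charge is reported.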

Fig. 1. Examples of two CMOS readout architectures, one with a source-follower amplifier (left)
and the other with a capacitive transimpedance amplifier (right); from Ref. 11. 


2.1. Types of CMOS X-ray Detectors 


X-ray sensitive CMOS imaging detectors can be categorized as either monolithic 
or hybrid. There are currently highly developed detectors of each type that could 
be featured in upcoming orbiting X-ray observatories, but each also still has a 
significant amount of development to be done. 

Monolithic CMOS sensors have CMOS readout architecture integrated in the 
same layer of silicon that is used for photon absorption. The silicon for both the 
ROIC and the depleted material used for photon detection is high-resistivity epitax- 
ial silicon. The Smithsonian Astrophysical Observatory and SRI/Sarnoff are leading 
the development of these detectors.!? They have produced devices 9 µm thick in a
1k × 1k format. Pixel pitches of 16 × 16 µm have been achieved with a 6T readout that
provides in-pixel CDS as well as “blooming control” gates. While these devices have
achieved low read noise, comparable to CCDs, the depletion depth has been limited
to approximately 10–20 µm, which severely limits photon detection response at
energies above 1–2 keV. This low depletion depth must be overcome in
order for these detectors to achieve good quantum efficiency across the entire soft 
X-ray bandpass (up to ~10 keV). 
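
The dependence of quantum efficiency on depletion depth can be estimated with a simple exponential absorption model. This is a rough sketch: the attenuation lengths below are approximate values for silicon that should be checked against tabulated data, losses in filters and entrance layers are ignored, and the function name is hypothetical.

```python
import math

# APPROXIMATE attenuation lengths in silicon (um), keyed by photon energy (keV).
# Assumed values for illustration; consult tabulated data for real work.
ATTENUATION_LENGTH_UM = {5.9: 29.0, 10.0: 134.0}


def absorbed_fraction(depth_um: float, energy_kev: float) -> float:
    """Fraction of photons absorbed within a depleted layer of the given depth."""
    if depth_um < 0:
        raise ValueError("depth_um must be >= 0")
    attenuation_um = ATTENUATION_LENGTH_UM[energy_kev]
    return 1.0 - math.exp(-depth_um / attenuation_um)


if __name__ == "__main__":
    for depth in (10.0, 20.0, 100.0, 300.0):
        row = ", ".join(
            f"{energy} keV: {absorbed_fraction(depth, energy):.2f}"
            for energy in sorted(ATTENUATION_LENGTH_UM)
        )
        print(f"depth {depth:5.0f} um -> {row}")
```

With these assumed values a 10 µm layer absorbs less than a tenth of the photons at 10 keV, while a 300 µm layer absorbs close to 90%, which illustrates why a thick, fully depleted absorber is needed toward the upper end of the soft X-ray band.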

For the remainder of this work, we will concentrate on the description of hybrid 
CMOS detectors (HCDs) used for X-ray detection. 


3. Hybrid CMOS X-ray Detectors 


An X-ray sensitive hybrid CMOS detector (HCD) has a bulk silicon absorbing layer
that provides a means for X-ray photons to interact via the photoelectric effect in
order to create measurable electron-hole charge clouds that can be collected at
readout nodes. This absorbing layer is then indium bump-bonded to a read-out integrated
circuit (ROIC) that provides a read-out electronics chain for each individual
pixel in the detector (Fig. 2). One advantage of this system is that the absorbing
layer and ROIC can be optimized separately. The absorber is often optimized for
quantum efficiency, with devices having silicon detection layers with fully-depleted
depths ranging from 75 µm to 300 µm on high resistivity silicon, while the ROIC can
be optimized for signal processing circuitry with lower resistivity silicon. Current
pixel pitches range from 10 to 40 µm, with formats as large as 4096 × 4096 pixels.
HCDs optimized for X-ray detection have a thin aluminum filter deposited on the
surface of the absorber to block optical light. These detectors are being pioneered
by the Pennsylvania State University (PSU) and Teledyne Imaging Sensors (TIS).


Fig. 2. Schematic cross-section of an X-ray optimized Hybrid CMOS detector array. X-ray
photons are absorbed at varying depths in the top absorber layer. The photocharge clouds are
accelerated by the electric field and collected on p+ implants, which are indium bump bonded to
a readout integrated circuit (ROIC). The ROIC provides a separate readout electronics chain for
each pixel in the detector. Having the two layers allows for optimization of each layer separately
(figure from Ref. 14). [Figure labels, top to bottom: n+ layer, Si bulk, p+ implants, CMOS ROIC.]


4. Some Motivations for X-ray Hybrid CMOS Detectors 


Hybrid CMOS detectors transfer the charge directly from each individual detection- 
layer pixel to the individual pixel of the ROIC. They avoid the charge transfer from 
pixel to pixel required by CCDs, which leads to multiple advantages with regards to 
power, radiation hardness, frame rates, and read-out versatility. The following is a 
summary of the properties for which hybrid CMOS detectors can provide significant 
advantages for orbiting X-ray observatories: 


(1) Pile-up: Pile-up occurs when multiple X-ray photons interact in the same pixel 
between readouts, resulting in the interpretation of multiple photons as a single 
event. For bright astronomical X-ray sources, this leads to a degradation in 
flux estimates due to the incorrect interpretation of the number of photons 
incident on the detector, and it leads to greater uncertainties on the measure- 
ment of spectra due to the mischaracterized energy of the pile-up events that 
have charge summed from multiple photon interactions. Reducing the readout 
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time reduces the pile-up. The ability to read out any detector pixel directly can 
dramatically decrease the CMOS detector readout time, down to potentially 
100 ps read times for windowed modes or even faster for a detector capable of 
sparse readout. This allows for the observation of very high flux sources with 
excellent timing information and minimal pileup effects. This can provide a 2-3 
order of magnitude improvement in peak count rate performance over existing 
X-ray CCD cameras. 

Radiation Damage: The direct read out of every pixel in a CMOS detector 
has the added effect of making these detectors extremely radiation hard. CCDs 
are vulnerable to proton displacement damage because of the need to transfer 
charge through the width or length of the detector array silicon (~few cm to 
cross a typical sensor) before being read out; a problem related to the “bucket- 
brigade” read-out scheme. While any given pixel of a CMOS sensor is subject to 
silicon bulk displacement damage,!° CMOS detectors are orders of magnitude 
less sensitive to radiation damage since each pixel sees only the radiation damage 
associated with that single pixel alone. 

(3) Micrometeoroids: Micrometeoroids are thought to be the cause of serious
detector damage on multiple observatories currently in orbit, including
XMM-Newton, Suzaku, and Swift. A micrometeoroid impact can either directly
damage the CCD gate structures or indirectly affect the read out of columns 
through those damaged pixels. CMOS detectors are expected to be more robust 
against micrometeoroid damage than CCDs. They should be protected from 
both failure mechanisms since they do not have exposed gates and since indi- 
vidual damaged pixels will not bloom across the detector. 

(4) Low Power: CMOS technology is inherently low power (relative to CCDs) due
to lower capacitance gate structures in the CMOS readout. The on-board
integration of camera drive electronics and detector signal processing reduces the
power consumption and mass of CMOS detectors compared to CCD cameras.!®

(5) Quantum Efficiency: Since hybrid CMOS detectors have a detection layer that is
high resistivity silicon and can utilize bias voltages ranging from several volts to
~100 V, fully depleted depths ranging from 75 μm to 300 μm are easily achievable.
This leads to high quantum efficiency across the entire soft X-ray bandpass,
matching the best back-illuminated CCDs while maintaining the other
advantages over those devices.
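
The pile-up argument in item (1) can be made quantitative with Poisson statistics. The following is a minimal sketch, not taken from the chapter: it assumes photons arrive at a constant mean rate in a single detection cell, and the function name, the example rate of 1000 counts/s, and the frame times are hypothetical values chosen only for illustration. It computes the fraction of non-empty frames that contain more than one photon.

```python
import math


def piled_up_fraction(rate_hz: float, frame_time_s: float) -> float:
    """Fraction of non-empty frames that contain two or more photons.

    Assumes Poisson arrivals at mean rate `rate_hz` in one detection cell
    (a pixel or small pixel island) that is read out every `frame_time_s`.
    """
    lam = rate_hz * frame_time_s        # mean number of photons per cell per frame
    if lam <= 0.0:
        return 0.0
    p_nonempty = -math.expm1(-lam)      # P(N >= 1) = 1 - exp(-lam)
    p_single = lam * math.exp(-lam)     # P(N == 1)
    return 1.0 - p_single / p_nonempty


if __name__ == "__main__":
    rate = 1000.0  # hypothetical counts/s in one detection cell
    for frame_time in (1e-1, 1e-3, 1e-4):  # illustrative values, down to 100 microseconds
        frac = piled_up_fraction(rate, frame_time)
        print(f"frame time {frame_time:.0e} s: piled-up fraction {frac:.4f}")
```

For small mean occupancy the piled-up fraction is roughly half the mean number of photons per frame, so under these assumptions each factor of ten in readout speed buys about a factor of ten in tolerable count rate.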


5. Some Specific X-ray Hybrid CMOS Detectors and Their Characteristics


HyViSI (Hybrid Visible Silicon Imagers) HCDs were originally developed by 
Teledyne Imaging Sensors (then Rockwell Scientific) for optical light detection 
applications.'*:!” Starting in 2005, the Pennsylvania State University (PSU) and 
Teledyne Imaging Sensors (TIS) began working together to modify the design of a 
HyViSI detector, specifically the well-established 1024 × 1024 pixel H1RG, in order
to optimize these sensors for X-ray detection.® Following these initial modifications,
multiple progressive iterations have occurred over the past decade. 


5.1. H1RGs and H2RGs Adapted for X-ray Detection 


Since the HyViSI H1RG hybrid CMOS detector was originally developed for optical
light detection, it included an optical antireflective coating. By replacing this
antireflective coating with an aluminum optical blocking filter, the quantum efficiency
for soft X-ray photon detection became comparable to that of a typical back-illuminated
CCD. The H1RG detectors have been demonstrated with aluminum optical blocking
filters of multiple thicknesses (e.g. 180 Å, 500 Å, and 1000 Å) deposited
directly on the surface of the sensor. This optical blocking filter, with the
various thickness options, allows these detectors to be used in realistic
astrophysical environments without being overwhelmed by background optical light
(e.g. from stars) while enabling the bulk of the X-rays to pass through the thin
aluminum. Optical and X-ray transmission curves for those initial H1RG X-ray
detectors are shown in Fig. 3.

Fig. 3. Optical and X-ray filter transmission curves for 3 H1RG X-ray HCDs (left figure courtesy
of Y. Bai and M. Farris, private communication; right figure from Ref. 18).

These initial H1RG X-ray HCDs, which are described by Falcone et al.!® and
further characterized in Refs. 20 and 21, were built with an 18 μm pixel size. At
typical operating voltages, with 18 V bias applied to the substrate, they were fully
depleted throughout the 100 μm depth of the silicon detection layer. They had
typical RMS read noise between 7 and 11 e⁻. The largest problem for these
first-generation devices was the presence of interpixel capacitance crosstalk (IPC), which
led to measurable signal in the pixels surrounding the pixel that actually contained
the electron-hole charge cloud from the incident photon interaction.!? While this IPC
effect is undesirable for optical astronomy, it is even worse for X-ray astronomy
since X-ray detectors typically count and measure the energy of each individual
photon by detecting the charge generation in each pixel that exceeds a threshold
for event detection, including the portion of the charge cloud that truly expands
into surrounding pixels. Due to IPC, the signal from any given X-ray event can
appear to be spread throughout larger regions. The energy resolution is therefore
significantly degraded by the process of reading each of these pixels and summing
the signal and the noise. The IPC in these initial devices led to ~5-10% of the
signal being measured in each of the surrounding pixels.
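
The way summed read noise degrades energy resolution can be illustrated with the usual quadrature estimate for a silicon detector. This is a rough sketch under stated assumptions, not the analysis used in the cited references: the pair-creation energy (3.65 eV) and Fano factor (0.115) are typical textbook values assumed here, the 9 e⁻ read noise is a hypothetical value inside the 7-11 e⁻ range quoted above, and effects such as incomplete charge collection and gain variation are ignored.

```python
import math

W_SI_EV = 3.65    # assumed mean energy per electron-hole pair in silicon (eV)
FANO_SI = 0.115   # assumed Fano factor for silicon


def fwhm_ev(energy_ev: float, read_noise_e: float, n_pixels: int) -> float:
    """FWHM energy resolution (eV) when the signals of `n_pixels` pixels are summed.

    Fano noise and the read noise of each summed pixel are treated as
    independent and added in quadrature (all in units of electrons).
    """
    fano_var = FANO_SI * energy_ev / W_SI_EV      # electrons^2
    read_var = n_pixels * read_noise_e ** 2       # electrons^2
    return 2.355 * W_SI_EV * math.sqrt(fano_var + read_var)


if __name__ == "__main__":
    energy = 5900.0   # Mn K-alpha, eV
    noise = 9.0       # hypothetical read noise, electrons RMS
    for n in (1, 4, 9):
        print(f"{n} pixel(s) summed: FWHM = {fwhm_ev(energy, noise, n):.0f} eV")
```

Under these assumptions, an event whose signal must be recovered from a 3 × 3 block of pixels carries three times the RMS read noise of a single-pixel event, which is why spreading the signal through IPC is so costly for spectroscopy.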

A specialized H2RG provided a significant improvement of the IPC characteristics.
This H2RG hybrid CMOS detector comprised a 36 μm 1024 × 1024 detection
layer bump-bonded to 1/4 of the 2048 × 2048 ROIC pixels of an H2RG ROIC with
18 μm pitch (i.e. every detection-layer pixel was bonded to a single ROIC pixel,
and the effective pixel pitch was therefore 36 μm). This additional spacing of the
pixel architecture provided greater charge separation, and thus reduced capacitive
coupling to approximately 1.8 ± 1.0% from the central pixel to its surrounding
neighbors. While this still led to some degradation of energy resolution, it enabled enough
improvement to resolve the oxygen Kα X-ray line and to achieve an energy
resolution of 2.7% ΔE/E (FWHM) for Mn Kα X-rays at 5.9 keV (see Fig. 4), with the
most up-to-date flat-fielding techniques (Ref. 22). This H2RG HCD device has a read noise
measured to be ~6.5 e⁻ and a dark current measured to be 0.020 ± 0.001 e⁻/s/pixel at
150 K. In April 2018, this same device was successfully flown on the WRX rocket,
thus bringing this X-ray hybrid CMOS detector to Technology Readiness Level
(TRL) 9.


5.2. Event-Driven Readout X-ray HCD 


The ability to embed advanced electronics into the ROIC pixels of the HCDs has led
to some novel features on some X-ray detectors that are currently under development.
For example, one novel HCD contains comparator circuitry that enables the
signal in each pixel to be checked against an adjustable threshold prior to readout
of the pixel. This allows pixels with measurable signal to be flagged for readout,
while skipping the readout of the pixels with no measurable signal. Dramatically
improved frame rates can be achieved through this procedure since the bulk of the
pixels in a typical X-ray observation will contain no signal.

Fig. 4. (Left) Oxygen, magnesium, and aluminum Kα X-ray spectra on a specially fabricated
H2RG X-ray hybrid CMOS detector, and (Right) manganese Kα/Kβ X-ray spectral data using
the same H2RG. Resolution is ΔE/E ~2.7% at 5.9 keV. From Ref. 22.
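
The thresholded readout described above can be mimicked in software to show why it saves so much bandwidth. The sketch below is only an illustration: the real comparison happens in in-pixel circuitry, and the array size, noise level, threshold, and hit positions are hypothetical values chosen for the example.

```python
import random


def event_driven_readout(frame, threshold):
    """Return (row, col, value) only for pixels whose signal exceeds `threshold`.

    A software stand-in for an in-pixel comparator: pixels at or below the
    threshold are never read out.
    """
    return [
        (r, c, value)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    ]


if __name__ == "__main__":
    random.seed(1)
    size = 64  # hypothetical array size
    # Hypothetical frame: Gaussian noise of 5 electrons RMS in every pixel.
    frame = [[random.gauss(0.0, 5.0) for _ in range(size)] for _ in range(size)]
    for r, c in ((10, 20), (40, 7), (55, 60)):   # three hypothetical X-ray hits
        frame[r][c] += 1600.0                    # roughly 5.9 keV at 3.65 eV per electron
    events = event_driven_readout(frame, threshold=50.0)
    print(f"read out {len(events)} of {size * size} pixels")
```

With a sparse X-ray frame, only the flagged pixels need to be digitized and transmitted, so the effective frame rate scales with the number of events rather than with the number of pixels.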

These devices, which are referred to as Speedster-EXD hybrid CMOS detectors,
have been fabricated in a prototype 64 × 64 test format and characterized in Ref. 23.
In order to accommodate the increased footprint of the electronics, these devices
are fabricated with a 40 μm pixel pitch. In addition to including the advanced event
recognition algorithm, these devices have the following features: (1) correlated
double sampling, (2) capacitive transimpedance amplifiers (CTIA), and (3) on-chip
thresholding. The CTIA within each pixel, as opposed to the source-follower
amplification of the H1RGs and H2RGs, holds the voltage constant on each pixel, which
eliminates the IPC problem described above. Reference 23 found that the IPC for
these devices was consistent with zero, with a measurement of 0.25 ± 0.2%. The
initial prototype versions of these detectors have moderately high read noise, with
an ultimate energy resolution measured to be 206 eV (FWHM) at 5.9 keV and 172 eV
(FWHM) at 1.5 keV. Future developments of these devices hold promise for reducing
this read noise to the levels achieved in the devices described below, while increasing
the array size of the detector, and will include on-chip analog-to-digital conversion,
resulting in a fully digital output that simplifies downstream electronics.


5.3. The Small-Pixel X-ray HCD


While mature optical and infrared hybrid CMOS detectors have been built with
pixel pitch as small as 10 μm and formats as large as 4096 × 4096, X-ray versions of
small-pixel devices are still in the initial development stages. Recently, Pennsylvania
State University researchers and Teledyne Imaging Sensors have collaborated on
the development of an X-ray HCD with a 12.5 μm pixel pitch that is intended for
use on high angular resolution X-ray observatory instruments such as the Lynx 
High Definition X-ray Imager.’ Similar to the Speedster device described above, 
this detector has a CTIA amplifier that reduces interpixel capacitive coupling to 
insignificant levels and it has correlated double sampling built into the chip. The 
capacitance of the read-out integrated circuit for this chip was designed to be low 
enough to enable low-noise readout, and shielding was incorporated into the pixel to 
prevent crosstalk. The device can be scaled up to large formats with on-chip analog 
to digital conversion. 

Detailed properties of this small-pixel device, as well as measurements of its 
characteristics, can be found in Ref. 24. Most notable among these properties is the 
fact that these small pixel hybrid CMOS devices have the lowest measured read 
noise and best energy resolution of any X-ray hybrid CMOS detectors measured 
to date. The read noise was measured to be 5.54 ± 0.05 e⁻ (RMS), and the devices
were measured to have an energy resolution of 148 eV (FWHM) at 5.9 keV and
78 eV (FWHM) at 0.53 keV.
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6. Discussion and Conclusions 


X-ray hybrid CMOS detectors are a relatively young technology that was first
demonstrated as viable for astronomical applications approximately 10 years ago.
Those initial engineering-grade devices showed that they could achieve
high quantum efficiency that was as good as the best back-illuminated CCDs, rapid
readout through multiple parallel output lines, low power operation, optical blocking 
filters directly deposited on the device, and radiation hardness that was inherent 
to the active pixel sensor design that reads charge directly through a single pixel 
without charge transfer across the device. One of these early generation X-ray HCDs 
was recently flight proven on the WRX rocket flight in April 2018. 

However, those initial devices had hurdles to overcome; most notably they
were characterized to have read noise on the order of 10 e⁻, as well as interpixel
capacitive coupling between pixels. During the past 10 years, the interpixel coupling
has been eliminated through the use of CTIA amplifiers in the read out integrated
circuit of each active pixel, and the read noise has been reduced to ~5.5 e⁻ for the
most recent device designs. The latest X-ray HCDs have displayed 148 eV and 78 eV
(FWHM) energy resolution at 5.9 keV and 0.53 keV, respectively. Some of the latest
devices have also shown that event-driven readout is possible, which is an avenue to 
multiple orders of magnitude increases in effective frame rates. These newer devices, 
which are being scaled up to full size arrays with on-chip digitization, offer major 
improvements for the cameras that will be utilized on the next generation of space- 
based X-ray telescopes. 
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Adiabatic demagnetization refrigerators (ADRs) which provide a temperature 
of about 100mK for detectors are described, with emphasis on space missions. 
ADRs use the Carnot cycle with a magnetic field and a paramagnetic salt. A high 
Carnot efficiency is generally achieved. In order to reduce total mass, a multi-stage 
system is considered for space missions where the maximum magnet current is
limited to a few Amperes and the thermal sink temperature is typically 2-4 K.
An ADR consists of three key components: a refrigerant pill, a heat switch, and 
a superconducting magnet. Magnetic shielding of the superconducting magnets 
is also important. We summarize the materials and designs of these components. 
We show two examples of ADRs in space missions: the 3-stage ADR for the 
X-ray microcalorimeter instrument, SXS, onboard the Hitomi (ASTRO-H) satel- 
lite, and the ADR designed for the far-infrared instrument, SAFARI, on the 
SPICA mission. We then describe recent ADR developments aimed at high relia- 
bility and/or continuous operations. We finally mention an alternative method 
to obtain ~100mK in orbit, a single-shot dilution refrigerator used for the 
Planck mission, and describe the development status of a closed-cycle dilution 
refrigerator. 


1. Introduction 


Adiabatic demagnetization provides a reliable tool for attaining temperatures below
100 mK. The method works by using the Carnot cycle with a magnetic field and a
paramagnetic salt. Such an adiabatic demagnetization refrigerator (ADR) consists
of a refrigerant pill, a superconducting magnet, and a heat switch between the pill
and a heat sink, as shown in Fig. 1.

Fig. 1. Left: Schematic drawing of ADR (single stage). Right: Typical cooling cycle of an ADR.

The principle of the ADR is simple. Figure 1 shows the typical cooling cycle 
represented by the relation of the entropy and temperature for a paramagnetic 
salt. In this case, the magnetic entropy is dominant and states of localized spins 
are determined by the balance of thermal energy and magnetic interactions. These 
magnetic interactions between spins become higher than the thermal energy at lower 
temperatures and the entropy of the salt is reduced along the curve (B = 0 Tesla). 
When a high external magnetic field is applied, the entropy becomes lower at higher 
temperatures, as seen in the curve (B = 2 Tesla). The cooling cycle is as follows: 


• Magnetization using a superconducting magnet (1 → 2).

• Continued magnetization to B_max at temperature T_max with the heat switch
turned on (2 → 3). T_max is slightly higher than the heat sink temperature, T_sink,
so that the heat generated in the salt pill can be dumped to the heat sink.

• Heat switch is turned off, and demagnetization (3 → 4) to B_r.

• Operation temperature T_ope is attained, and T_ope is maintained by decreasing the
magnetic field from B_r to 0 Tesla during the observation (4 → 1).


Most ADRs use the above-mentioned Carnot cycle because it has the best thermal 
efficiency when compared with other cycles at low temperatures (<5 K). 
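
The cycle above can be sketched numerically with the textbook entropy of non-interacting spins. This is an idealized illustration, not the model used for the designs in this chapter: the spin J = 3/2 and g = 2 are assumed free-ion values, the internal field `b_int` of 0.05 T is an assumed placeholder for the spin-spin interactions that shape the B = 0 curve, and lattice entropy, parasitic heat loads, and eddy-current heating are all ignored. All function names are hypothetical.

```python
import math

MU_B_OVER_K = 0.67171   # Bohr magneton / Boltzmann constant, in K per tesla
R_GAS = 8.314           # molar gas constant, J/(mol K)


def _log_sinh(y):
    """Numerically stable log(sinh(y)) for y > 0."""
    return y + math.log(-math.expm1(-2.0 * y)) - math.log(2.0)


def entropy_over_r(b_tesla, temp_k, j=1.5, g=2.0, b_int=0.05):
    """Molar entropy S/R of free spins J in the effective field sqrt(B^2 + b_int^2)."""
    b_eff = math.hypot(b_tesla, b_int)
    x = g * MU_B_OVER_K * b_eff / temp_k
    n = 2.0 * j + 1.0
    ln_z = _log_sinh(n * x / 2.0) - _log_sinh(x / 2.0)
    return ln_z - 0.5 * x * (n / math.tanh(n * x / 2.0) - 1.0 / math.tanh(x / 2.0))


def demag_temperature(t_start, b_start, b_end, b_int=0.05):
    """Temperature after an adiabatic field change: b_eff / T is conserved."""
    return t_start * math.hypot(b_end, b_int) / math.hypot(b_start, b_int)


def cooling_capacity_j(moles, t_ope, t_demag, b_max, **spin):
    """Heat absorbed at T_ope during the final isothermal ramp to zero field (4 -> 1).

    Valid only if T_ope is reachable, i.e. not below demag_temperature(t_demag, b_max, 0).
    """
    delta_s = entropy_over_r(0.0, t_ope, **spin) - entropy_over_r(b_max, t_demag, **spin)
    return moles * R_GAS * t_ope * delta_s


if __name__ == "__main__":
    t_min = demag_temperature(1.8, 3.0, 0.0)
    print(f"lowest temperature from 3 T at 1.8 K (ideal): {t_min * 1e3:.0f} mK")
    q = cooling_capacity_j(1.0, 0.05, 1.8, 3.0)
    print(f"ideal cooling capacity per mole at 50 mK: {q:.3f} J")
```

Because real pills also carry parasitic and hold-time loads, a sketch like this gives only an upper bound and should not be expected to reproduce the salt masses quoted later in the chapter.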

Before the invention of dilution refrigerators in the 1970s, ADRs were the primary
refrigeration technology used for ground experiments to obtain temperatures
of 10 mK to 100 mK.!:? Nowadays, dilution refrigerators are commonly used for this
temperature range because a large cooling power (>10 μW) is easily attained. However,
dilution refrigerators utilize gravity to separate ³He from ⁴He, and therefore
cannot be applied to space missions (see, however, Sec. 6). Thus, developments in
ADRs in the last two decades have been targeted primarily toward space missions.
Another advantage of ADRs is that tight temperature control (to <10 μK) can
be easily provided through the direct control of the solid refrigerant temperature.
Such high temperature stability is required for very sensitive scientific instruments.
Although ADRs are also currently used for ground experiments, in this paper we
will mainly describe ADRs that are being developed for space missions.


2. Multistage ADRs 


One restriction faced in the development of space ADRs is the maximum excitation 
current of a superconducting magnet. On the ground this can easily be higher 
than 10 A. For space missions, however, the current is typically restricted to a few 
Amperes. This is mainly because of the limited cooling power for stages above a few
tens of kelvin, where superconducting electrical leads are not available and the
magnet current produces significant heat dissipation. A maximum current of 2 A
has been considered for the ADR onboard the Suzaku ? and Hitomi satellites. For 
this case, the maximum magnetic fields obtained with a practical-size magnet will 
be limited to a few Teslas and the cooling power is determined by the size of the 
salt pill. For example, consider an ADR required to provide about 0.4J of cooling 
capacity* at 50mK with a 3 Tesla B-field, for a heat sink of 1.8K. The required 
mass of the refrigerant Chrome Potassium Alum (CPA, see Sec. 3.1) is more than 
2300 g. The mass and volume of a salt pill containing 2300g of CPA are too large 
for cryostats used in space missions. 

Recently, multistage ADRs have been proposed because of their relatively higher 
cooling power along with their compact size, lower weight and higher-temperature 
heat sink.° As shown in Fig. 2, two ADR units are assembled in series and the 
demagnetization temperature of the low-temperature stage can be lower than the 
heat sink offered by the high-temperature stage. In addition, the parasitic heat 
load to the low-temperature stage can be reduced by the higher-temperature stage. 
Different types of cooling cycles are introduced to optimize the mass, recycling time 
and heat load to the heat sink. 

*Equivalent to 4 μW for 24 hours.

Fig. 2. Schematic drawing of a two-stage ADR.

The most important feature of the multistage ADR is its compactness, which
makes its use significantly more advantageous for space instruments. If we consider
the above example with a two-stage ADR, the mass of the CPA salt pill required
for the low-temperature stage, for the same parasitic heat load, is reduced from
2300 g to 900 g with a 2 Tesla field if the demagnetization temperature of the
low-temperature stage is 0.55 K instead of being higher than 1.8 K (see Table 1). This
is accomplished by introducing an intermediate, high-temperature stage with 3 μW
cooling power at 0.5 K. The parasitic heat load to the low-temperature stage can
be significantly reduced using the thermal shield provided by the high-temperature
stage. In this way, the CPA mass can be reduced to even 220 g with a 2 Tesla field
for the low-temperature stage, assuming that the required cooling power can be
reduced to 1 μW. In addition, the mass of the salt pill depends strongly on not only
the cooling cycle sequence but also the external magnetic field. The total ADR mass
is dominated by the superconducting magnet and the magnetic shield, whose
sizes are determined by the size of the refrigerant pill. Hence, the reduction in the
sizes of the pills results in the effective reduction of the total mass and size of the
multistage ADR.

Table 1. Example designs of single- and double-stage ADRs. The
same cooling power and heat sink temperature are assumed. The
low-temperature-stage mass can be reduced further by the thermal shield of
the high-temperature stage (see text).

                             Single-stage ADR    Two-stage ADR
Cooling capacity at 50 mK    0.4 J               0.4 J
                             (4 μW for 24 h)     (4 μW for 24 h)
Heat sink                    1.8 K               1.8 K
Salt pills                   CPA 2300 g          CPA 900 g (low)
                                                 GLF 720 g (high)
B_max                        3 Tesla             2 Tesla (low)
                                                 3 Tesla (high)
Demag. temp.                 1.8 K               1.8 K / 0.55 K

Fig. 3. Two kinds of typical cooling cycles with a two-stage ADR. The solid line shows the
sequence of the cooling cycle for the high-temperature stage, and the dotted line shows it for the
low-temperature stage. Left: two-stage cycles step by step. Right: magnetization of both stages
and then the demagnetization of the higher-temperature stage.

Another important advantage of a multistage ADR is that various kinds of
cooling cycles can be selected both in the design and while on orbit.© Figure 3
shows two kinds of typical cooling cycles for a two-stage ADR with a 2.0 K heat
sink. The initial temperatures of both stages are close to 2.0 K in these cycles. The
low-temperature stage is a CPA pill (350 g), while the high-temperature stage is a
Gadolinium Lithium Fluoride (GdLiF₄ = GLF, see Sec. 3.1) pill with a mass of
380 g. The maximum magnetic field is 2 Tesla for both stages. When each stage
is magnetized and demagnetized independently (1 → 2 → 3, 1′ → 2′ → 3′ in
the left-hand panel of Fig. 3) and the demagnetization temperature of the
low-temperature stage is 1.0 K, cooling capacities of the low-/high-temperature stages
are 0.103 J and 0.36 J, respectively (equivalent to 1 μW/3 μW for more than 28 hours
at 45 mK/0.9 K). For this case, the cooling capacity of the high-temperature stage
can become 4 J (equivalent to 3 μW for more than 370 hours at 0.9 K) by magnetizing
both stages at the same time at 2 K (1 → 2 and 1′ → 2′ in Fig. 3, right) and
demagnetizing only the high-temperature stage (2 → 3 and 2′ → 3′). Additionally,
cooling capacities of the low-/high-temperature stages can be increased to 0.132 J
and 0.425 J (equivalent to 1 μW/3 μW for more than 36 hours at 45 mK/0.7 K),
respectively, by changing the demagnetization temperature of the low-temperature
stage to 0.8 K. As seen above, the cooling capacity of each stage can be optimized
or modified using a different operation sequence. An active heat switch is desired
for these flexible operations (see Sec. 3.2).
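
The hold times quoted above follow directly from dividing a stage's cooling capacity by the steady heat load it must absorb. The short sketch below simply repeats that arithmetic for the numbers in the text; the function name and the case labels are hypothetical, and a constant heat load is assumed.

```python
def hold_time_hours(cooling_capacity_j: float, heat_load_w: float) -> float:
    """Time (hours) a stage can absorb a constant heat load before it is exhausted."""
    return cooling_capacity_j / heat_load_w / 3600.0


if __name__ == "__main__":
    cases = [
        ("low stage, independent cycles", 0.103, 1e-6),
        ("high stage, independent cycles", 0.36, 3e-6),
        ("high stage, both magnetized at 2 K", 4.0, 3e-6),
        ("low stage, 0.8 K demagnetization", 0.132, 1e-6),
        ("high stage, 0.8 K demagnetization", 0.425, 3e-6),
    ]
    for label, capacity_j, load_w in cases:
        print(f"{label}: {hold_time_hours(capacity_j, load_w):.1f} h")
```

In each pair of stages the shorter of the two hold times sets the observing time between recycles, which is why the text quotes a single figure for both stages.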

For a cryogen-free cooling system, we need to consider a mechanical cooler 
as a pre-cooler, for which a higher interface temperature of the ADR is preferred 
(e.g. ~4 K). In such cases an ADR with three or more stages may be a good choice. One
disadvantage of a multistage ADR is that it has more potential single-point failures. Thus it
is important to have a high-reliability system and to consider redundancy as much 
as possible. 
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3. Critical Components for an ADR 


As shown in Sec. 1 (Figs. 1 and 2), an ADR consists of three critical components: 
refrigerant pills (often called salt pills), superconducting magnets, and heat switches. 
In this section we will briefly review these critical components and other key com- 
ponents for ADRs. 

When the detailed element design for a space ADR is considered, the thermal
performance should be balanced with reliability and hardware quality so that the 
design can be space-qualified. In addition, the required lifetime is critical, as this 
determines the acceptable leak rate for the refrigerant pill and the gas-gap heat 
switch, along with the required number of thermal cycles. 


3.1. Refrigerant Pills 


Depending on the initial (bath) temperature and the final target temperature, an 
appropriate refrigerant material must be selected. The best material should have a
large entropy difference between zero field and finite applied magnetic fields at both
the bath and target temperatures. The highest efficiency will be obtained with a
material that shows a phase transition from paramagnetism to antiferromagnetism
at a temperature slightly lower than the target temperature. Table 2 lists
materials often used for refrigerant pills.

Ferric ammonium alum (FAA) is a well-known material used as an ADR salt 
pill. FAA shows a large entropy difference per unit mass down to ~40mK. Because 
FAA is a low-thermal-conductance poly-crystal, it is usually packaged in the form 
of a pill with metal rods/wires installed with the salt to act as thermal conductors. 
Because FAA corrodes copper, a manual assembly of a bundle of thin gold wires is 
commonly used for FAA pills. 

Chrome potassium alum (CPA) has properties similar to FAA. However, the
phase transition temperature is slightly lower and the entropy difference per unit
mass is smaller than that of FAA. Because CPA does not attack copper, oxygen-free
copper (OFC) wires can be used as a thermal conductor. Recently, thermal
conductors machined using wire electric discharge machining (EDM) from a
cylindrical metal blank have been developed. These conductors provide the advantage of
obtaining a high heat transfer rate while occupying a small fraction of the total
cross-section of the pill.” Because the conductors do not touch the other end of the pill
container, this will also reduce the eddy current heating. Such a design is needed for a
rapid thermal cycle of a continuous ADR. However, a common metal such as copper must be selected
because of cost, and thus this technique cannot be applied to FAA. FAA and CPA
are often used for ADRs in the 50 mK to 100 mK range, starting from a 0.4 to 1.3 K
thermal bath. However, CPA appears to be a better material than FAA because it
allows a wider choice of the thermal conductor material.

Table 2. Materials often used for a refrigerant pill.

Material   Lowest Temp.   Characteristics                           Space Use
CPA        ~20 mK         Easier to handle than FAA. Slow to        Hitomi SXS
                          grow
FAA        ~40 mK         Large cooling power. Deliquescent at      XQC rocket, Suzaku XRS
                          >35°C. Corrodes Cu
MAS        ~0.4 K         Large cooling power for 1.3 to 0.4 K.
                          Stable
GGG        ~0.8 K         Large cooling power for 4 K to 1 K.
                          High thermal conductivity
GLF        ~0.6 K         Larger cooling power than GGG, but        Hitomi SXS
                          smaller thermal conductivity

If the target temperature is ~0.5 K, manganese ammonium sulfate (MAS) may 
be a good choice. OFC rods/wires can be used as thermal conductors for MAS. 

The difficulty in using these hydrated salts is that a hermetic container must 
be designed, using thermally conductive materials, to keep the equilibrium vapor 
pressure of the water of hydration high and to prevent the degradation of the prop- 
erties of crystals. Furthermore, as the decomposition temperatures of these hydrated 
salts are close to room temperature, a special technique is required for the salt pill 
construction. 

If we start from a bath temperature of ~4K and the target temperature is 
~0.5 K, gadolinium gallium garnet (GGG) and some other ceramic/crystalline mate- 
rials, such as dysprosium gallium garnet (DGG) and gadolinium lithium fluoride 
(GLF), may be used.*:? Because GGG has a good thermal conductivity we do not 
need any other thermal conductor inside it. However, in order to utilize the cooling 
power, GGG must be connected to thermal conductors both from the thermal bath 
and to the cold stages. This is often done using He gas as a heat exchanger. Typ- 
ically, GGG is installed in a hermetic enclosure, which is filled with the exchange 
gas (Helium-3) to obtain a high thermal conduction. The pressure of the exchange 
gas required at room temperature is determined by the operating temperature, the 
thermal conductance of the gas, and the gap between the crystal and the enclosure. 
The 2nd/3rd stage GLF pills in Hitomi/SXS’ were charged to 0.5-1 atmosphere 
pressure. 


3.2. Heat Switches 


Cryogenic heat switches for space missions must have a high on/off thermal conduc- 
tance ratio as well as the capacity to be turned on or off rapidly, in order to provide 
a high duty cycle (observation time/operating time ratio). For the sake of increased 
reliability, it is also important to have no moving parts. From these points of view, 
a gas-gap heat switch is used and thermal conductance ratios as high as 3000- 
10,000 are obtained for sorption coolers and ADRs.!°'!% A mechanical heat switch 
is considered a poor choice for space use because of the disadvantage of having a 
mechanical part. On the other hand, a superconducting heat switch is another choice
for use at low temperatures, since it provides a higher thermal conductance in the
on-state than that of a typical gas-gap heat switch at temperatures below 100 mK.'4
A magneto-resistive heat switch also displays good performance when used at a
certain magnetic field level at 1-4 K. Currently, thermal shrinkage switches are being
developed.!° These switches are generally operated at higher temperature ranges
(e.g. ~20 K) than ADR operating temperatures.

Gas-gap heat switches are operated either actively or passively. In active oper- 
ation, a getter pump is heat-sinked to a temperature stage at which the working 
gas is condensed onto the getter. To turn on the switch the getter pump is warmed 
using a heater. Because of the limitation in the cooling power of a pre-cooler, the 
generated heat for operating a heat switch must be low. In the Hitomi SXS, the 
average heating power of a charcoal getter was tailored to be lower than 0.3 mW 
with a rapid turn on/off time by optimizing the heat capacity of the getter box, the
position of the heat path between the gas-connecting line and the heat sink with
reasonable thermal conductance, and a balance between the amount of gas needed 
for the required thermal conductance of the heat switch and the getter temperature 
needed for adsorption of the filling gas. 

In multistage ADR systems, we may need to control a large number of heat 
switches. In such cases, passive heat switches may be adopted. By selecting the 
proper combinations of conductive gases and getter materials, the temperature at 
which the getter starts desorption and adsorption can be controlled (Table 3). By
thermally connecting the getter to the ADR cold stage, the heat switch can be 
passively operated: the switch will be turned off when the ADR is cooled to a 
certain temperature. The disadvantage of using passive heat switches is the lack of 
the flexibility of operation in the case of failure. For example, because of a failure 
of a stage, we may want to change the starting temperature of a certain stage. 
However, in passive heat switches, the operating temperature is predetermined by 
the heat switch and cannot be changed on orbit.!% 

A tungsten magneto-resistive heat switch is another choice that also provides 
an on/off thermal conductance ratio higher than 10,000.1° Moreover, the time to 
turn on/off can be shortened by applying or removing a magnetic field rapidly.
However, a high magnetic field of about 3 Tesla must be applied in order to keep the 
heat switch continuously off during observation. This condition may create another 
complexity related to magnetic shielding. 


Table 3. Heat switches.

Type                Material             On/Off Temp.   Control          Space Use
Gas-gap             ³He/Charcoal         ~10 K          Active           Suzaku XRS, Hitomi SXS
                    ³He/Sintered SS      ~5 K           Active/Passive
                    ³He/He               ~0.2 K         Passive
Superconducting     Zn, Al, In, Sn, Pb   <0.5 K         Active
Magneto-resistive   Tungsten             0.3-4 K        Active
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The most critical issue for a heat switch operated in space is redundancy. Elec- 
trical heater lines as well as thermometers usually have redundancy, while there is 
a single point failure with a heat switch itself (for example, the leakage of a filling 
gas). Recently, quad redundant heat switches have been introduced for the Mid- 
Infrared Instrument (MIRI) on the James Webb Space Telescope (JWST), in which 
the presence of one failed switch (which cannot be turned on or off) can be accepted 
due to the redundancy provided by using four individual heat switches in a series- 
parallel configuration.'°»!” Redundancy of heat switches will be an important issue 
in future missions, and the trade-off among the total reliability, the complexity of 
a failure mode, and the impact to other elements must be studied. 
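
The benefit of a series-parallel arrangement can be checked by brute force. The sketch below is an illustration only: it assumes one particular topology (two series pairs of switches wired in parallel), which may differ from the actual flight configuration, and it models each switch as a simple on/off element that can fail either stuck on or stuck off. All names are hypothetical.

```python
PAIRS = ((0, 1), (2, 3))   # assumed topology: two series pairs connected in parallel


def conducts(states):
    """True if heat can flow, i.e. at least one series pair has both switches on."""
    return any(states[a] and states[b] for a, b in PAIRS)


def works_with_failures(failures):
    """Check that the assembly still follows both the ON and the OFF command.

    `failures` maps a switch index to its stuck state
    (True = stuck on, False = stuck off); healthy switches follow the command.
    """
    for command in (True, False):
        states = [failures.get(i, command) for i in range(4)]
        if conducts(states) != command:
            return False
    return True


def tolerates_any_single_failure():
    """True if every possible single stuck-on or stuck-off failure is tolerated."""
    return all(
        works_with_failures({i: stuck})
        for i in range(4)
        for stuck in (True, False)
    )


if __name__ == "__main__":
    print("single failures tolerated:", tolerates_any_single_failure())
    print("two stuck-on switches in one pair tolerated:",
          works_with_failures({0: True, 1: True}))
```

The same enumeration can be extended to double failures or to other topologies when weighing total reliability against added complexity, which is the trade-off raised in the text.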


3.3. Superconducting Magnets and Magnetic Shielding 


NbTi multifilament superconducting wires are commonly used for superconducting 
magnets because of their low cost and easy availability. However, it is important to
obtain a higher operation temperature for a multistage ADR with pre-cooling by
mechanical coolers because these coolers often provide an interface temperature above
4 K. Recently, a superconducting magnet made from a thin Nb₃Sn wire of 0.15 mm
diameter was developed to increase the
critical current.’ The magnet had an inner/outer diameter of 66 mm/72.5 mm with
5850 turns, and a maximum current of 5.25 A was obtained at 10K. The second 
important advantage of the magnet is its compactness, because more than 50% 
of the total mass of an ADR consists of the magnet and magnetic shield. A 0.1— 
0.15 mm diameter superconducting wire is usually used, which determines the mass 
and cost of the magnet. 

Immersion chilling using liquid He, as in Suzaku,° is a better choice for cooling 
a superconducting magnet once the temperature distribution due to the quenching 
effect is considered. On the other hand, conduction cooling must be adopted for a 
cryogen-free cooling system. The three superconducting magnets in Hitomi!* are 
cooled conductively because of the need for continuous operation even after all
superfluid liquid He has evaporated to space.

Residual magnetic moments are problematic for spacecraft operations and must 
be controlled to below a certain level that is defined for each spacecraft. Moreover, 
the sensors (e.g. TES micro-calorimeters) and their cold front-end electronics (e.g. 
SQUID array amplifiers) cooled by ADRs are usually highly vulnerable to very 
weak magnetic fields of ~ milli-Gauss level. Therefore, ADR magnets must be mag- 
netically shielded. The magnet may have canceling coils, which reduce the dipole 
moment as seen from outside the magnet. Yet this cancellation is often not sufficient 
and the magnet needs to be surrounded by magnetic shielding materials. In Table 4, 
we have listed the properties of high permeability materials used at ~4 K. Saturation 
field strength and permeability are the most important parameters. In general, these 
two parameters are not concomitant. As can be observed in Table 4, materials that 
have a high saturation field show a relatively low permeability. Thus a multistage 
shielding system may be adopted. In such a system, a high-saturation-field material, 
Table 4. High permeability materials for magnetic shielding (a).

Material                  Saturation (Gauss)   Permeability μmax   Permeability μ40
Amumetal (80% Nickel)     8000                 400,000             60,000
Amunickel (48% Nickel)    15,000               150,000             12,000
Cryoperm 10               9000                 250,000             65,000
Ultra-low carbon steel    22,000               4000                1000

(a) Taken from the data sheet of Amuneal Manufacturing Corporation.


such as ultra-low carbon steel (ULCS), is used near the magnet, followed by a high 
permeability material, such as Cryoperm. Finally a superconducting material such 
as Nb is used close to the sensors and electronics. 

The permeability is a function not only of the temperature but also of the 
strength of the applied field and the frequency. The permeability shows a maximum 
at a certain field strength. In Table 4, we show the maximum permeability at 4 K
for a static field (μmax) and the permeability for a field strength of 40 mA/cm =
0.05 Gauss (μ40).
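
The equivalence between 40 mA/cm and 0.05 Gauss quoted above can be checked with a short calculation. The sketch below assumes the field is evaluated in free space, so that B = μ0 H; the function name is illustrative and not taken from any data sheet.

```python
import math

MU_0 = 4.0e-7 * math.pi  # vacuum permeability in T m/A (classical value)

def field_strength_to_gauss(h_ma_per_cm: float) -> float:
    """Convert a field strength H in mA/cm to the free-space flux density in Gauss."""
    h_a_per_m = h_ma_per_cm * 1e-3 * 100.0   # mA/cm -> A/m
    b_tesla = MU_0 * h_a_per_m               # B = mu_0 * H in free space
    return b_tesla * 1e4                     # Tesla -> Gauss

b_gauss = field_strength_to_gauss(40.0)
print(f"40 mA/cm corresponds to {b_gauss:.4f} Gauss")
```

The result rounds to the 0.05 Gauss used to define the low-field permeability in Table 4.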


3.4. Other Components 


While high purity copper or aluminum (99.9999% or higher) is usually selected for 
use in thermal straps and conductors, quite often a thermal contact between differ- 
ent materials creates a bottleneck that dominates the total thermal conductance. 

Electrical isolation is sometimes needed between a refrigerant pill and a detector 
assembly when considering grounding issues. Sapphire is usually placed between 
interfaces because of the high thermal contact conductance that can be provided 
with a sufficient contact force on a polished flat surface.?° It is interesting that 
sapphire is also employed for thermal insulation by having it touch a small number 
of points across a joint with alumina powder between them.?! 

Heat dissipation from magnet leads must be suppressed as much as possible, 
particularly for space cryostats. Below ~4 K, Nb-Ti superconducting wires are widely
used. In the middle temperature range of 4 K to a few times 10 K, high-Tc superconducting
leads are used. If the highest temperature is below ~30 K, MgB2 (Tc = 39 K)
wires are good candidates.22,23 For temperatures above ~30 K, lead wires which utilize
YBCO (Tc = 92 K) or BSCCO (Tc = 110 K) are being developed in industry
for high power conductors. To use such leads in a cryostat for space missions, customization
is required to make the thermal conductance small. For the Hitomi SXS, a
YBCO conductor tape was cut to a 1mm width to obtain low conductance.?*?° The 
thermal conductance is dominated by the HASTELLOY substrate. The adoption 
of a relatively thinner substrate also reduces the thermal conductance. 
4. ADRs for Past, Present and Future Missions


In this section, ADRs for past, present and future spacecraft missions are described. 

An X-ray micro-calorimeter rocket experiment used a single-stage ADR with 
liquid He to provide an operating temperature of 60mK.?° The experiment was 
a technology demonstration of X-ray observations using a micro-calorimeter array 
with prototype onboard electronics. It was also a demonstration of reaching temperatures
below 100 mK using a single-stage ADR in microgravity. The cryostat
was built to minimize size and weight and was designed to be isolated
from launch vibrations. A 4-liter liquid He
tank was supported from the vacuum jacket by two reentrant CFRP cylinders, and 
a single straight fill/pumping line with two vapor-cooled heat sinks connected to 
radiation shields was assembled. A multi-stage thin-film IR blocking filter with a 
one steradian field of view was also demonstrated. 

The X-ray observatory Suzaku was the first spacecraft in which a single-stage 
ADR was mounted for the X-ray micro-calorimeter XRS2, and an operating temperature
of 60 mK was successfully obtained on orbit.2 A 920-g FAA salt pill was
installed at the center of the 33-liter liquid He tank (<1.2K in orbit) with thermal 
isolation, and a 2-Tesla NbTi superconducting magnet was immersed in the He tank 
surrounding the salt pill. The active gas-gap heat switch used a zeolite getter that 
adsorbed the He gas at temperatures below ~10 K, opening the thermal switch, and 
released the gas above 13K to allow high thermal conduction through the switch. 
The magnet was specially designed to produce 2 Tesla with a current of just below 
2 A, and had extensive passive shields as well as bucking coils to make the dipole 
moment zero. 

Hitomi was a Japanese X-ray astronomy satellite launched in early 2016 (see footnote b) in
which a three-stage ADR provided an operating temperature of 50 mK for the Soft
X-ray Spectrometer (SXS).19,27-30 Figure 4 shows the cross-sectional view of the
SXS dewar, and the SXS cooling chain is shown in Fig. 5. The dewar outer shell 
is at room temperature, and a 4K-class Joule-Thomson cooler (4K-JT) provides 
4.5K. Two double Stirling coolers (2STs) are needed as pre-coolers of the 4K-JT, 
while two other 2STs are used as shield coolers. A He tank (>30 liters) is mounted at
the center of the dewar, and 50 mK is provided by the two-stage (1st and 2nd stage) 
ADR from 1.1-1.3K as long as the superfluid liquid He remains, while the three- 
stage ADR can also reach 50mK from 4.5K in cryogen-free operation. Figure 6 
shows a schematic of the three-stage ADR assembly. 

SPICA (SPace Infrared telescope for Cosmology and Astrophysics) is proposed 
to be the next Japanese infrared observatory, in collaboration with the European 


bThe Hitomi mission ended in March 2016 when the satellite broke up in orbit due to a malfunction 
of the attitude control system. 
Fig. 4. Cross-sectional view of the SXS dewar design in the Hitomi mission. The three-stage ADR
is mounted at the center of the He tank.19
Fig. 6. Schematic of the 3-stage ADR for SXS in Hitomi.27 See electronic edition for a color
version of this figure. 


Space Agency (ESA). SPICA is to be transferred into a halo orbit around the 
second Lagrangian point (L2) in the Sun—Earth system, which enables the use of 
effective radiant cooling in combination with the mechanical cooling system to cool a 
2.5 m-class IR telescope to below 8 K. The SPICA cooling chain is shown in
Fig. 7.31-33 The telescope assembly and scientific instruments are located on the
top of the payload module. The V-Groove concept is used for the radiative cooling
to deep space. The mechanical cooler system consists of two 4K-JTs (4.5 K), two
1K-class Joule-Thomson coolers (1K-JTs) and six 2ST pre-coolers.

In SPICA, 50 mK is provided by a combination of a 3He sorption cooler
(300 mK) and a single-stage ADR, for the SAFARI (SPICA far infrared imaging
spectrometer) superconducting IR bolometer.34,35 A schematic of the SPICA sub-K
cooler is shown in Fig. 8. A lightweight sorption cooler with a high cooling capacity
at 300 mK for SPICA is based on the development of the ESA Herschel mission.36
CPA is used as a refrigerant for the ADR, which can reach 50mK with a 300mK 
heat sink using a modest magnetic field (~1 Tesla). On the other hand, because a 
sorption cooler has a lower thermal efficiency in comparison with magnetic cooling, 
a higher heat load must be rejected into the heat sink during the recycle operation. 
Fig. 7. Cooling chain of the SPICA payload module. The dotted line shows the thermal interface
between the mechanical cooler system and the Scientific Telescope Assembly (STA), while the
dashed line shows the thermal interface between the mechanical cooler system and the sub-K
cooler.3+ 
Fig. 8. Schematic of the sub-K cooler for the SPICA SAFARI instrument.*® See electronic edition
for a color version of this figure. 


5. Developments of Advanced ADRs 


One of the critical disadvantages of traditional ADRs is that the cooler recycling 
time becomes a dead time for observing instruments. From this point of view, a 
traditional single-shot ADR is not a “true refrigerator”, but rather a “cooler”. 
Fig. 9. Left: Prototype of the four-stage continuous ADR. Right: Temperature profiles of the
four-stage CADR during operations at 60 mK.%" See electronic edition for a color version of this 
figure. 


A continuous ADR provides a solution for this significant disadvantage.!+°” The 
prototype of the four-stage continuous ADR is shown in the left panel of Fig. 9. In 
this refrigerator, four units of the ADR are connected in series. The most important 
difference between the continuous ADR and a “normal” multistage ADR is in the 
operation sequence. The temperature behavior of each stage during operation is 
shown in Fig. 9 on the right. Stage 1 (lowest-temperature stage) is continuously 
maintained at 60mK. When Stage 1 has a large magnetic field to absorb heat, 
it can maintain 60mK while Stage 2 magnetizes at higher temperature during its 
recycling phase (0.3K). Subsequently, Stage 2 is operated to be 50mK (lower than 
60 mK) while Stage 1 is magnetized during recycling. In this phase, Stage 2 must 
absorb Stage 1 magnetization heat as well as the heat load at 60mK. Stages 3 
and 4 are also operated to reject heat from the lower-temperature stages as a heat 
sink. The cycle period is a little less than 24 minutes, drastically shorter than 
typical ADRs. This short cycle period brings a higher cooling power as well as a 
constant temperature at Stage 1. Although there are no intermediate continuous 
temperature stages between Stage 1 and 4.2K, it is possible to achieve this by 
conditioning Stage 3 and Stage 4 cycle operations with a higher heat capacity for 
Stage 4, or by installing an additional Stage 4. The measured cooling powers are 
6 μW at 50 mK and 21 μW at 100 mK, respectively, with high thermal efficiency.
The temperature stability was also measured, and about 5 μK rms was achieved at
50 mK, while disturbances at the change of heat transfer were less than 100 μK.
In this prototype, passive gas-gap heat switches are used among all stages except 
Fig. 10. Schematic of the fast thermal response miniature ADR.38


Stage 1 and Stage 2 to reduce the complexity of the ADR’s thermal and mechanical 
design, to simplify the control electronics, and to significantly increase its efficiency. 
An active superconducting heat switch is used between Stage 1 and Stage 2 because 
only electronic conduction can provide a high thermal conductance below 100 mK. 

As mentioned above, a fast cycle operation provides a higher cooling power in 
comparison with typical ADRs. However, a very high thermal coupling is required 
between the heat sink and the pill, including the pill itself and a heat switch. The 
tandem ADR prototype shown in Fig. 10 was developed to bring about continuous
cooling with a quick cycle operation.?° In this prototype measurement, in which 
two ADR units with a CPA pill were assembled in parallel, the recycling time was 
2.5 minutes when cooling from 4 K. A measured cooling power of 5 μW was achieved
at 200 mK with a small total mass (5.3 kg including magnets and shielding). The
second tandem prototype has two double-stage ADRs in parallel, and the expected
cooling power is 5 μW at 100 mK by cooling from 2 K, while the hold time for each
double-stage ADR is 10 minutes. Since a series ADR needs twice the number of
heat switches, its design may be complicated. On the other hand, high redundancy
is obtained if it is treated as a contingency mode in the case of a tandem unit failure. 


6. Alternative Refrigeration 


The primary reason for the cooling power limitation of an ADR is the fact that 
almost all refrigerants used are a solid material, not a liquid or a gas. Thus the 
cooling capacity of a dilution refrigerator easily exceeds that of an ADR on the 
ground by stationing the storage tank on the outside of the cryostat and circulating 
the working gas from the tank into the cold part. However, in the past a dilution 
refrigerator was not a solution to cool instruments in space since a ground-based 
dilution refrigerator usually utilizes gravity in a mixing chamber to separate a 3He-poor
phase (dilute phase, 6.6% 3He and 93.4% 4He) from a 3He-rich phase (concentrated
phase) in order to pick up the 3He-poor phase efficiently and circulate 3He.
Furthermore, an external storage tank must be included in the mass and volume 
budget of the spacecraft, and the presence of a highly pressurized large volume tank 
has a risk impact on spacecraft design. 

The open-cycle dilution refrigerator mounted in the Planck satellite, which is 
designed to observe the cosmic background radiation, overcame this barrier.°9 4! +8 
It used the capillary force of the surface tension instead of gravity for the mixture. 
Figure 11 shows the cooling chain of the Planck satellite from the service module to 
the bolometer plate. To achieve a temperature of 100mK, isotopes were pre-cooled 
with external JT cryocoolers down to 4.5K. Further cooling to less than 1.6K was 
Fig. 11. Cooling chain of the Planck satellite from the service module to the bolometer plate.39
See electronic edition for a color version of this figure. 
achieved through an internal Joule-Thomson expansion process on the return line
of the mixture. The isotopes were stored in four identical 51-liter vessels (three for 4He
and one for 3He). 10,590 liters of helium per vessel at normal temperature and pressure
was made available with the 1.8 MPa regulation pressure to supply the cooler with
appropriate flows (22.5 μmol/s for 4He and 7.5 μmol/s for 3He). The dilution cooling
power was 0.1 μW at 100 mK, and the available cooling power of the JT stage was
about 800 μW with a pre-cooling temperature of 5.0 K with 20 μmol/s 4He, while
the theoretical available power is 1.8 mW.

The open-cycle dilution refrigerator on Planck successfully achieved 0.1K in 
the absence of gravity without a still or a complex pumping system (in other words, 
it had neither moving parts nor complex charcoal pumps). On the other hand, 
the lifetime was limited to two years because of the rejection of the 3He and 4He
mixture into space. In addition, the cooling power was also limited by the reduced
flow rate. The lifetime of the refrigerator is therefore seen to be the result of a
trade-off between the cooling power and the amount of 3He and 4He stored before
launch. 
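
The two-year lifetime follows directly from the stored gas and the flow rates quoted above. The sketch below is a rough consistency check, not a mission calculation: it assumes three 4He vessels and one 3He vessel holding 10,590 liters each, treats the gas as ideal, and takes "normal temperature and pressure" to mean 20 °C and 1 atm (an assumption; other conventions change the answer by several percent).

```python
R_GAS = 8.314462618                   # molar gas constant, J/(mol K)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def moles_from_gas_volume(volume_liters, temperature_k=293.15, pressure_pa=101325.0):
    """Ideal-gas amount of substance for a volume quoted at ambient conditions."""
    molar_volume_liters = R_GAS * temperature_k / pressure_pa * 1000.0
    return volume_liters / molar_volume_liters

def lifetime_years(n_vessels, liters_per_vessel, flow_umol_per_s):
    """Time to exhaust the stored gas at a constant open-cycle flow rate."""
    moles = n_vessels * moles_from_gas_volume(liters_per_vessel)
    return moles / (flow_umol_per_s * 1e-6) / SECONDS_PER_YEAR

he4_years = lifetime_years(3, 10590.0, 22.5)   # three 4He vessels
he3_years = lifetime_years(1, 10590.0, 7.5)    # one 3He vessel
print(f"4He supply: {he4_years:.2f} yr, 3He supply: {he3_years:.2f} yr")
```

Both isotopes run out at the same time, a little under two years, because the 3:1 vessel ratio matches the 3:1 flow ratio. Halving the flow would double the lifetime at the cost of cooling power, which is the trade-off described in the text.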

Closed-cycle dilution is the next step: it closes the cycle of these working gases
via a helium isotope separator at a low temperature and reinjects the almost pure
3He and 4He into the refrigerator after separation.*° 4+ *° The principal parts are a
still, a 3He pump, a 4He pump, a thermal reservoir at 1.7 K, a heat exchanger, and
a mixing chamber (Fig. 12). The liquid-vapor interface in zero gravity is localized
in the still via capillary confinement inside a sponge as has been demonstrated
previously in the context of a dilution refrigerator and sorption coolers.!? The 4He
fountain pump is connected to the still and acts as a semi-permeable membrane
that allows only superfluid 4He to pass through. The 3He pump rejects mostly 3He
from the still, because temperatures are maintained such that the vapor pressure
of 3He is much higher than that of 4He. The first prototype to demonstrate the
principle of the closed-cycle dilution has been tested and a cooling power of 1 μW at
temperatures below 60 mK has been obtained. Lastly, the low pressure compressor
system based on the heritage of 1K-class Joule-Thomson coolers*® for space was 
developed and used for the 3He circulation, and the coupled test successfully reached
70 mK, which was the lowest temperature in the design of the cold part.47,48

NIS junction coolers are also potential new refrigerators, designed to cool from 
300 mK to 100 mK and provide cooling powers of hundreds of picowatts or more.*9 °° 
These solid-state refrigerators are based on normal-metal/insulator/superconductor
(NIS) tunnel junctions that cool by removing the hottest electrons from the normal- 
metal by quantum mechanical tunneling. The NIS is connected at the base of each 
leg of a bolometer chip (single or array) and provides a cooling power of 10-100 pW 
with a small area (10-20 μm square). The most critical aspect of the design is
the method to insulate the cooling area with a rigid structure for a mechanical 
environment. 

Table 5 shows the heat dissipation into the heat sink for cooling at
50 mK using each type of cooler. Though it is not easy to compare because of
Table 5. Comparisons of three low-temperature cooling systems for space use.

                      3-stage ADR (a)         Sorption + ADR                  Closed-cycle dilution
                      (Hitomi SXS)            (SPICA SAFARI)
                      EM (b), Ref. 27         EM (b), Ref. 34                 BBM (b), Ref. 44

Cooling power         0.27 μW (50 mK)         0.4 μW (50 mK),                 1 μW (50 mK)
                                              14 μW (0.3 K)
Cooling capacity/     0.087 J/90 hours        0.053 J/37 hours (50 mK) and    continuous
hold time                                     1.86 J/37 hours (0.3 K)
Heat load to          3.8 mW x 0.85 hours     3.5 mW x 9 hours (1.8 K) and    5.2 mW (1.7 K)
heat sink             (1.3 K)                 10 mW x 9 hours (4.9 K)
Heat rejection/       11.5 J (1.3 K)          113.4 J (1.8 K),                -
cycle                                         324 J (4.9 K)
Duty cycle            99% (91 hours)          77% (48 hours)                  100%
Mass                  10 kg                   5.1 kg                          -

(a) Operation with liquid helium. 3rd ADR is not included in the mass.
(b) EM: Engineering Model, BBM: Bread Board Model.

Fig. 12. Schematic of the closed-cycle 3He-4He dilution refrigerator.*° See electronic edition for a color version of this figure.

different temperatures for the heat sink and a different cooling power, it does show
important features for each cooler. 
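
The entries in Table 5 are related by simple energy bookkeeping: cooling capacity is cooling power times hold time, heat rejection per cycle is heat load times recycling time, and duty cycle is hold time divided by the full cycle period. The following sketch recomputes these quantities from the tabulated powers and durations for the two ADR-based systems; the variable names are illustrative.

```python
def joules(power_watts: float, hours: float) -> float:
    """Energy delivered by a constant power over a duration given in hours."""
    return power_watts * hours * 3600.0

# 3-stage ADR (Hitomi SXS): 0.27 uW for a 90 h hold, 3.8 mW rejected for 0.85 h
hitomi_capacity = joules(0.27e-6, 90.0)
hitomi_rejection = joules(3.8e-3, 0.85)
hitomi_duty = 90.0 / 91.0

# Sorption + ADR (SPICA SAFARI): 37 h hold, 9 h recycle, 48 h quoted cycle
safari_capacity_50mk = joules(0.4e-6, 37.0)
safari_capacity_300mk = joules(14e-6, 37.0)
safari_rejection_1p8k = joules(3.5e-3, 9.0)
safari_rejection_4p9k = joules(10e-3, 9.0)
safari_duty = 37.0 / 48.0

print(f"Hitomi: {hitomi_capacity:.3f} J, {hitomi_rejection:.1f} J, {hitomi_duty:.0%}")
print(f"SAFARI: {safari_capacity_50mk:.3f} J, {safari_capacity_300mk:.2f} J, "
      f"{safari_rejection_1p8k:.1f} J, {safari_rejection_4p9k:.0f} J, {safari_duty:.0%}")
```

The recomputed values agree with the table to within rounding. The ratio of rejected heat to cooling capacity (roughly 130 for the 3-stage ADR and over 2000 for the sorption + ADR at 50 mK) is one way to see the lower thermal efficiency of the sorption stage noted earlier.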
Chapter 12


Critical Angle Transmission Grating 
Spectrometers 


David P. Huenemoerder and Ralf K. Heilmann


Kavli Institute for Astrophysics and Space Research 
Massachusetts Institute of Technology, Cambridge, MA 02139, USA 


High resolution soft X-ray spectroscopy is a necessary method for many astro- 
physical investigations. The Critical Angle Transmission grating is a new tech- 
nology that can provide very efficient spectrometers in the soft X-ray band with 
high resolving power. Here we discuss the astrophysical motivation and the operating
principles, give overviews of fabrication and performance, and finally present a
mission concept, including comparisons to existing astrophysical X-ray grating
spectrometers.


1. Introduction 


The Critical Angle Transmission (CAT) grating is a new technology relying on total 
external reflection from periodic arrays of nanomirrors etched from silicon wafers. 
CAT gratings are blazed transmission gratings which are both efficient and enable 
high resolving power throughout the soft X-ray band. Here we will describe the 
concept from the component facets up through a sketch of an end-to-end observatory 
spectrometer system that has both high resolving power and large effective area. 
First, we will provide an overview of the scientific motivation. 


1.1. The Need for High Resolution 


The purpose of astronomical spectroscopy is to determine a source’s physical param- 
eters, such as temperature, density, chemical composition, or dynamics. This is 
best done by isolation and decomposition of spectral features, formed by indi- 
vidual atomic transitions, into their physical components, such as the intensity 
profile vs. energy or absorption or emission strength. We also desire a broad energy
range to cover a number of transitions from different chemical elements in differ- 
ent ionization or excitation states, since they have different regimes of sensitivity; 
a broad bandpass also allows a precise determination of the continuum emission, 
required to accurately measure features, and which is also a plasma diagnostic in 
itself. 

The term “high resolution” can be physically defined as the resolving power,
R = E/ΔE = λ/Δλ, required to reach characteristic natural scales present in
the source. Here E is the energy, or λ the wavelength, and ΔE (or Δλ) is the
full-width at half-maximum, FWHM, of the instrument’s response to a δ-function
input. Emission lines in the 0.1-2 keV region can originate from abundant elements
in thermal plasmas; these span a broad range in their temperatures of maximum
emissivity, from about 1 MK (C V) to 20 MK (Si XIV). To resolve thermal Doppler
broadening of these species, which has a Gaussian distribution (with variance kT/A,
k being Boltzmann’s constant, A the atomic mass, and T the temperature), we
require 2000 < R < 5000. For such hot plasmas, there is little need for exceeding
this resolving power. However, cold to warm plasmas (lower ionization stages) of
abundant elements, such as C, N, O, Ne, Si, and Fe also have their inner-shell
absorption edges and lines in the soft X-ray band. Hence, we may desire to resolve
narrow features whose widths are determined by small line-of-sight turbulent velocities
of a few 10 km/s. Or we may wish to determine Doppler shifts of emission or
absorption lines by measuring centroids; this can typically be done to an accuracy
of about ΔE/10. For these latter cases, we need R ~ 3000.
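
The range of resolving power needed for thermal Doppler broadening can be estimated from the quantities defined above. The sketch below takes the line-of-sight velocity variance as kT/A, converts the Gaussian width to a FWHM, and sets R = c/FWHM. The atomic masses (12.011 u for C, 28.0855 u for Si) and the pairing of each ion with the quoted temperature of maximum emissivity are the only inputs; the function name is illustrative.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
AMU = 1.66053906660e-27     # atomic mass unit, kg
C_LIGHT = 299792458.0       # speed of light, m/s
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))

def thermal_resolving_power(temperature_k: float, atomic_mass_u: float) -> float:
    """Resolving power R = c / FWHM_v needed to resolve thermal Doppler broadening."""
    sigma_v = math.sqrt(K_B * temperature_k / (atomic_mass_u * AMU))  # 1-D velocity dispersion
    return C_LIGHT / (FWHM_PER_SIGMA * sigma_v)

r_carbon = thermal_resolving_power(1.0e6, 12.011)     # C V near 1 MK
r_silicon = thermal_resolving_power(2.0e7, 28.0855)   # Si XIV near 20 MK
print(f"C V at 1 MK:     R ~ {r_carbon:.0f}")
print(f"Si XIV at 20 MK: R ~ {r_silicon:.0f}")
```

The two cases bracket the requirement at roughly R ~ 1700 to R ~ 4800, in line with the range of a few thousand quoted in the text.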

The soft X-ray band is astrophysically important for a large variety of forefront 
research areas, such as the search for absorption by the postulated warm-to-hot 
intergalactic medium at early epochs, characterization of dynamics and composition 
of the interstellar medium in our galaxy and others, studies of coronae of stars, and 
probing the highly energetic regions around stellar black holes or active galactic 
nuclei. This band is very important because it contains the hydrogen- and helium- 
like transitions for many of the most abundant elements, as well as a large number 
of Fe L-shell lines and K-shell absorption features. A good overview of a variety 
of astrophysical problems requiring high-resolution spectroscopy in X-rays can be 
found in Ref. 1. 


1.2. The Need for High Effective Area 


It is obvious that we generally require as large an effective area (the product of 
geometric area, diffraction efficiency, and detection efficiency) as possible, since 
with more area exposures can be shorter and more objects can be observed in a 
given time. Generally, area can be traded for exposure time. However, any time- 
critical phenomena (transients, phase modulated flux) might require that significant 
signal be obtained within a specified time. Hence, we need gratings, as well as other 
components, to be as efficient as possible. 
1.3. Advantages and Limitations


Transmission gratings have many advantages for use in X-ray observatory spec- 
trometer systems. Foremost, they are thin and thus have low mass and can be used 
to tile a large fraction of the telescope aperture* with little impact on observatory 
mass or thermal constraints. The typical geometry of an objective grating system 
takes up little space in the optical path, except perhaps for insertion mechanisms. 
While mutual alignment of the dispersion direction (rotation of a facet about its 
normal) of multiple facets is critical, the dispersion scale is rather insensitive to 
tilts of the facet normal. A diffraction grating also provides resolving power that 
increases with wavelength, having roughly constant resolution in wavelength, in 
contrast to non-dispersive spectrometers, such as silicon detectors or calorimeters, 
which have a nearly constant resolution in energy. Transmission gratings can be 
made relatively transparent at high energies (> 2 keV), and so can be multiplexed 
with high energy imagers or spectrometers that provide higher resolving power at 
shorter wavelengths. 

A primary limitation when used as objective gratings (which is the only viable 
design for an X-ray observatory), is that the source geometry is convolved with 
the dispersion; hence, grating spectroscopy of extended sources is difficult, since 
dispersed images of the source in nearby wavelengths overlap. At high energies 
(≳ 2 keV), it is difficult to achieve high dispersion and high efficiency; in this
regime, non-dispersive imaging detectors are more appropriate. Another limitation 
of previous transmission gratings is that they are not strongly blazed; while grat- 
ing parameters can be somewhat optimized for broad energy-sensitivity, photons 
are predominantly diffracted into low-order positive and negative orders, limiting 
resolving power, or into the zeroth order, which is of limited spectroscopic use. 


1.4. Heritage 


Objective transmission grating spectrometers have been used for soft X-ray spectroscopy
on Einstein,2 EXOSAT3 and on Chandra, in the High Energy4 and Low
Energy5 Transmission Grating Spectrometers (HETGS, LETGS), with resolving powers
up to 1000 and effective areas up to 100 cm2.


2. Transmission Grating Operating Principles 


The basic theory of Fraunhofer diffraction by a periodic structure can be found in an 
optics textbook, such as Ref. 6, and is the basis for understanding diffraction grat- 
ings. Reference 7 gave an excellent summary of diffraction by rectangular bars, first 
described in detail by Ref. 8; the former authors also provided a generalization to 
a specific case of non-rectangular profile bars. They also discussed the optimization 


aFor instance, in the case of Chandra, a few hundred grating facets are arranged on a plate inserted
behind the mirror, but this is still referred to as an objective grating. 
of efficiency vs. order caused by utilizing phase shifts of radiation passing through
the diffracting bars interfering with radiation passing between the bars. 

Insight into the scales involved can immediately be obtained from the condition 
for constructive interference from normal incidence on a transmission grating: 


\[ m\lambda = p\sin\beta = p\,r/F, \tag{1} \]


where m is the diffraction order, λ the photon wavelength, p the grating period,
and β is the diffraction angle. In the second form, we equate the angle to the
linear distance from focus along the dispersion, r, at some focal length, F. To
determine a scale for p in the X-ray situation, we can adopt a nominal focal length
of F = 500 cm (typical order of magnitude for a soft X-ray telescope), a dispersion
distance of r = 3 cm (a minimal value needed to oversample a spectrum at the
required resolution with reasonably sized detector pixels) and a photon energy of
1 keV (12.4 Å). We find that p ~ 200 nm. Hence, we must be able to fabricate
uniform periodic structures at a very small scale.
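
The estimate of the grating period can be reproduced by solving Eq. (1) for p in first order, p = mλF/r. The sketch below uses the nominal values from the text and the standard conversion λ = hc/E with hc ≈ 1.23984 keV nm; the function name is illustrative.

```python
HC_KEV_NM = 1.239841984  # Planck constant times speed of light, keV nm

def grating_period_nm(energy_kev: float, focal_length_cm: float,
                      dispersion_cm: float, order: int = 1) -> float:
    """Grating period from m*lambda = p*r/F (small-angle form of Eq. 1)."""
    wavelength_nm = HC_KEV_NM / energy_kev
    return order * wavelength_nm * focal_length_cm / dispersion_cm

period = grating_period_nm(energy_kev=1.0, focal_length_cm=500.0, dispersion_cm=3.0)
print(f"Required grating period: {period:.0f} nm")
```

With F = 500 cm, r = 3 cm and a 1 keV photon this gives a period of about 207 nm, consistent with p ~ 200 nm.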


2.1. CAT Gratings 


Most traditional X-ray transmission gratings can be described as phase-shifting 
gratings: photons transmitted through grating bars experience a phase shift relative 
to photons transmitted through the gaps between bars. The resulting interference 
shifts transmitted flux from the straight-through zeroth-order beam into non-zero 
diffraction orders. However, many X-rays get lost through absorption in the grating 
bars, which often take up half of the grating period (i.e. 50% duty cycle), the 
phase shift is wavelength dependent, and sometimes supporting membranes need to 
be employed that lead to additional absorption. Blazing can be achieved through 
rotation of the grating in the plane of diffraction, but is very inefficient for phase 
shifting X-ray transmission gratings. 

CAT gratings are blazed transmission gratings that optimize constructive interference
in the direction of specular reflection (see footnote b) from the grating bar sidewalls. The
design can be understood as a periodic array of very thin free-standing mirrors
(thickness b), inclined relative to the direction of incident X-rays by an angle α
that is less than the critical angle for total external reflection, θc, for the given
X-ray wavelength (see Fig. 1). The goal is for every photon incident in the space
between two adjacent mirrors (separated by distance a) to undergo one (and only
one) reflection. This results in a requirement for the grating depth d = a/tan α. To
minimize absorption we require b/p ≪ 1. Since θc < 2°, the aspect-ratio is large:
d/b ~ 100. In order for the grating bar sidewalls to act as efficient mirrors they
must have roughness of ≲ 1 nm (root-mean-square deviations).
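
The single-reflection condition d = a/tan α fixes the grating depth once the gap and the graze angle are chosen. The sketch below evaluates it for a 200 nm period with 40 nm wide bars (so a = 160 nm, values taken from the fabrication discussion) at two graze angles just below 2°. The specific angles are assumptions made for illustration.

```python
import math

def grating_depth_nm(gap_nm: float, alpha_deg: float) -> float:
    """Depth at which a ray entering at the top of one wall just reaches the bottom of the next."""
    return gap_nm / math.tan(math.radians(alpha_deg))

PERIOD_NM = 200.0
BAR_WIDTH_NM = 40.0
GAP_NM = PERIOD_NM - BAR_WIDTH_NM   # a = p - b

depths_um = {}
for alpha_deg in (1.6, 1.9):        # assumed graze angles below the critical angle
    depths_um[alpha_deg] = grating_depth_nm(GAP_NM, alpha_deg) / 1000.0
    aspect_ratio = depths_um[alpha_deg] * 1000.0 / BAR_WIDTH_NM
    print(f"alpha = {alpha_deg} deg: d = {depths_um[alpha_deg]:.1f} um, d/b = {aspect_ratio:.0f}")
```

Both cases give depths of several micrometers and aspect ratios above 100, which is why the fabrication of such deep, thin bars is demanding.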


bThe term “reflection” is used here only for heuristic purposes; for an accurate physical description, 
we must consider the interference effects of a periodic array of structures. 
Fig. 1. Schematic cross-section through a CAT grating. The order-m diffraction occurs at an
angle β where the path length difference between AA’ and BB’ is mλ. The case shown has β
coincident with the direction of specular reflection from the grating bar sidewalls (βm = α), the
blaze angle for order m. (See electronic edition for a color version of this figure.)


A simple model based on the Fraunhofer limit of scalar Kirchhoff diffraction 
theory describes the basic physics of CAT gratings.® 1° In this model the transmitted 
diffraction intensity at an angle (@ relative to the grating normal is given by 


I(A,p, Qa, B, k, a, R) = grail ttle, n(A))(a/p)?. (2) 
The grating interference function describes the angles where diffraction peaks and 
is given by 
sin kg ? 
I ra’ 11 A ik = . 
eot(Asps4 By) = | ERE (3) 


where $g = p(\pi/\lambda)(\sin\beta - \sin\alpha)$, and $k$ is the number of illuminated grating slits.
For $k \to \infty$, $I_{\rm grat}$ becomes a series of $\delta$ functions at integer values of $g/\pi$, and yields
the well-known grating equation defining the positions of the intensity peaks,

$$\frac{m\lambda}{p} = \sin\alpha - \sin\beta_m, \qquad (4)$$

where $m$ is the order of diffraction ($m = 0, \pm 1, \pm 2, \ldots$) and $\beta_m$ is the angle of
diffraction for order $m$.
Blazing and diffraction from a single slit is described by

$$I_{\rm slit}(\lambda, a, \beta, \alpha) = \left[\frac{\sin f}{f}\right]^2, \qquad (5)$$

where $f = -a(\pi/\lambda)(\sin\beta + \sin\alpha)$. $R$ is the specular reflectivity of the grating
material of index of refraction $n(\lambda)$ at grazing angle of incidence $\alpha$.
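
The scalar model of Eqs. (2)-(5) is simple enough to evaluate directly. The sketch below is a minimal illustration, not the authors' code: it assumes a constant placeholder value for the reflectivity $R$ (a real calculation would use tabulated optical constants), and the wavelength, period, slit width, angle, and number of slits are illustrative choices. It locates the order that falls closest to the peak of the blaze envelope.

```python
import math

def i_grat(lam, p, alpha, beta, k):
    """Eq. (3): interference function for k illuminated slits of period p."""
    g = p * (math.pi / lam) * (math.sin(beta) - math.sin(alpha))
    if abs(math.sin(g)) < 1e-9:        # principal maximum: the ratio tends to 1
        return 1.0
    return (math.sin(k * g) / (k * math.sin(g))) ** 2

def i_slit(lam, a, beta, alpha):
    """Eq. (5): single-slit (blaze) envelope for slit width a."""
    f = -a * (math.pi / lam) * (math.sin(beta) + math.sin(alpha))
    return 1.0 if abs(f) < 1e-12 else (math.sin(f) / f) ** 2

def order_angle(m, lam, p, alpha):
    """Eq. (4): diffraction angle beta_m of order m."""
    return math.asin(math.sin(alpha) - m * lam / p)

def intensity(m, lam, p, a, alpha, k, reflectivity):
    """Eq. (2) evaluated at the diffraction angle of order m."""
    beta = order_angle(m, lam, p, alpha)
    return (i_grat(lam, p, alpha, beta, k) * i_slit(lam, a, beta, alpha)
            * reflectivity * (a / p) ** 2)

# Illustrative values (assumptions): lengths in nm, angles in radians.
lam, p, a, k = 2.5, 200.0, 160.0, 1000
alpha = math.radians(1.5)
reflectivity = 0.8                     # placeholder for R(alpha, n(lambda))

envelope = {m: intensity(m, lam, p, a, alpha, k, reflectivity) for m in range(0, 9)}
blazed_order = max(envelope, key=envelope.get)
print("order nearest the blaze peak:", blazed_order)
```

With these numbers the blaze condition $\sin\beta_m = -\sin\alpha$ corresponds to $m \approx 2p\sin\alpha/\lambda \approx 4.2$, so the fourth order carries most of the diffracted intensity in this simplified picture.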
This model gives a good qualitative description of CAT grating diffraction. For
example, the term $I_{\rm slit}$ describes how the blaze envelope shifts with the specular
reflection direction if the grating is rotated by an angle $\gamma$ relative to the incident
beam in the plane of diffraction, while $I_{\rm grat}$ shows that the angles of low-index
transmitted orders change only by $(m\lambda/p)^2(\gamma/2)$.9:'! This is an important result
for alignment purposes, since it shows that transmission gratings are 3 to 4 orders of
magnitude less sensitive to figure and alignment errors in certain degrees of freedom
than corresponding reflection gratings.

A more accurate treatment for the computation of CAT grating diffraction efficiencies
is given by rigorous coupled-wave analysis (RCWA), which calculates solutions
to Maxwell’s equations for three-dimensional structures with one-dimensional
periodicity. Such calculations predict that perfect CAT gratings made from silicon 
can provide blazed diffraction efficiencies larger than 50% over a broad soft X-ray 
band, up to a 10-fold improvement over the broadband transmission gratings used 
previously. 


2.2. Fabrication 


The above design parameters for efficient soft X-ray CAT gratings are challenging: 
small grating period of ~200 nm, small grating bar width (~40 nm) combined with 
large grating bar depth ($d \sim 4$-6 µm), and sub-nm-smooth sidewalls. In addition,
the structure should not be supported by an absorbing membrane. CAT gratings 
with these desired properties have been successfully fabricated from (110)ᶜ silicon-
on-insulator (SOI) wafers using advanced lithography and processing techniques 
similar to those found in the semiconductor and micro-electromechanical-systems 
(MEMS) industries.1? 1° The SOI wafer consists of a front side, which is a crystalline 
silicon device layer of thickness d, a thin (~500nm) buried oxide (BOX) layer that 
separates the device layer from the back side, and a ~500 µm thick silicon handle
layer (see Fig. 2). The CAT grating bars and an integrated Level 1 (L1) support 
mesh are etched from the device layer, and a coarser hexagonal Level 2 (L2) mesh 
is etched out of the handle layer. We describe one sequence of steps for fabrication 
of a large-area, free-standing CAT grating membrane out of such an SOI wafer. 
Both sides of the wafer are covered with about 400 nm of thermal oxide, and 
the backside with a few additional microns of plasma-enhanced chemical vapor 
deposition (PECVD) oxide. The oxide layers serve as hard masks for deep reactive- 
ion etch (DRIE) steps. They are covered with multilayer stacks (front side only) and 
photoresist. The patterns for the CAT gratings and the perpendicular L1 support 
mesh (with period ~5 µm) are generated in the front side resist through interference


ᶜ(hkl) refers to the direction of the family of parallel lattice planes with the Miller indices h, k, l,
which are the coordinates of the shortest reciprocal lattice vector normal to these planes. Silicon 
wafers can be sliced from a boule along different crystal lattice planes and are generally specified 
by the Miller indices of their respective surfaces. 
Fig. 2. Schematic of a grating membrane “unit cell” (not to scale), formed by a single Level
2 support mesh hexagon. The Level 2 mesh is etched out of the SOI handle layer (back side). 
The device layer contains the fine-period CAT grating bars and in the perpendicular direction the 
coarse, low duty cycle integrated Level 1 support mesh. Device and handle layers are separated 
by the thin buried silicon oxide layer, which serves as an etch-stop for both front and back side 
etches. (See electronic edition for a color version of this figure.) 


lithography, with the CAT grating lines carefully aligned parallel to one set of 
(111) planes in the device layer. The backside resist is patterned with an L2 mesh 
(~1-2mm hexagon diameter) via contact lithography. The front side pattern is 
transferred into the thermal oxide layer generating a high aspect ratio hard mask. 
DRIE is then used to etch this pattern into the device layer, stopping on the BOX 
layer, and generating the ultra-high aspect ratio silicon CAT grating bars, suspended 
from the integrated L1 mesh. At this point the grating bar side walls are scalloped 
due to the nature of the DRIE process, rendering them too rough for reflection. 
Since the grating bar sidewalls are aligned with the (111) silicon crystal planes, a 
short wet etch in a KOH (potassium hydroxide) solution can remedy this problem, 
thinning the bars in the process.ᵈ This KOH polishing step can be performed right
after front side DRIE, or after etching of the L2 mesh. 

The front side is then protected by resist and mounted to a carrier wafer for 
backside processing. Similar to the front side, the L2 mesh is first transferred into 
the oxide layers and then through DRIE all the way through the handle layer, 


ᵈThe etch rate for (111) silicon crystal lattice planes in KOH is orders of magnitude slower than
for other planes. This enables the creation of a large variety of highly anisotropic silicon structures 
through adept cutting and lithography. 
Fig. 3. Large CAT grating membrane. Left: Scanning electron micrograph (SEM) of the back
side of a grating membrane. The hexagon period is ~1 mm, and the L2 mesh lines are ~100 µm
wide. The insert shows a small area of the membrane back side at larger magnification with L1
supports clearly visible. The much finer CAT grating bars are held in place by the L1 supports. 
Right: Grating membrane next to a U.S. quarter coin. Diffraction is due to the L1 support mesh. 
The hexagonal L2 mesh is visible due to back illumination. Most membrane defects were caused 
by intentional tearing for cross-sectional SEM studies. (See electronic edition for a color version of 
this figure.) 


stopping on the back side of the BOX layer. After dismounting the carrier wafer 
and cleaning the protective resist cover from the front side, the KOH polishing step
can be performed. Finally, an HF (hydrogen fluoride) etch removes the BOX layer 
in the areas not covered from both sides by silicon, resulting in a freestanding silicon 
CAT grating membrane (see Fig. 3). This membrane can be glued to a machined 
frame, creating a grating facet. 


2.3. Performance 


Fabrication methods have evolved through several generations, as prototype CAT 
gratings were found to lack one or more of the desired properties. Many factors 
contribute to the throughput of a grating spectrometer. Diffraction efficiency is 
only one such factor. Another, which must be minimized, is blockage by absorbing 
structures such as L1 and L2 support mesh or facet frames. CAT grating prototypes 
were originally made using only wet KOH etching, taking advantage of the strong 
anisotropy of KOH for different silicon crystal planes to generate smooth, ultra-high 
aspect-ratio CAT grating bars.!?-4 The drawback of this approach was that the 
mask for L1 supports protected inclined (111) planes from etching in KOH, so that 
L1 supports broadened with increasing etch depth and ended up blocking too much 
of the available grating area. 

These wet-etched prototypes, however, did suffice to experimentally demon- 
strate the CAT grating principle: diffraction efficiencies behaved qualitatively as 
expected as a function of period, etch depth, incidence angles, and wavelengths, 
and observed diffraction efficiencies were in the range of 50-100% of theoretical 
Fig. 4. Theoretical diffraction efficiency of a 200 nm period, 6 µm deep, silicon CAT grating
derived with RCWA. The upper curve shows the sum of diffraction efficiencies from orders 1-12,
which are shown individually below. The dashed line shows zeroth-order transmission, which
contributes to effective area at the telescope focus and could be used for higher energy imaging.
Blockage from L1 and L2 supports is not taken into account. (Adapted from Ref. 21. See electronic 
edition for a color version of this figure.) 


predictions over a broad range of wavelengths.!° Figure 4 shows the theoretical 
diffraction efficiency of a CAT grating. 

To reduce the blockage by support structures, DRIE steps were introduced 
into the fabrication process. DRIE allows the transformation of 2D patterns into 
high aspect-ratio 3D structures without broadening; its function is irrespective of 
silicon crystal orientation. CAT gratings with $d = 4$ µm have been fabricated using a
combination of DRIE and KOH etching, achieving > 30% peak absolute diffraction
efficiency (including L1 absorption) near $\lambda = 2.5$ nm, which corresponds to ~85% of
the RCWA prediction for the achieved geometry.” Silicon CAT gratings have also 
been covered with conformal thin layers of platinum using atomic layer deposition. 
The Pt layer increases the critical angle for total external reflection and allows 
blazing to larger angles and thus higher diffraction orders, and extends the bandpass 
towards higher energies. Pt-coated CAT gratings have been used in a spectrometer 
setup to blaze in 18th order at Al Kα wavelengths to demonstrate resolving power
in excess of 10,000.29 Current research focuses on the fabrication of deeper CAT 
gratings, minimization of L1 and L2 supports, and the increase in grating facet area 
beyond ~10 cm².

As one can see from the grating equation (see Eq. (4)), two wavelengths $\lambda_1$
and $\lambda_2$ overlap in neighboring orders when $m\lambda_1 = (m+1)\lambda_2$, and their energy
difference is $\Delta E = E_2 - E_1 = E_1/m$. The readout detector for the diffracted
spectrum must be capable of differentiating between photons of wavelengths $\lambda_1$ and $\lambda_2$.
Fig. 5. Example of order separation by the non-dispersive, low-resolution CCD detectors, based
on ray-trace simulations of the AEGIS mission concept.?? Each dark band is a spectral order (labeled
at the left) for a flat-spectrum source. The horizontal axis is the high-resolution dispersive direction 
and the vertical axis is the detector’s low-resolution non-dispersive signal (scaled to an equivalent 
energy). The faint vertical light stripes are gaps between the detector array elements. 


For example, a 200 nm period grating with $\alpha + \beta_m = 3.0^\circ$ leads to $\Delta E \sim 120$ eV, which
is large enough to allow order-sorting for state-of-the-art X-ray charge-coupled-device
(CCD) detectors.ᵉ Figure 5 shows a simulated distribution of multiple spatially
overlapping orders separated by the non-dispersive CCD response.
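
The quoted order spacing can be checked with a short calculation. The sketch below assumes that the 3.0° figure is the total angle between the incident beam and the blazed order (that is, $\alpha = \beta_m = 1.5^\circ$ in magnitude), so that $m\lambda = p(\sin\alpha + \sin\beta_m)$ and $\Delta E = E_1/m = hc/(m\lambda_1)$. The function names are hypothetical.

```python
import math

HC_EV_NM = 1239.841984  # h*c in eV nm

def blaze_order_product_nm(period_nm, alpha_deg, beta_deg):
    """m*lambda at the blaze, from the magnitude form of the grating equation."""
    return period_nm * (math.sin(math.radians(alpha_deg))
                        + math.sin(math.radians(beta_deg)))

def order_energy_gap_ev(period_nm, alpha_deg, beta_deg):
    """Energy gap between photons overlapping in adjacent orders:
    Delta E = E_1 / m = h*c / (m * lambda_1)."""
    return HC_EV_NM / blaze_order_product_nm(period_nm, alpha_deg, beta_deg)

# Assumed split of the 3.0 degree total deflection into alpha = beta_m = 1.5 degrees.
gap_ev = order_energy_gap_ev(200.0, 1.5, 1.5)
print(f"Delta E = {gap_ev:.0f} eV")
```

Because $m\lambda$ is fixed by the geometry at the blaze, the energy gap between neighboring orders is the same across the blazed band in this approximation, which is what makes order sorting with a single detector resolution feasible.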


3. Mission Concept 


Here we describe an orbital observatory concept for a dedicated CAT-based X-ray 
spectrometer. This is similar to the AEGIS design in Ref. 22.ᶠ We will assume certain
detector and mirror performance characteristics, but will describe the grating
spectrometer in detail. 


3.1. X-ray Mirror Assembly 


X-ray telescopes require reflecting grazing-incidence mirrors as focusing optics, and 
there are several relevant technologies for large apertures currently in develop- 
ment: slumped-glass mirrors, silicon-pore optics, monocrystalline silicon mirrors, 
adjustable mirrors, and full shell optics,ᵍ all in a Wolter Type-I configuration


ᵉSee Chapter 8 of this volume.

ᶠThis concept was also submitted as a white paper in response to a NASA Request for Information,
available at http://pcos.gsfc.nasa.gov/studies/rfi/Bautz-Marshall- RFINNH11ZDA018L.pdf 

ᵍSee Chapters 2 and 3 of this volume.
(co-axial, con-focal paraboloid-hyperboloid mirror pairs)."»?° In most cases, the full
aperture of the focusing mirrors is assembled from radial and azimuthal sectors. The 
critical parameters are the range in graze angles, which determines the bandpass 
and efficiency (smaller graze angles reflect harder photons with greater efficiency, 
but have smaller projected geometric area) and the imaging quality (determined by 
surface figure and alignment). We will assume that we can obtain a mirror of 1.9 m
diameter aperture and a focal length of 4.4 m with an effective area of 1400 cm²
over the band 0.2-1.0 keV. 

A very important consideration in spectrometer design is that the mirror scat- 
tering for grazing incidence is predominantly in the plane of specular reflection. 
Consequently, scatter from a mirror sub-aperture limited to, say 30° in azimuth, will 
be comparable in angular spread to scatter from the full aperture in the direction of 
specular reflection, but much narrower in the perpendicular direction. By orienting 
the grating dispersion direction along the narrow scatter direction the width of the 
spectral line spread function is minimized and resolving power is increased by a 
factor of several. In Fig. 6, we show results from a ray-trace simulation of a typical 
mirror for the full aperture vs. a diametrically opposed pair of mirror modules. 
The gain in resolving power is large, about a factor of 5 in this example. This 
effect has been demonstrated experimentally for a single CAT grating, resulting 
in R > 10,000.79 Hence, a viable CAT grating spectrometer will have modules 
defined over limited azimuth ranges, with a readout for each. A full mirror system 
Fig. 6. Ray-traced spot diagrams showing the patterns for a full aperture (a), and for a 30°
sub-aperture (b) from a pair of opposing mirror modules. This demonstrates the gain in spectral 
resolution afforded by using sub-apertures, since when dispersed in the horizontal direction (Y 
axis), the distribution in panel b provides a much narrower distribution. (Adapted from Ref. 24.) 


ʰSee Chapter 1 of this volume.
would require about six readout systems, as in the case of AEGIS. However, if full
aperture imaging is not required, several modules could be arranged in a compact 
non-circular geometry. 


3.2. Rowland Geometry 


The Rowland torus is a well-defined geometry for the placement of transmission 
grating facets. If we first consider ideal infinitesimal facets and place them on a circle 
with the incident beam normal to their surface element and their dispersion axis in 
the plane of the circle, then any given wavelength dispersing at angle $\beta$ converges
to a single point on the circle. This is called the Rowland geometry, and is shown in 
Fig. 7. In three dimensions, we can construct additional Rowland circles by pivoting 
Fig. 7. Rowland geometry for an idealized system. The “Top View” shows the Rowland circle
holding the axial ray. Monochromatic rays normal to facets on the Rowland circle disperse to a 
common point. In the “Side View” (having rotated the “Top View” by 90° about the optical axis), 
we see the outer arc of a Rowland torus and on three rays, see three facets each in projection on 
tilted Rowland circles. 
the circle with the axial ray about the tangent to the on-axis focal point to define
a Rowland torus; gratings on this torus will also diffract to a common astigmatic 
line in the cross-dispersion direction; the spectral focus is slightly ahead of imaging 
focus, so the beam has converged in the dispersion direction, but is still converging 
in the cross-dispersion direction. The “Side View” of the Rowland Geometry is also 
shown schematically in Fig. 7. 

(For an excellent and more rigorous discussion of the Rowland Circle and rele- 
vance to X-ray spectrometers, see Ref. 1.) 
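
The focusing property of the Rowland circle follows from the inscribed angle theorem and can be verified numerically. The sketch below is an idealized two-dimensional illustration under stated assumptions: infinitesimal facets sit on a circle of radius `radius`, every incident ray heads toward the on-axis focus `F` (taken diametrically opposite the on-axis facet), and each facet deflects its ray by the same angle `beta`. All names are hypothetical.

```python
import math

def rowland_hit(radius, facet_angle, beta):
    """Return the point where a ray diffracted by angle beta at a facet on the
    Rowland circle meets the circle again.

    The circle is centred on the origin, the focus F sits at (radius, 0), and
    the facet sits at polar angle facet_angle on the circle."""
    gx, gy = radius * math.cos(facet_angle), radius * math.sin(facet_angle)
    ux, uy = radius - gx, 0.0 - gy                 # incident ray points at F
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm
    # Rotate the incident direction by the diffraction angle beta.
    vx = ux * math.cos(beta) - uy * math.sin(beta)
    vy = ux * math.sin(beta) + uy * math.cos(beta)
    # Second intersection of G + s*v with the circle: s = -2 G.v
    s = -2.0 * (gx * vx + gy * vy)
    return gx + s * vx, gy + s * vy

radius = 1.0
beta = math.radians(3.0)
hits = [rowland_hit(radius, math.pi + offset, beta) for offset in (-0.3, 0.0, 0.3)]
for x, y in hits:
    print(f"{x:.9f} {y:.9f}")
```

All three facets send the diffracted ray to the same point on the circle, at polar angle $2\beta$ from the focus, which is the spectral focus for that wavelength in this idealized picture.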


3.3. Grating Modules 


To provide as much effective area as possible, we need to tile the area behind the 
mirrors, in the converging beam, with grating facets, carefully aligned and placed 
on the surface of the Rowland torus. This will be achieved by mounting the facets 
to a set of grating array structures (GAS). GAS and mirror module design can be 
optimized to minimize blockage from structural parts and maximize the open area 
for grating facets. Such assemblies are also lightweight and can be made with low 
geometrical obscuration. 

Facets on diametrically opposed mirror segments will be aligned to share a 
detector readout array. Since detector arrays typically have gaps, and since grating 
efficiency curves are strongly wavelength dependent, at least two slightly different 
periods of gratings can be used to smooth the response with wavelength by mak- 
ing detector gaps or efficiency roll-offs occur at different wavelengths in different 
modules. 

Figure 8 shows a schematic mirror-grating module assembly. In this diagram, 
rays are sketched for the center of one module pair, with the beam traversing from 
right to left and dispersing upwards; the direction of the dominant mirror scattering 
is perpendicular to the dispersion direction. 


3.4. Aberrations, Mitigation 


To implement the blazing of CAT gratings, the grating bars must be tilted relative to 
the facet normal. However, this is a non-trivial fabrication problem. An alternative 
is to etch the grating bars normally to the facet surface and to then tilt the facets 
themselves by the desired blaze angle. This, though, increases an existing aberration 
due to the finite facet size: a planar facet does not follow the Rowland circle (or 
torus), but will have portions slightly ahead of or behind the ideal surface (see the 
top portion of Fig. 7). Hence, photons will diffract from points slightly ahead of 
or behind the optimum position, and this will cause a spread in the line image 
in the dispersion direction. This finite facet size effect increases for larger facets, 
and is compounded due to the tilt introduced to achieve blazing, if one keeps the 
simple Rowland geometry. This aberration can be mitigated by slightly altering the 
geometry by offsetting the Rowland circle, and by defining the Rowland torus by 
pivoting the circle about a different axis. This is shown schematically in Fig. 9. 

Fig. 8. CAT observatory spectrometer concept, showing arrangement of grating modules behind 
the mirror, dispersion from one pair of modules, and the positions of the zeroth order and the 
dispersed spectrum. Using diametrically opposed mirror pairs helps maximize spectral resolving 
power, since the dominant mirror scattering is designed to be perpendicular to the dispersion 
direction (and so would scatter roughly in the horizontal direction in this schematic). (Adapted 
from Ref. 24. See electronic edition for a color version of this figure.) 
Fig. 9. Tilted Rowland geometry, which compensates for the finite-facet size aberration introduced
by tilting facets to achieve blazing. (Adapted from Ref. 21. See electronic edition for a color
version of this figure.) 
The finite facet size aberration can also be reduced by introducing an appropriate
period gradient across a grating (“chirping”). In this way, a photon being
diffracted before reaching the Rowland circle could see a lower period, have a slightly 
smaller diffraction angle, and be confocal with rays diffracting from the Rowland 
circle. 

Astigmatism, the defocus in the imaging direction, grows quadratically with 
diffraction angle. This is generally not a problem for single point sources. It will 
incur an increase in background due to the larger area over which a spectrum is 
collected, but background is expected to be quite small. 

Blur can also be caused by grating mis-alignments and period variations within 
or between facets. To ensure that resolving power is mainly limited by the mirrors, 
we wish to have the period errors, dP/P, and facet rotations well below the mirror 
blur. 


3.5. Detector Readout System 


For a detector array, we need pixels small enough to oversample the line-spread- 
function (LSF), and we need sufficient energy resolution by the detector to separate 
spatially overlapping orders from X-rays of differing wavelengths. Both requirements 
can be met by CCD detectors,ⁱ and such detectors have been used on the Chandra and
XMM grating spectrometers. To separate adjacent orders, we require that the CCD
resolving power, $E/\Delta E$, be greater than some nominal threshold, $k'$, to reject the
main peak’s tails. This depends on the details of the low-channel tail of the detector
redistribution function (for CCD responses, the low channel tail is typically extended
while the high side is very Gaussian). Assuming for argument that the response
is primarily Gaussian, and if we wish that $k' = 5$, then we require that $\Delta E$ be
about 100 eV at 0.5 keV, which is easily obtainable with existing CCD technology.
For example, the Chandra CCDs have $\Delta E \sim 50$ eV at 0.5 keV. See Fig. 5 for an
example. 

It takes several CCD detector elements to tile the region covered by a blazed dis- 
persed spectrum. The elements are placed along the Rowland circle. Gaps between 
detector elements imply gaps in wavelength coverage. As mentioned earlier, these 
gaps can be filled in by use of two slightly different grating periods used on different 
mirror modules, so that coverage is complete when data are combined from differ- 
ent modules. A second technique is operational, and has been used by Chandra; the 
observatory pointing is dithered during each observation to average the response 
over detector non-uniformities. Accurate knowledge of the aspect with time allows 
precise post facto reconstruction of the sky and wavelength coordinates. 

For accurate absolute wavelength determinations, we require an image of the 
zeroth order, which is the origin of the dispersion relation. Since gratings are blazed, 
we do not have the option of bisecting features seen in positive and negative orders. 


ⁱSee Chapter 8 of this volume.
Fig. 10. Representative effective areas and resolving power for existing and conceptual missions.
Those using CAT gratings are Lynx, AEGIS, and Arcus. The XRISM-Resolve spectrometer uses a
calorimeter. The XMM-Newton spectrometer uses reflection gratings, and Chandra transmission 
gratings. (See the electronic edition for a color version of this figure.) 


Given that we are using the blaze function to give highest efficiency in high orders 
and correspondingly large angles, we require an additional detector element to be 
placed at the zeroth order. Thus we can obtain an instantaneous determination from 
the observation of the origin for the dispersion, but also a valuable image of the 
field and imaging-spectra in the higher energy band transmitted by the gratings. 
This can provide important secondary science as well as information necessary for 
interpreting the dispersed spectra of crowded fields or of mildly extended sources. 
3.6. Overall Performance


Using nominal, but realistic, values for mirror efficiencies and imaging quality, for 
grating efficiencies and aberrations, and for detector quantum efficiency, we can 
derive overall system effective areas and resolving powers. This has been done in 
response to NASA requests for information for future missions. In Fig. 10, we show 
effective areas and resolving powers for the CAT grating spectrometer of the AEGIS
mission concept, along with metrics for a few other real or conceptual instruments. 
Recently, the Arcus?*:*° soft X-ray grating spectrometer was selected for a NASA 
Phase A Medium Class Explorer study; it is based on silicon pore optics, CAT 
gratings, CCDs, and a double tilted-Rowland-torus design.?’ In preparation for the 
NASA 2020 Astrophysics Decadal Review a much more ambitious X-ray mission 
called Lynx?®:?° features a soft X-ray grating spectrometer with 4000 cm² effective
area and $R > 5000$ over the 0.2 to 2 keV band as one of its instruments. With
some reasonable investment in future CAT grating technology development such an 
instrument could be realized in the near future.2® We can see that in the soft X-ray 
band, such dedicated CAT grating spectrometers would far out-perform currently 
operating and near-term observatories. Not only would this allow new and exciting 
astrophysics, but the increased sensitivities would provide higher time resolution 
of transient phenomena and the ability to perform broad and deep spectroscopic 
surveys. 

All the relevant technologies are approaching readiness for flight and we hope 
that opportunities will be forthcoming in the next decade. 
X-ray reflection gratings placed in the off-plane mount are capable of achieving
high diffraction efficiency and high spectral resolving power, the two chief assets 
of a grating spectrometer. Developments in this technology have been accelerated 
in recent years and new fabrication methodologies are being employed to create 
a new generation of reflection gratings. Here, we discuss the theory of diffraction, 
and how it pertains to off-plane gratings. This is followed by a discussion of 
considerations that are relevant when designing an off-plane grating spectrome- 
ter. Empirical results from the measurement of diffraction efficiency and spectral 
resolving power are also presented to motivate a discussion of the incorporation 
of these gratings into future space-based X-ray spectrometers. 


1. Introduction 


X-ray reflection gratings can be mounted so that incident light is orthogonal to the 
groove direction, typically referred to as the in-plane mount, where the diffracted 
light lies in the plane of incidence defined by the incoming ray and the grating 
normal. However, a plane wave of light intersecting a reflection grating in a direc- 
tion parallel or nearly parallel to the direction of the grooves still experiences a 
periodic reflective surface, thus resulting in a far-field diffraction pattern. In this 
case, diffraction does not occur in the plane of incidence but rather orthogonal to 
it, thus leading to the common name, off-plane diffraction. This scenario is depicted 
in Fig. 1. The sketch depicts light intersecting the grating at an angle $\gamma$, which is




254 R. L. McEntaffer & C. T. DeRoo 


Fig. 1. The geometry of the off-plane mount. Light is incident on the grating nearly parallel to
the groove direction with graze angle $\gamma$. Diffracted light follows the generalized grating equation
and lies along an arc at the focal plane a distance $L\cos(\gamma)$ from the grating. Reproduced from
Ref. 1 with permission. 


the half-cone opening angle between the incident beam and the groove direction.
The incident beam is not exactly parallel to the groove direction, so that specularly
reflected light is imaged at the focal plane at an angle of $\alpha$, relative to the projection
of the grating normal onto this plane. Dispersed light obeys the generalized grating
equation,

$$\sin\alpha + \sin\beta = \frac{n\lambda}{d\sin\gamma}, \qquad (1)$$

which now includes a term for the graze angle in order to satisfy the geometry for
interference. For a photon with a given wavelength $\lambda$, intercepting a grating with
groove spacing $d$, a given diffraction order $n$ will occur at angle $\beta$ relative to the
normal. Given the out-of-plane dispersion direction and the grazing incidence, the
resulting spectral focus lies along an arc defined by the cone half-angle $\gamma$. For this
reason, the geometry is also known as conical diffraction.
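
As a small worked example, the generalized grating equation $\sin\alpha + \sin\beta = n\lambda/(d\sin\gamma)$ can be solved for the diffraction angle of each order. The groove spacing, graze angle, and wavelength below are illustrative assumptions rather than values taken from the text, and the function name is hypothetical.

```python
import math

def offplane_beta(n, lam_nm, d_nm, alpha_rad, gamma_rad):
    """Solve sin(alpha) + sin(beta) = n*lam / (d*sin(gamma)) for beta.

    Returns None when the order does not propagate (|sin(beta)| > 1)."""
    s = n * lam_nm / (d_nm * math.sin(gamma_rad)) - math.sin(alpha_rad)
    if abs(s) > 1.0:
        return None
    return math.asin(s)

# Illustrative values (assumptions, not from the text).
d_nm = 160.0                    # groove spacing
gamma = math.radians(1.5)       # graze (cone half-opening) angle
alpha = math.radians(5.0)       # azimuthal incidence angle
lam_nm = 1.0                    # wavelength

for n in range(0, 5):
    beta = offplane_beta(n, lam_nm, d_nm, alpha, gamma)
    label = "not propagating" if beta is None else f"{math.degrees(beta):.2f} deg"
    print(f"n = {n}: beta = {label}")
```

The zeroth order returns $\beta = -\alpha$, the specular direction, and the small $\sin\gamma$ in the denominator shows why grazing incidence spreads the orders over large azimuthal angles along the diffraction arc.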

We begin by defining the theory describing off-plane diffraction (Sec. 1.1). This 
is followed by a discussion of practical considerations for designing an off-plane 
spectrometer (Sec. 1.2), along with a description of the main challenges in realizing 
these spectrometers (Sec. 1.3). The chapter continues with a discussion of recent 


Off-plane X-ray Gratings 255 


advances in off-plane grating fabrication (Sec. 2), the latest empirical results show- 
casing off-plane grating performance capabilities (Sec. 3), and a discussion of grating 
alignment (Sec. 4). 


1.1. Off-plane Diffraction Theory 


Diffraction from a grating is produced by a plane wave reflecting off the many 
groove facets. Each facet becomes a source and the far-field illumination can be 
thought of as a summation of these sources, which are subject to phase differences, 
thus resulting in a pattern resembling multi-slit diffraction due to constructive and 
destructive interference. The spacing evident in this pattern is dependent on the 
spacing of the groove facets, while the intensity distribution is dependent on the 
groove shape.” The requirement for constructive interference can be visualized in 
Fig. 2. The total path length difference between rays of a given wavelength from 
adjacent grooves, $\Delta s$, must equal an integer number of wavelengths, $n\lambda$, where
$n \in \mathbb{Z}$. As evident from the figure, $\Delta s_1$ and $\Delta s_2$ both contribute to this total
path length difference since $\Delta s_1$ represents the path length difference between rays
prior to intersecting the gratings and $\Delta s_2$ represents the path length difference after
the grating. The scenario in Fig. 2 appears as the in-plane configuration, but can
be generalized to the off-plane configuration given that projection of the incident 
rays into the plane orthogonal to the groove direction would look the same for the 
off-plane case. The third dimension is recovered by noticing from Fig. 1 that the 


[Figure labels: $\Delta s_1 = d\sin\alpha$, $\Delta s_2 = d\sin\beta$, $\Delta s = \Delta s_1 - \Delta s_2$]


Fig. 2. Geometry determining path length differences between rays diffracting off neighboring 
facets. The total path length difference should be an integer multiple of the wavelength for the 
condition of constructive interference. (See electronic edition for a color version of this figure.) 
off-plane path length differences are different from the in-plane case by a factor of
$\sin\gamma$, due to the projection of the wave vector, $\mathbf{k}$, into the plane of Fig. 2. This leads
to the following equation:

$$(\Delta s_1 - \Delta s_2)\sin\gamma = (d\sin\alpha - d\sin\beta)\sin\gamma = n\lambda, \qquad (2)$$

for incident and diffraction angles, $\alpha$ and $\beta$, defined according to Fig. 2 (both
positive). This is more commonly expressed as the off-plane grating equation (Eq. (1))
when using the sign convention that negative diffraction angles are on the opposite
side of normal as the incident light.

For a given groove spacing and facet shape, the efficiency of each diffraction 
order can be calculated theoretically to predict the illumination function at the 
image plane. These efficiencies are critical for understanding the effective area of a 
grating spectrometer and must therefore be assessed during the design phase. Empir- 
ical measurements are then needed to verify the design’s capability for achieving the 
desired performance requirements (see Sec. 3.1). Calculating efficiencies for a given 
geometry is complex and can seldom be accomplished in closed form. In general, 
the calculation involves utilizing Maxwell’s equations to solve the outgoing E and 
B fields for a given set of boundary conditions set by the grating geometry. From 
these fields, the intensity at the image plane can be calculated as the time-averaged 
square of the E field at a given position, normalized by the amplitude of the incident 
wave. 

An introductory derivation can be formulated from a description of derivations
found in Goray and Schmidt (2010).3 Let $\alpha$, $\beta$, and $\gamma$ be defined as the projections of
the incoming wave vector, $\mathbf{k}_0$, on the $x$-, $y$-, and $z$-axes, respectively. The incoming
plane wave can then be written as


$$\mathbf{E}^i = \mathbf{p}\,e^{i(\alpha x - \beta y + \gamma z)}, \qquad \mathbf{H}^i = \mathbf{s}\,e^{i(\alpha x - \beta y + \gamma z)}, \qquad (3)$$


where p and s are the polarization and amplitude of the incident wave. As shown 
in Fig. 3, 


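$$\alpha^2 + \beta^2 + \gamma^2 = k_0^2 = \omega^2\varepsilon_+\mu_+, \qquad (4)$$

where

$$(\alpha, -\beta, \gamma) = \omega\sqrt{\varepsilon_+\mu_+}\,(\sin\theta\cos\phi,\ -\cos\theta\cos\phi,\ \sin\phi). \qquad (5)$$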


The region above the grating surface is denoted with a + subscript, while the
region below the surface is denoted with a − subscript, which is important for the
boundary conditions.

Fig. 3. A figure explaining the conditions for the derivation of off-plane diffraction theory. Top:
An electromagnetic wave is incident on a grating in the off-plane geometry to produce an arc of
diffraction. The displayed coordinate system is consistent with that used in this section. Bottom:
Projection of the electromagnetic wave into the $xy$-plane. The grating surface is $\Sigma$, with the region
above (containing the incoming and outgoing waves) denoted as $G_+$ and the region below as $G_-$.
Reproduced from Ref. 3 with permission.


Assuming that the grooves are constant in the $z$ direction, we start with the
ansatz that the magnitude of the field is only dependent on $x$ and $y$, with $z$
dependence in the phase:

$$(\mathbf{E}, \mathbf{H})(x, y, z) = (\mathbf{E}, \mathbf{H})(x, y)\,e^{i\gamma z}. \qquad (6)$$

Beginning with Maxwell’s equations in free space,

$$\nabla\times\mathbf{E} = i\omega\mu\mathbf{H}, \qquad \nabla\times\mathbf{H} = -i\omega\varepsilon\mathbf{E}, \qquad (7)$$

the fields can be expressed by their $z$ and transverse components:

$$\mathbf{E} = \mathbf{E}_T + E_z\hat{z}, \quad \mathbf{H} = \mathbf{H}_T + H_z\hat{z}, \qquad E_z = \hat{z}\cdot\mathbf{E}, \quad H_z = \hat{z}\cdot\mathbf{H}. \qquad (8)$$

The vector components must piecewise-match, leading to the following expression
for $\mathbf{E}$:

$$\nabla\times\mathbf{E} = \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{x} + \left(\frac{\partial E_x}{\partial z} - \frac{\partial E_z}{\partial x}\right)\hat{y} + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{z} = i\omega\mu\left(H_x\hat{x} + H_y\hat{y} + H_z\hat{z}\right), \qquad (9)$$


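with a similar expression for $\nabla\times\mathbf{H}$. Using Eq. (6), we can evaluate the derivatives with
respect to $z$:

$$\frac{\partial E_x}{\partial z} = i\gamma E_x, \quad \frac{\partial E_y}{\partial z} = i\gamma E_y, \quad \frac{\partial H_x}{\partial z} = i\gamma H_x, \quad \frac{\partial H_y}{\partial z} = i\gamma H_y. \qquad (10)$$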


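An additional exercise of piecewise-matching field components yields the
following six linear equations:

$$\begin{aligned}
i\omega\mu H_x &= -i\gamma E_y + \frac{\partial E_z}{\partial y}, & -i\omega\varepsilon E_x &= \frac{\partial H_z}{\partial y} - i\gamma H_y, \\
i\omega\mu H_y &= i\gamma E_x - \frac{\partial E_z}{\partial x}, & -i\omega\varepsilon E_y &= -\frac{\partial H_z}{\partial x} + i\gamma H_x, \\
i\omega\mu H_z &= \frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}, & -i\omega\varepsilon E_z &= \frac{\partial H_y}{\partial x} - \frac{\partial H_x}{\partial y},
\end{aligned} \qquad (11)$$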

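which can be used to solve for $E_x$, $E_y$, $H_x$, and $H_y$ in terms of $E_z$ and $H_z$:

$$\begin{aligned}
E_x &= \frac{i}{\omega^2\varepsilon\mu - \gamma^2}\left(\gamma\frac{\partial E_z}{\partial x} + \omega\mu\frac{\partial H_z}{\partial y}\right), & H_x &= \frac{i}{\omega^2\varepsilon\mu - \gamma^2}\left(\gamma\frac{\partial H_z}{\partial x} - \omega\varepsilon\frac{\partial E_z}{\partial y}\right), \\
E_y &= \frac{i}{\omega^2\varepsilon\mu - \gamma^2}\left(\gamma\frac{\partial E_z}{\partial y} - \omega\mu\frac{\partial H_z}{\partial x}\right), & H_y &= \frac{i}{\omega^2\varepsilon\mu - \gamma^2}\left(\gamma\frac{\partial H_z}{\partial y} + \omega\varepsilon\frac{\partial E_z}{\partial x}\right).
\end{aligned} \qquad (12)$$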


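Given that the transverse component of the fields is represented by the $x$ and
$y$ components, the transverse field can be expressed in terms of the $z$ component:

$$\begin{aligned}
\mathbf{E}_T &= \frac{1}{\omega^2\varepsilon\mu - \gamma^2}\left(i\gamma\nabla E_z + i\omega\mu\,\nabla\times(H_z\hat{z})\right), \\
\mathbf{H}_T &= \frac{1}{\omega^2\varepsilon\mu - \gamma^2}\left(i\gamma\nabla H_z - i\omega\varepsilon\,\nabla\times(E_z\hat{z})\right).
\end{aligned} \qquad (13)$$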


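Therefore, finding the $z$ component of the field also solves for the transverse
component. Some insight into the $z$ component can be found by taking the curl of
Maxwell’s equation,

$$\begin{aligned}
\nabla\times\nabla\times\mathbf{E} = \omega^2\mu\varepsilon\mathbf{E} &= \nabla(\nabla\cdot\mathbf{E}) - \nabla^2\mathbf{E} \\
&= -\mathbf{E}(x,y)\nabla^2 e^{i\gamma z} - 2\left(\nabla e^{i\gamma z}\cdot\nabla\right)\mathbf{E}(x,y) - e^{i\gamma z}\nabla^2\mathbf{E}(x,y) \\
&= \gamma^2\mathbf{E}(x,y)e^{i\gamma z} - 2i\gamma e^{i\gamma z}(\hat{z}\cdot\nabla)\mathbf{E}(x,y) - e^{i\gamma z}\nabla^2\mathbf{E}(x,y).
\end{aligned} \qquad (14)$$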


The second term in the last expression equals zero, as does the gradient of
the divergence of $\mathbf{E}$, given there are no additional sources, thus simplifying the
expression to a modified version of the Helmholtz equation:

$$\nabla^2\mathbf{E}(x, y) + (\omega^2\mu\varepsilon - \gamma^2)\,\mathbf{E}(x, y) = 0. \qquad (15)$$


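For ease of notation we can define the functions

$$\kappa_\pm^2 = \varepsilon_\pm\mu_\pm - \frac{\gamma^2}{\omega^2} = \varepsilon_\pm\mu_\pm - \varepsilon_+\mu_+\sin^2\phi \qquad (16)$$

so that $\omega^2\varepsilon_\pm\mu_\pm - \gamma^2 = \omega^2\kappa_\pm^2$.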


In order to find a solution, boundary conditions must first be placed on the
fields. The boundary conditions in the components perpendicular to the normal ($\hat{n}$)
of the grating surface $\Sigma$ are

$$\hat{n}\times(\mathbf{E}^+ - \mathbf{E}^-) = [\hat{n}\times\mathbf{E}]_\Sigma = 0, \qquad \hat{n}\times(\mathbf{H}^+ - \mathbf{H}^-) = [\hat{n}\times\mathbf{H}]_\Sigma = 0,$$

given a source-free region. The +/− superscripts indicate whether the fields are
located in the $G_+$ or $G_-$ regions as shown in Fig. 3. The square brackets are used
to indicate a jump in the perpendicular components across the surface.

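The $z$ components of the fields yield

$$[E_z]_\Sigma = 0, \qquad (17)$$

with the transverse field giving the perpendicular boundary conditions:

$$\begin{aligned}
\hat{n}\times[\mathbf{E}_T]_\Sigma &= (n_x\hat{x} + n_y\hat{y})\times\left[\frac{1}{\omega^2\kappa^2}\left(i\gamma\left(\hat{x}\frac{\partial}{\partial x} + \hat{y}\frac{\partial}{\partial y}\right)E_z + i\omega\mu\left(\hat{x}\frac{\partial}{\partial y} - \hat{y}\frac{\partial}{\partial x}\right)H_z\right)\right]_\Sigma \\
&= \hat{z}\left[\frac{1}{\omega^2\kappa^2}\left(i\gamma\left(n_x\frac{\partial}{\partial y} - n_y\frac{\partial}{\partial x}\right)E_z - i\omega\mu\left(n_x\frac{\partial}{\partial x} + n_y\frac{\partial}{\partial y}\right)H_z\right)\right]_\Sigma \\
&= \hat{z}\left[\frac{i\gamma\,\partial_t E_z - i\omega\mu\,\partial_n H_z}{\omega^2\kappa^2}\right]_\Sigma \\
&= 0,
\end{aligned} \qquad (18)$$

where $\partial_n$ is the normal derivative on the surface ($n_x\,\partial/\partial x + n_y\,\partial/\partial y$) and $\partial_t$ is the
tangential derivative ($n_x\,\partial/\partial y - n_y\,\partial/\partial x$). The conditions for $H_z$ and $\mathbf{H}_T$ can be
arrived at similarly,

$$[H_z]_\Sigma = 0, \qquad (19)$$

$$\left[\frac{\gamma\,\partial_t H_z + \omega\varepsilon\,\partial_n E_z}{\kappa^2}\right]_\Sigma = 0. \qquad (20)$$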


The solution should account for the periodic nature of the groove geometry in 
the x direction so that the geometry of the problem is conserved when translated 
in this dimension: 


$$E_z(x + d, y) = E_z(x, y)\,e^{i\alpha d}, \qquad H_z(x + d, y) = H_z(x, y)\,e^{i\alpha d}, \qquad (21)$$


where $d$ is the groove spacing. Given the natural periodicity of the problem, a
Fourier series solution in $x$ can now be sought with coefficients found by applying
the boundary conditions. Given that $B_z = \sqrt{\mu_+/\varepsilon_+}\,H_z$, the total field in $G_+$ looks
like

$$(E_z^+, B_z^+)(x, y) = (E_z^i, B_z^i)(x, y) + \sum_{n\in\mathbb{Z}}(u_n^+, v_n^+)\,e^{i(\alpha_n x + \beta_n^+ y)}, \qquad (22)$$

where $\alpha_n^2 + (\beta_n^+)^2 = \omega^2\kappa_+^2$ in order to satisfy the modified Helmholtz equation. The
periodicity also exists in $\alpha$, such that $\alpha_n = \alpha + 2\pi n/d$ and $\beta_n^\pm = \sqrt{\omega^2\kappa_\pm^2 - \alpha_n^2}$.
Similarly, the total field in $G_-$ is

$$(E_z^-, B_z^-)(x, y) = \sum_{n\in\mathbb{Z}}(u_n^-, v_n^-)\,e^{i(\alpha_n x - \beta_n^- y)}, \qquad (23)$$

where $\beta_n^- = \sqrt{\omega^2\kappa_-^2 - \alpha_n^2}$. From Eqs. (22) and (23) it can be seen that for values of $\alpha_n$ such
that $\omega^2\kappa_\pm^2 - \alpha_n^2 < 0$, the field is exponentially damped, thus leading to the condition
of evanescent orders.
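
The propagating/evanescent distinction can be checked numerically. The sketch below is a minimal illustration, assuming vacuum above the grating so that $\omega^2\kappa_+^2 = k_0^2\cos^2\phi$ with $k_0 = 2\pi/\lambda$, and $\alpha = k_0\sin\theta\cos\phi$ as in Eq. (5). The function name and the example values are assumptions chosen for illustration.

```python
import math


def propagating_orders(wavelength_nm, period_nm, theta_deg, phi_deg, max_order=50):
    """Return the orders n for which beta_n^+ is real (non-evanescent).

    Assumes vacuum in G_+, so omega^2 * kappa_+^2 = (k0 * cos(phi))^2,
    and alpha_n = alpha + 2*pi*n/d with alpha = k0*sin(theta)*cos(phi).
    """
    k0 = 2.0 * math.pi / wavelength_nm
    cos_phi = math.cos(math.radians(phi_deg))
    alpha = k0 * math.sin(math.radians(theta_deg)) * cos_phi
    k_perp_sq = (k0 * cos_phi) ** 2
    orders = []
    for n in range(-max_order, max_order + 1):
        alpha_n = alpha + 2.0 * math.pi * n / period_nm
        # beta_n^+ is real only when omega^2 kappa_+^2 - alpha_n^2 > 0.
        if k_perp_sq - alpha_n ** 2 > 0.0:
            orders.append(n)
    return orders


if __name__ == "__main__":
    # Assumed example: 2 nm X-rays, 160 nm period, phi = 88.5 degrees.
    print(propagating_orders(2.0, 160.0, 0.0, 88.5))
```

With these assumed values only orders with $|n| \le 2$ propagate; higher orders have imaginary $\beta_n^+$ and decay away from the surface.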

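Given the definitions of the field in terms of $u_n^\pm$ and $v_n^\pm$, the boundary conditions
and Helmholtz equation can also be expressed in these terms:

$$\nabla^2(u_n^\pm, v_n^\pm) + \omega^2\kappa_\pm^2\,(u_n^\pm, v_n^\pm) = 0, \qquad (24)$$

$$(u_n^-, v_n^-) - (E_z^i, B_z^i) - (u_n^+, v_n^+) = 0, \qquad (25)$$

$$\frac{\varepsilon_-\,\partial_n u_n^-}{\kappa_-^2} - \frac{\varepsilon_+\,\partial_n(E_z^i + u_n^+)}{\kappa_+^2} = \varepsilon_+\sin\phi\left(\frac{1}{\kappa_+^2} - \frac{1}{\kappa_-^2}\right)\partial_t v_n^-, \qquad (26)$$

$$\frac{\mu_-\,\partial_n v_n^-}{\kappa_-^2} - \frac{\mu_+\,\partial_n(B_z^i + v_n^+)}{\kappa_+^2} = -\mu_+\sin\phi\left(\frac{1}{\kappa_+^2} - \frac{1}{\kappa_-^2}\right)\partial_t u_n^-, \qquad (27)$$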
as shown in Ref. 3.

Using Green’s theorem, an integral expression for $u_n^\pm$ and $v_n^\pm$ can be used to
rewrite the modified Helmholtz equation, given the Green’s function, $U(x, y)$, which
has the property

$$\nabla^2 U + \omega^2\kappa^2 U = -\delta(x, y) \qquad (28)$$

by definition. As an example, considering $u_n^+$ in the domain of $G_+$ yields the following
expressions:

$$\int_{G_+}\left(u_n^+\nabla^2 U - U\nabla^2 u_n^+\right)da = \oint_{\partial G_+}\left(u_n^+\nabla U - U\nabla u_n^+\right)\cdot\hat{n}\,d\ell,$$

$$\int_{G_+}\left[u_n^+\left(-\omega^2\kappa^2 U - \delta(x, y)\right) - U\nabla^2 u_n^+\right]da$$

Integral equations can be constructed in this way to solve for $u_n^\pm$ and $v_n^\pm$, subject
to the boundary conditions in Eqs. (25)-(27). Given a specific set of conditions
($\varepsilon_\pm$, $\mu_\pm$, $\omega$, incidence angle, etc.), numerical methods are then employed to solve
for the coefficients, which ultimately determine the amplitudes of the diffracted
waves for wavelength $\lambda = 2\pi c/\omega$ and order $n$. Therefore, the theoretical diffraction
efficiencies are the time-averaged square of this result, normalized to the incident
intensity. This treatment gives a complete solution to the expectation for diffraction
efficiencies; however, it is difficult to implement, especially if a study requires a
range of potential grating geometries and groove profiles. Therefore, commercial
software, such as PCGrate,^a is typically employed to model theoretical predictions
of diffraction efficiency for a given grating. An example of this application is given
in Sec. 3.1.


^a See www.pcgrate.com for more information and references for this software.


1.2. Off-plane Spectrometer Design Considerations


When designing an off-plane grating spectrometer, the ultimate goal is achieving
the specified performance requirements for effective area and spectral resolving
power. These requirements are driven by science goals and vary depending on factors
such as the physical/monetary scale of the mission and quality of the spectrometer
components. For larger scale missions that are successors to observatories such as
the Chandra X-ray Observatory and XMM-Newton, the performance requirements
placed on the gratings typically boil down to diffraction efficiencies of ~40% (sum of
orders) and resolving powers ≳3000 ($\lambda/\Delta\lambda$) for soft X-ray energies, ~0.3-2.0 keV.
For a given spectroscopy mission definition, determining the physical scale of the
spectrometer is one of the initial steps in design. This will depend on many factors
including the focal length of the telescope, the area covered by the telescope (i.e.
the radial and azimuthal spans of the optics), the available space for the gratings,
the grating geometry, and the space allotment at the focal plane. One of the most
useful tools to employ during such an exercise is the dispersion relation. This relates
the physical space at the focal plane to wavelength space. One can arrive at this
relation by first differentiating the grating equation with respect to wavelength,


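$$\cos\beta\,\frac{\delta\beta}{\delta\lambda} = \frac{n}{d\sin\gamma}. \qquad (29)$$


Furthermore, some physical extent in the dispersion direction at the focal plane,
$\delta x$, covers some angular extent on the arc of diffraction,


$$\delta x = \delta\beta\,L\sin\gamma\cos\beta, \qquad (30)$$


where $L\sin\gamma$ is the radius of the arc. Solving for $\delta\beta$ and substituting into Eq. (29)
yields the dispersion relation,


$$\frac{\delta\lambda}{\delta x} = \frac{d}{nL}. \qquad (31)$$

This is typically written as

$$\text{Å/mm} = \frac{10^7}{nDL}, \qquad (32)$$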


where D is the groove density (=1/d) in grooves/mm and L is the throw in mm. 

This is a very useful equation, as it determines where diffraction for each 
wavelength occurs in the focal plane with respect to zero order in the dispersion 
direction. For example, if the groove density is 6000 grooves/mm ($d \approx 167$ nm)
and the focal plane is 8 m from the grating, then the dispersion is 0.2 Å/mm for
first order. If the wavelength band of interest is 12-40 Å (~0.3-1.0 keV) then the
spectrum will lie 60-200 mm away from zero order. It is important to note that this 
displacement is in the direction of dispersion (orthogonal to the groove direction 
and in the focal plane) and not a distance tracing the circular arc of diffraction. 
The grating geometry ($\theta$ and $\phi$ in Fig. 3) then gives the radius of the arc and the
position of zero order, thus allowing for determination of the entire arc of diffraction 
at the focal plane. 
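
The worked numbers in this paragraph can be reproduced with a short calculation based on Eq. (32). This is a minimal sketch; the function names are assumptions introduced here for illustration.

```python
def dispersion_angstrom_per_mm(groove_density_per_mm, throw_mm, order=1):
    """Eq. (32): dispersion in Angstrom/mm for groove density D (1/mm) and throw L (mm)."""
    return 1.0e7 / (order * groove_density_per_mm * throw_mm)


def offset_from_zero_order_mm(wavelength_angstrom, groove_density_per_mm, throw_mm, order=1):
    """Distance from zero order, measured along the dispersion direction."""
    return wavelength_angstrom / dispersion_angstrom_per_mm(
        groove_density_per_mm, throw_mm, order
    )


if __name__ == "__main__":
    # Values from the text: 6000 grooves/mm, 8 m throw, first order.
    print(dispersion_angstrom_per_mm(6000.0, 8000.0))        # about 0.21 Angstrom/mm
    print(offset_from_zero_order_mm(12.0, 6000.0, 8000.0))   # about 58 mm
    print(offset_from_zero_order_mm(40.0, 6000.0, 8000.0))   # about 192 mm
```

The unrounded dispersion is about 0.208 Å/mm, so the 60-200 mm span quoted above corresponds to rounding the dispersion to 0.2 Å/mm.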

The dispersion relation also assists in determining a system’s spectral resolv- 
ing power. The shape of a spectral feature is determined by a combination of the 
intrinsic source shape, the quality of the telescope optics, and aberrations induced 
by the gratings. If the resulting line spread function at the focal plane measures
50 μm in the dispersion direction, for example, then the finest resolvable feature
is 0.01 Å, given 0.2 Å/mm dispersion. The spectral resolving power ($R$) at a given
wavelength is defined by that wavelength divided by the finest resolvable feature,
$R = \lambda/\Delta\lambda$. Therefore, a wavelength band from 12-40 Å will have spectral resolving
powers of 1200-4000. Given that dispersion is independent of wavelength, $\Delta\lambda$
remains constant with wavelength so that spectral resolving power is linear with
wavelength. Spectral resolving power also directly scales with order ($n$) such that
wavelengths from 12-40 Å, in the example above, would have spectral resolving
powers of 2400-8000 in second order, given that they are dispersed twice as far
from zero order.
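
The scaling of resolving power with wavelength and order can be sketched as follows, combining $R = \lambda/\Delta\lambda$ with the dispersion of Eq. (32). The function name is an assumption made for this illustration.

```python
def resolving_power(wavelength_angstrom, lsf_width_mm, groove_density_per_mm,
                    throw_mm, order=1):
    """R = lambda / delta_lambda, with delta_lambda = dispersion * LSF width."""
    dispersion = 1.0e7 / (order * groove_density_per_mm * throw_mm)  # Angstrom/mm
    delta_lambda = dispersion * lsf_width_mm
    return wavelength_angstrom / delta_lambda


if __name__ == "__main__":
    # 50 micron line spread function, 6000 grooves/mm, 8 m throw.
    for order in (1, 2):
        for wavelength in (12.0, 40.0):
            print(order, wavelength,
                  resolving_power(wavelength, 0.05, 6000.0, 8000.0, order))
```

Using the unrounded dispersion gives first-order values of roughly 1150 and 3840 at 12 Å and 40 Å, slightly below the 1200-4000 obtained with the rounded 0.2 Å/mm; the second-order values are exactly twice as large.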

In addition to knowing the layout of the focal plane, it is important to consider 
the practical layout of the grating array. Due to the large angle of incidence required 
for efficient reflectivity at X-ray energies, individual X-ray gratings have very limited 
collecting area given a typical grating size, ~10s of mm × 10s of mm. Therefore, many
gratings are required to achieve large effective area in an X-ray spectrometer. If indi- 
vidual in-plane gratings are placed close together in a stack, one on top of the other, 
with spacing appropriate to their projected area and their reflective surfaces nearly 
parallel, then light that is dispersed to large angles will be occulted by the back side 
of the neighboring grating, given that diffracted light lies in the plane of incidence 
for the in-plane mount. Large dispersion angles are imperative, as they allow for high 
resolving power at long wavelengths and higher orders. Therefore, in-plane reflection 
gratings must be spaced far apart to allow passage of the diffracted light, thus miss- 
ing some of the incident light from the telescope and reducing the total effective 
area in the X-ray spectrum. Although this limits optimal packing geometry in a 
dedicated reflection grating spectrometer, the in-plane mount can be implemented 
in instruments that use part of the telescope for imaging with the sections covered 
by gratings used for spectroscopy. Nowhere is this more evident than the highly 
successful XMM-Newton Reflection Grating Spectrometer (RGS).4 However, if the 
goal is to maximize the effective area of a reflection grating spectrometer, the off- 
plane mount is advantageous. Given that the diffracted light leaves in a cone with 
half-angle defined by the small graze angle, it is easy to place these gratings very 
close to one another without occulting any diffracted light from the neighboring 
grating. In fact, the gratings can be stacked close enough together such that they 
would intercept the entirety of a telescope beam. 

Now armed with appropriate configurations for the focal plane and grating 
array, integration with the telescope can be considered (design efforts do not nec- 
essarily have to proceed in this order since it is often the case that one instrument, 
such as the telescope, is fully determined prior to the others). The quality of the 
telescope must be assessed as this is the major contribution to the spectrometer line 
spread function (LSF). The telescope’s point spread function (PSF) determines its 
focus quality, but this can be improved upon through subaperturing.’ Limiting the 
azimuthal sampling of the telescope by the gratings creates a subsampled telescope
PSF that mimics the azimuthal sampling of the telescope: a full-azimuth circular
PSF will now look like an hourglass or bowtie. If oriented correctly with respect to 
the dispersion direction, this can lead to a narrowing of the LSF in the dispersion 
direction and hence an increase in spectral resolving power.® However, if the tele- 
scope PSF is approaching the resolution capabilities of the detector, then subaper- 
turing may not be practical and the LSF will be the same as the telescope PSF, in 
the absence of additional aberrations. Therefore, these considerations can be used to 
appropriately match the grating array layout to the telescope configuration. Then, 
the size of this optical assembly, along with the expected location and size of the 
focal plane, can be used to determine the size and layout of the spectrograph. Given 
this design, more detailed models of total spectrometer throughput and resolving 
power can be determined through analytical modeling and application of empirical 
results. 

Therefore, given sufficient diffraction efficiency, spectral resolving power, and 
packaging, reflection gratings placed in the off-plane mount are viable candidates 
for X-ray observatory spectrometers. 


1.3. Off-plane Spectrometer Design Challenges 


Although off-plane gratings present many advantages for an X-ray spectrometer, 
there are many challenges involved with producing the generation of gratings that 
will be used in upcoming space missions. These challenges are evident through 
inspection of Fig. 4.° In this figure, the optical axis is out of the page and the grating 
grooves are shown projected from the position of the gratings all the way down to 
the focal plane. Note that the gratings do not actually span this entire distance, but 
are depicted in this way for illustrative purposes. Furthermore, optimum effective 
areas are obtained by densely packing the array of gratings. This can typically 
require 10s to 100s of gratings, yet only three representative gratings of this array 
are shown for simplicity. If telescope light were allowed to pass through the gratings 
unimpeded, then it would focus at the spot labeled “Telescope focus”. However, 
the gratings intersect this beam and specularly reflect it into zero order at the focal 
plane, which also contains the arc of diffraction. Since zero order is not at the top 
of the arc, the incident light is at a slight angle relative to the groove direction, 
hence leading to a nonzero $\alpha$ for zero order at the focal plane. The diffracted light lies along
the arc and would be collected by an X-ray detector such as a CCD camera. Small 
squares are shown to depict a number of CCDs that would be required to sample 
the desired spectrum. 

The first challenge depicted in this figure is the orientation of the grooves, which 
places constraints on fabrication methods. The projection of the grooves shows them 
converging to a single point known as the hub. This convergence matches that of 
the telescope beam, thus maintaining a constant $\alpha$ across the grating and therefore
a constant $\beta$ per diffracted wavelength. This limits grating-induced aberrations
and ensures that the spectral morphologies match that of the zero order image.

Fig. 4. A diagram of an idealized array of three gratings. For illustrative purposes, the grating
grooves are projecting out of the page, along the optical axis, several meters away from the focal 
plane, which is shown in the plane of the page. The arc of diffraction lies along the circle that also 
contains the telescope focus and zero order. Small squares show a possible detector configuration. 


Therefore, in order to achieve high spectral resolving power, the grating grooves 
must exhibit or approximate this “radial” profile. Historically, grating grooves have 
been ruled parallel to one another, and if variable line spacing was necessary, it 
was typically performed orthogonal to the groove direction, given that these spec- 
trometers were operated in the in-plane mount. Off-plane gratings, however, require 
variable line spacing along the groove direction, which therefore requires new fabri- 
cation techniques. Varying the groove density along the groove direction is a viable 
implementation. The grooves can still be parallel to one another but with a density 
that increases in steps toward the focal plane. For a given wavelength, the incoming 
X-rays must merely encounter the appropriate groove density for their location 
of grating interaction along the optical axis. For example, say one X-ray photon 
disperses to a given position in the focal plane consistent with its wavelength and 
local groove density at its impact point on the grating. If a second X-ray photon of 
the same wavelength intersects the grating at a point closer to the focal plane, then 
the groove density must be greater in order to diffract that X-ray to the appropri- 
ate dispersion, given that the throw has been decreased, as evident from Eq. (31). 
Employing parallel groove sections that vary in density along the groove direction 
can achieve this dispersion matching, but leads to aberrations if the groove length
of these sections is too long. The number of required sections in the variable line
spaced off-plane grating is determined by several factors including, but not limited 
to, resolving power requirements, the quality of the telescope focus, the magnitude 
of other aberration factors (misalignments, pointing stability, etc.), and fabrication 
limitations (see Sec. 2). 
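
The dispersion-matching argument can be made concrete with Eq. (31): the focal-plane position of a given wavelength is unchanged if the product of groove density and throw is held fixed. The sketch below is a minimal illustration of a stepped, parallel-groove approximation; the function names, the 100 mm grating length, and the choice of four sections are assumptions.

```python
def matched_groove_density(ref_density_per_mm, ref_throw_mm, throw_mm):
    """Groove density that keeps D*L, and hence the focal-plane position, fixed."""
    return ref_density_per_mm * ref_throw_mm / throw_mm


def stepped_sections(ref_density_per_mm, ref_throw_mm, grating_length_mm, n_sections):
    """Return (section-center throw in mm, groove density per mm) per section.

    Sections have equal length and run from the far end of the grating
    (throw = ref_throw_mm) toward the focal plane.
    """
    step = grating_length_mm / n_sections
    sections = []
    for i in range(n_sections):
        throw = ref_throw_mm - (i + 0.5) * step
        density = matched_groove_density(ref_density_per_mm, ref_throw_mm, throw)
        sections.append((throw, density))
    return sections


if __name__ == "__main__":
    # Assumed example: 6000 grooves/mm at an 8 m throw, 100 mm long grating.
    for throw, density in stepped_sections(6000.0, 8000.0, 100.0, 4):
        print(round(throw, 1), round(density, 2))
```

The density rises by about 1% over 100 mm in this example, which indicates the scale of period change that the stepped sections must approximate.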

A second challenge involves overlapping the spectra from each of the gratings 
in the array. This is achieved by fanning the gratings such that all surfaces project 
to the diameter of the circle defining the arc of diffraction. Furthermore, the groove 
hubs must be coincident. If improperly aligned, the spectral lines from each grating 
will be placed at slightly different positions on the focal plane, thus leading to an 
aberration of the line and a loss of spectral resolving power. This also decreases 
sensitivity for a given spectral feature, as its flux is spread over a larger area. 
Even if each grating adheres to this alignment prescription, there is still an inherent 
astigmatism to the assembly, given that each grating has an optimal focal plane that 
is perpendicular to the plane of the grating. If the gratings of the array are fanned, 
then the focal planes are also fanned and not perfectly coincident with one another. 
In this case, the focal plane can be optimized on a chord parallel to the horizontal 
diameter of the arc of diffraction. This chord represents an optimized intersection of 
the projected grating planes and allows optimization of spectral resolving power at 
the two wavelengths corresponding to the two intersections of this chord with the 
arc. These various considerations present an alignment challenge that is detailed 
further in Sec. 4. 

A third challenge evident from this figure is the triangular profile of the grooves. 
Mechanical or photolithographic ruling of the grooves can easily produce normal 
profiles that are rectangular or sinusoidal. However, diffraction from such regular 
profiles disperses light evenly onto each side of zero order, with a preference for lower 
order diffraction. Since spectral resolving power scales as $\lambda/\Delta\lambda$, higher orders lead
to higher resolving power. Therefore, spectrometers that place most of the diffracted 
light at higher orders can achieve high throughput concurrently with high resolving 
power. A triangular or blazed profile, such as that shown in Fig. 4, preferentially 
disperses light in a direction normal to the face of the groove facet when the grating 
is viewed in cross-section (see Sec. 3.1). This direction corresponds to dispersion at 
the “blaze wavelength”. The geometry is optimized further by placing the grating 
array such that the grooves are at a slight angle relative to the optical axis, resulting 
in an $\alpha$ for zero order at the focal plane that equals the $\beta$ of the optimized wavelength
(see Fig. 2). When $\alpha = \beta = \theta$ (known as the “Littrow” configuration, where $\theta$ is
the blaze angle), the array is optimized for diffraction efficiency. The choice of 
blaze angle is therefore driven by science considerations to optimize throughput 
and resolving power for a given bandpass and science case, allowing that science 
case to be optimized through customization of the blaze angle (see Sec. 2). This 
fabrication challenge is further complicated by the need for large format gratings. 
Larger gratings lead to larger effective areas. Practical sizes for individual gratings
for space-based X-ray spectrometers can be as large as ~100 × 100 × 0.5 mm. This
necessitates hundreds of thousands of grooves with high fidelity profiles for each 
grating to ensure accurate diffraction. Furthermore, the large format gratings must 
be flat so that deformations in their optical figure do not lead to aberrations in 
spectral lines. These considerations are also discussed in Sec. 2. Finally, blazing 
off-plane gratings also minimizes the size of the readout. The effect of the blaze 
causes preferential diffraction to one side of zero order, so that the area covered by 
the detector is reduced by a factor of two. This assists with the ever-present mass, 
power, and cost limitations for space-based missions. 
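
The blaze condition can be turned into a quick estimate of which wavelengths are favored. In the Littrow configuration the signed grating equation, sin α + sin β = nλ/(d sin γ), reduces to nλ = 2d sin γ sin θ. The sketch below assumes that form; the function name and the example values (160 nm period, 1.5° cone half-angle, 29.5° blaze angle) are illustrative assumptions.

```python
import math


def littrow_blaze_wavelength_angstrom(period_nm, gamma_deg, blaze_deg, order):
    """Wavelength (Angstrom) satisfying n*lambda = 2*d*sin(gamma)*sin(theta)."""
    n_lambda_nm = (2.0 * period_nm
                   * math.sin(math.radians(gamma_deg))
                   * math.sin(math.radians(blaze_deg)))
    return 10.0 * n_lambda_nm / order  # 1 nm = 10 Angstrom


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, littrow_blaze_wavelength_angstrom(160.0, 1.5, 29.5, n))
```

For these assumed values the blaze peaks near 41 Å in first order, 21 Å in second order, and 14 Å in third order, showing how a single blaze angle serves shorter wavelengths in higher orders.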

Additional considerations for maximizing spectral resolving power are high 
groove density and long throw distances. As mentioned above, the amount of dis- 
persion for a grating determines the physical extent over which the spectrum is 
diffracted and hence resolvable. Since dispersion is inversely proportional to the 
throw and the groove density, longer throws and/or higher groove densities can be 
used to spread the spectrum over a greater distance and increase spectral resolving 
power. For example, consider a telescope with a scatter-dominated ~10 arcsecond
half-power diameter (HPD)^b and a focal length of ~8 m. The off-plane
grating array can subaperture this telescope beam to create a bowtie-shaped line 
spread function with a full width at half maximum (FWHM) of ~1 arcsecond in the 
dispersion direction.® If the resolving power goal is >3000 for 1 keV X-rays in third 
order, this translates to a groove density requirement of >4000 grooves/mm. If the 
telescope is of poorer quality, the throw is reduced, or the resolution requirement 
increased, then increased groove densities are necessary to gain resolving power. 
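
The groove density quoted in this example can be recovered from Eq. (32) and $R = \lambda/\Delta\lambda$. The sketch below is a minimal illustration that assumes the throw is approximately the focal length, so that a 1 arcsecond FWHM maps to a focal-plane width of L times the angle in radians, and uses hc ≈ 12.398 keV Å. The function name is an assumption.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # photon energy (keV) times wavelength (Angstrom)


def required_groove_density_per_mm(resolving_power, energy_kev, order,
                                   throw_mm, lsf_fwhm_arcsec):
    """Minimum groove density (1/mm) needed to reach the given resolving power."""
    wavelength = HC_KEV_ANGSTROM / energy_kev                      # Angstrom
    lsf_width_mm = throw_mm * math.radians(lsf_fwhm_arcsec / 3600.0)
    max_dispersion = wavelength / (resolving_power * lsf_width_mm)  # Angstrom/mm
    return 1.0e7 / (order * throw_mm * max_dispersion)


if __name__ == "__main__":
    # R = 3000 at 1 keV in third order, 8 m throw, 1 arcsecond LSF.
    print(required_groove_density_per_mm(3000.0, 1.0, 3, 8000.0, 1.0))
```

Under these assumptions the result is about 3900 grooves/mm, consistent with the roughly 4000 grooves/mm requirement stated above. The throw cancels in this estimate because both the dispersion and the linear width of the line spread function scale with it.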

The previous geometric considerations make off-plane grating arrays excellent 
candidates for X-ray spectroscopy missions, yet it is clear that many challenges exist 
in ensuring high spectral resolving power with large effective area. In addition to the 
challenges outlined above, the off-plane mount has some drawbacks in comparison to 
other spectrometer geometries, as it requires higher groove densities in comparison 
to in-plane gratings. Furthermore, it requires tighter alignment tolerances in com- 
parison to transmission grating spectrometers, which are also typically less massive. 
However, with the major challenges overcome — variable line spacing, alignment, 
profile blazing, high groove density, and large formats — an off-plane X-ray grating 
spectrometer can effectively deliver the performance and science requirements for 
X-ray spectroscopy missions. 


^b Half of the energy from the source is contained within a circle in the focal plane of this diameter.


2. Off-Plane Grating Fabrication Techniques


Past grating fabrication techniques have included mechanical ruling and
photolithography. However, these are limited in their ability to produce the custom
profiles that are required to address the necessary performance requirements for
cutting-edge X-ray spectrometers. The limits are both on the shape of the grooves
and the area that can be ruled. Mechanical ruling cannot produce the groove density 
or custom groove profile at the 100s of nanometer scales required by these gratings 
and is not discussed here further. Photolithographic ruling, otherwise known as 
holographic recording, uses two laser wavefronts to produce an interference pattern 
onto a photoresist-coated substrate. The light sources are placed within the plane 
containing the grating normal and perpendicular to the eventual groove direction. 
Depending on the wavelength of the laser used, this method can create various 
groove densities typically ranging between 4500 and ~6000 grooves/mm. The result- 
ing groove profile is parallel and sinusoidal, while subsequent ion etching techniques 
can be used to put a blaze on the facets.” Furthermore, out-of-plane displacements of 
the light sources can produce pseudo-radial groove profiles.’ In theory, this method 
could overcome many of the challenges mentioned in the previous section; however, 
several limitations remain. First, higher groove densities require lasers with smaller 
wavelengths, which are not readily available. Second, reliable recording over large 
scales has not been accomplished and requires new tooling and larger laboratory 
space to implement the recording process. Also, the pseudo-radial profile is an 
approximation that results in a non-zero grating-induced aberration, as the converg- 
ing telescope beam does not match the groove convergence optimally. It is possible 
that these issues can be solved, but significant upgrades to production facilities are 
required with no assurance of success. 

The current state-of-the-art in off-plane grating fabrication utilizes techniques 
made possible by the semiconductor industry. Micro- and nano-fabrication methods
for silicon-based devices have been developed over decades and are capable of
producing small scale, repeatable features. Studies of these methods have identified 
a procedure for producing high density, variable-line-spaced, blazed grating profiles 
over large formats.* A summary of the fabrication steps is given in Fig. 5. The 
first step is to obtain nitride-coated silicon wafers. The thin nitride layer (~10s of 
nm) is typically deposited using chemical vapor deposition. The silicon wafer can 
be purchased so that a given crystal plane lies nearly in the plane of the surface, 
thus defining the orientation of the remaining planes. For example, a wafer with 
(100) orientation has the (100) plane as the surface and the family of {111} planes
is oriented 54.7° relative to it. Processes that exploit a given crystal plane can be 
used to sculpt blazed grating profiles. For instance, potassium hydroxide, KOH, 
preferentially etches through crystal bonds that establish planes other than the 
{111} planes. Therefore, KOH can be used to etch down to the {111} planes in 
a controlled fashion if this reaction is appropriately masked.? Masking is achieved 
through patterning of the nitride layer. First, a layer of resist is spin-coated onto the 
nitride coated silicon wafer (Step 1 in Fig. 5). The resist is then exposed to a beam 
of electrons using an electron beam lithography (e-beam) tool (Step 2). Such a tool 
can be capable of producing precise patterns over large areas. The electron exposure 
changes the molecular weight of the resist so that post-exposure processing of the resist
in a developer solution will dissolve the exposed resist.

Fig. 5. Methodology for nanofabrication of blazed off-plane reflection gratings.

The e-beam tool can easily
write densities in the many 1000s of grooves/mm and can change this periodicity
with sub-nm step sizes to achieve the variable line spacing in the groove direction.
After exposure and development, the wafer is then placed in a reactive ion etcher to
remove any residual resist, using an O2 plasma. The exposed nitride is then etched
with an O2/CHF3 plasma down to the silicon surface (Step 3). Any remaining resist
can be removed using a piranha etch (a mix of sulfuric acid and hydrogen peroxide)
to leave strips of nitride that exhibit the periodicity of the desired grating (Step 4). 
Silicon oxidizes readily, so removing surface oxidation prior to KOH etching is nec- 
essary. This is performed using hydrofluoric acid (HF; Step 5). The silicon is now 
ready for the KOH etch, which works in between the nitride tabs down to the {111} 
planes (Step 6). The depth of the etch can be controlled by limiting the time of 
the KOH soak. However, for a fully blazed facet, the etch time is typically long 
enough to allow the {111} planes to meet at the bottom of the groove. Longer 
soaks allow for etching in the [111] direction, which steadily “undercuts” the nitride 
tabs and allows for control over the connection between the nitride strips and the 
silicon surface. Minimizing this link maximizes the size of the angled groove facet 
relative to the groove period. Once the optimal KOH etch time has been reached 
(this depends on factors such as etch temperature, KOH concentration, feature size, 
crystal orientation, etc.), the nitride tabs can be removed with an HF soak (Step 7). 
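
The facet angles set by the wafer orientation follow from the angle between plane normals in a cubic crystal. The sketch below is a minimal illustration of that geometry; the function name is an assumption. It reproduces the 54.7° quoted for a (100) wafer and can be evaluated for other orientations, such as the (311) wafer discussed with Fig. 6.

```python
import math


def plane_angle_deg(hkl_a, hkl_b):
    """Angle between two lattice planes of a cubic crystal, from their Miller indices."""
    dot = sum(a * b for a, b in zip(hkl_a, hkl_b))
    norm_a = math.sqrt(sum(a * a for a in hkl_a))
    norm_b = math.sqrt(sum(b * b for b in hkl_b))
    return math.degrees(math.acos(dot / (norm_a * norm_b)))


if __name__ == "__main__":
    print(plane_angle_deg((1, 0, 0), (1, 1, 1)))  # about 54.7 degrees
    print(plane_angle_deg((3, 1, 1), (1, 1, 1)))  # about 29.5 degrees
```

Because the KOH etch stops on the {111} planes, the chosen surface orientation directly fixes the blaze angle of the etched facets.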

At this point the grating would be fully capable of diffracting X-rays efficiently. 
However, large (150 mm diameter) Si wafers typically have significant warp (industry
specifications are often <30 μm) while free-standing. This shape to the wafer
can introduce an aberration into the line spread function if the telescope is of suf- 
ficient quality. Therefore, to obtain flat gratings, the pattern created on the silicon 
wafer must be transferred to a flat substrate. Fused silica wafers can be made with 
~1 pm of peak-to-valley flatness, which is adequate for most spectrometer applica- 
tions. The pattern can be transferred via nanoimprint lithography, which is basically 
nano-embossing (Step 8). In this case, the imprint is now a negative of the etched 
silicon pattern. This can be an advantage, as there is typically a tab at the top 
of the silicon groove pattern where the nitride was rooted prior to HF removal. 
This tab turns into the bottom of the groove (which is often shadowed for typical 
spectrometer geometries), while the bottom of the etched silicon pattern (which is 
typically the joining of {111} planes) becomes the top of the groove. Fused silica is 
transparent to UV so that UV-curable resists can be employed to carry the groove 
pattern. UV resists cross-link longer polymers, in comparison to thermal resists, and 
are mechanically stable. In the final step, the cured resist is coated with a metallic 
layer to increase X-ray reflectivity. Coatings such as 5 nm Cr (to promote adhesion) 
with 15 nm of Au are sufficient for high reflectivity (≥60%) of soft X-rays (<2 keV). 

Figure 6 shows scanning electron micrograph (SEM) images taken throughout 
the various steps of this process. The silicon wafer used in this study has a (311) 
orientation, which places the {111} planes at a 29.5° angle relative to the surface. 
Selecting wafer orientations in this way allows for customization of the groove profile. 
The sample was imaged after several of the processing steps to show the progression 
of patterned features. This particular grating was designed with a 160-nm period. 
These methods have been used to produce gratings with sizes up to 75 × 96 mm, 
groove densities above 6000 grooves/mm, period step sizes of 0.25 nm, and blaze 
angles ranging from ~10° to 55°. 

Fig. 6. SEM images of a (311) silicon wafer during etching. Top left: A top-down image showing 
the pattern after e-beam exposure. The lighter areas are the tops of the patterned resist strips. 
Top right: A top-down image after reactive ion etching. The lighter areas are the tops of the 
remaining nitride strips. Middle left: A cross-section image after KOH etching. The 29.5° blaze 
angle resulting from the (311) orientation is evident, as are the remaining nitride tabs. Middle 
right: A cross-section image after HF soak. The nitride has been stripped, leaving small Si tabs 
at the tops of the grooves. Bottom center: a zoomed-out cross-section image showing large area 
consistency of the groove profile. These images were obtained by Dmitriy Voronov at the Lawrence 
Berkeley National Lab Molecular Foundry. 


3. Off-Plane Grating Performance 


Previous measurements of off-plane gratings® 71° have demonstrated their poten- 
tial for utilization in soft X-ray spectroscopy missions, yet these tests often utilized 
prototype gratings that were not developed fully in the absence of an identified 
mission or development opportunity. Therefore, these gratings typically lacked one 
or more of the traits that are required for space-based X-ray spectrometers: high 
groove density, blazed profile, variable line spacing, and large format. The high 
groove density and variable line spacing are essential for high resolving power, 
given that the density leads to high dispersion, while variable line spacing limits 
grating-induced aberrations. The blazed profile is necessary to increase diffraction 
efficiency in the wavelength band of interest, while the large format also increases 
throughput per grating. Therefore, for application in space-based spectrometers, 
demonstrations of the following assets are necessary for off-plane gratings: (1) good 
diffraction efficiency from a blazed profile (Sec. 3.1), (2) good resolving power from 
a high-density, variable line spaced grating (Sec. 3.2), (3) methods for fabricat- 
ing these over large formats (Sec. 2), and (4) methods for incorporating them 
into an observatory (Sec. 4). Gratings fabricated with the methodologies described 
in Sec. 2 have been tested for diffraction efficiency and spectral resolving power 
and are discussed here as the current best representation of off-plane diffraction 
gratings. 

The performance goals for the gratings should be formulated by science require- 
ments but also constrained by design considerations. As discussed in Sec. 1.2, the 
design of a spectrograph is determined by many factors, but of utmost importance 
is the trade between throughput and resolving power. Ideally, a spectrometer would 
diffract all of the incident light in the wavelength band of interest and also achieve 
high spectral resolving power at those energies. Blazed grating profiles are capable 
of doing just that. As an example, a grating with a blaze angle of 54.7° (fabricated 
using the steps in Fig. 5 on a (100) orientation Si wafer) will preferentially diffract 
near the direction of the blaze, or at β ≈ 54.7° in the Littrow configuration. Once 
again assuming that the throw (L) is 8 m, as an example, the blaze wavelength 
is located at a dispersion distance of ~592 mm (= 2L sin η tan β) from zero order 
for a grazing incidence angle of η = 1.5° (note, this is not the same as the cone 
half-angle, γ, from Fig. 1). The dispersion relation (Eq. (31)) shows that this 
corresponds to a blaze wavelength of ~123 Å in first order for a groove density of 
6000 grooves/mm. Astrophysically important spectral lines (highly ionized C, N, 
O, Ne, Mg, S, Si, Fe, etc.) exist over the entire soft X-ray range (~0.3–2 keV or 
~6–40 Å). Therefore, the highest efficiency for each spectral feature will be obtained 
when nλ for that feature is near the blaze wavelength. For instance, the resonance 
line of the O VII He-like triplet occurs at 21.6 Å, which means that most O VII 
photons will be preferentially diffracted in sixth order due to the blaze. This is 
an advantage, given that resolving power scales with order (R = nλ/Δλ) so that 
it is now six times larger than it would be for first order (typically the strongest 
order for unblazed profiles such as a sinusoid). In this way, key science goals can 
be realized. The blaze places diffracted light at high order where resolving power 
is also increased. Therefore, when quantifying the capability of an off-plane grating 
spectrometer, measurements of high diffraction efficiency and high spectral resolving 
power at relevant orders are required. 
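
The numbers in this example can be reproduced with a short calculation. The sketch below uses the quoted throw, blaze angle, graze angle, and groove density. The dispersion distance follows the expression given in the text (twice the throw times the sine of the graze angle times the tangent of the blaze angle). The two relations marked "assumed" in the comments are the standard off-plane (conical) diffraction formulas and are not copied from Eq. (31), so treat them as assumptions.

```python
import math

# Values quoted in the text.
L_MM = 8000.0            # grating-to-focal-plane throw (8 m)
BLAZE_DEG = 54.7         # blaze angle for a (100) Si wafer
GRAZE_DEG = 1.5          # grazing incidence angle on the grating
DENSITY_PER_MM = 6000.0  # groove density
OVII_ANGSTROM = 21.6     # O VII resonance line

blaze = math.radians(BLAZE_DEG)
graze = math.radians(GRAZE_DEG)

# Dispersion distance from zero order: 2 L sin(graze) tan(blaze).
distance_mm = 2.0 * L_MM * math.sin(graze) * math.tan(blaze)

# Assumed: in Littrow (alpha = beta = blaze) the cone half-angle gamma
# satisfies sin(graze) = sin(gamma) * cos(alpha).
sin_gamma = math.sin(graze) / math.cos(blaze)

# Assumed grating equation: sin(alpha) + sin(beta) = n * lambda / (d * sin(gamma)).
d_angstrom = 1.0e7 / DENSITY_PER_MM  # groove period in Angstrom (1 mm = 1e7 Angstrom)
n_lambda = d_angstrom * sin_gamma * 2.0 * math.sin(blaze)

# Order in which the O VII line falls closest to the blaze.
ovii_order = round(n_lambda / OVII_ANGSTROM)

print(f"dispersion distance ~ {distance_mm:.0f} mm")
print(f"blaze wavelength (n * lambda) ~ {n_lambda:.0f} Angstrom")
print(f"O VII blaze order ~ {ovii_order}")
```

With these inputs the distance comes out near 592 mm, nλ near 123 Å, and the O VII line lands in sixth order, consistent with the values quoted above.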


3.1. Diffraction Efficiency 


The amount of incident light diffracted into the spectrum of interest is a measure of 
the quality of a grating as well as its utility. Consistency of the groove profile will 
lead to diffraction efficiencies that are well described by theory. If variation is present 
(e.g. the facet blaze angle varies over the grating) then the resulting efficiency 
curves will be an average over these variations, which may decrease the amount 
of flux in the orders of interest while also leading to difficulties in modeling. As 
described in Sec. 1.1, diffraction theory assumes that the groove profile is repeated 
exactly in x (orthogonal to the groove direction) and is constant in z (along the 
grooves). Therefore, a measurement of diffraction efficiency that matches theoretical 
predictions is a verification of groove profile consistency. 

Figure 7¹⁶ displays the diffraction efficiency of a blazed off-plane grating 
fabricated using steps 1–7 from Fig. 5. These measurements were taken using the 
Advanced Light Source¹⁷ (ALS) at Lawrence Berkeley National Lab. The test grating 
measured 10 × 30 mm and has a variable-line-spaced profile with two periods, 
160 nm and 159.75 nm, each over 15 mm of groove length. The sample was coated 
with 5 nm Cr and 30 nm Au prior to testing. The Si wafer orientation is (100), leading 
to the blaze angle of 54.7° evident in the SEM and atomic force microscopy 
(AFM) images of Fig. 8. The measured efficiencies for negative orders 5–8 are 
shown as discrete points at 50 eV steps connected by solid lines. Also included in 
this figure are theoretical curves (dotted lines) calculated using PCGrate software. 
These curves were generated using a 54.7° blazed profile at the designed density, 
a groove top consistent with the SEM imaging, and a roughness consistent with 
AFM measurements (see Fig. 8). The theoretical curves are a good match to the 
data, providing numerical verification of the groove consistency that is evident from 
inspection of the images. 

Fig. 7. Diffraction efficiency of a blazed off-plane grating tested at ALS.¹⁶ The measured data 
points are connected by solid lines while the continuous theoretical expectations are displayed as 
dotted lines. (See electronic edition for a color version of this figure.) 

Fig. 8. Left: A top-down SEM image of the Au-coated test grating used to produce the results 
in Fig. 7. The grooves have a rounded-plateau top of ~33 nm. Right: AFM image of the profile. 
Images courtesy of Dmitriy Voronov at LBNL. 

These results show very good response from the tested off-plane grating. The 
54.7° blaze places the predicted amounts of flux into these high orders. Despite 
the ~35 nm plateau on the tops of the grooves, which displaces flux across orders 
including zero order, the sum of absolute efficiency over the orders (including Au 
reflectivity) is very high at ≥35%. Therefore, utilizing off-plane gratings such as 
these can realize high diffraction efficiency at high order. 


3.2. Spectral Resolving Power 


Spectral resolving power for high density, VLS off-plane gratings has been 
measured¹⁸ using the Stray Light Facility (SLF) at NASA’s Marshall Space Flight 
Center (MSFC). This is a 100-m X-ray beamline that utilizes an electron impact 
source on one end and a 12 m long, 3 m diameter test chamber on the opposite end. 
The test utilized slumped glass optics from NASA’s Goddard Space Flight Center¹⁹ 
to create the focused telescope beam. The test grating measures 25 × 32 mm on a 
0.5 mm thick Si wafer. The profile is rectangular with three varying groove periods 
(165.5 nm, 165.75nm, and 166 nm) made to match the convergence of the telescope.°® 
In this case, the grooves were created using a different process than that depicted 
in Fig. 5. Instead of writing the 0.25 nm period step changes directly with a high 
precision e-beam tool, this grating was created using a coarser tool with 1 nm step 
size. The pattern was written at a 4× larger scale on a mask that was then used in 
deep UV projection lithography to deproject (reduce) the pattern onto a 
photoresist-coated Si wafer, thus achieving the 0.25 nm step size. Following development, 
subsequent reactive ion etching transfers the pattern into the Si. Figure 9 shows the 
results from illumination of the central portion of the grating by the optics. Given 
that the groove profile is unblazed, a geometry is chosen with α = 0, allowing for 
resolving power measurements out to +6th order for the 1.49 keV (8.34 Å) Al K 
fluorescence line. The line profile shows contributions from both the Kα₁ and Kα₂ 
lines (centroids of 1.48670 keV and 1.48627 keV, respectively), with the latter 
having ~1/2 the amplitude of the former. The resolved line width corresponds to a 
spectral resolving power of R = 3900. This demonstrates that off-plane gratings 
exhibiting high density, variable-line-spaced profiles are capable of achieving high 
spectral resolving power at high order. 

Fig. 9. Spectral resolving power results for off-plane gratings.¹⁸ This is the line spread function 
for sixth order Al Kα₁ and Kα₂. The measured data are shown as the blue histogram and a best-fit 
Lorentzian is shown in red. 
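
To put the quoted resolving power in context, the following sketch converts the two Al K line centroids to wavelength and compares their separation with the resolution element implied by R = 3900. It assumes R is defined as wavelength divided by the resolved line width, and uses the standard conversion constant hc ≈ 12.39842 keV Å; the variable names are illustrative only.

```python
HC_KEV_ANGSTROM = 12.39842  # h*c in keV * Angstrom

def wavelength_angstrom(energy_kev):
    """Photon wavelength in Angstrom for an energy given in keV."""
    return HC_KEV_ANGSTROM / energy_kev

E_KA1_KEV = 1.48670   # Al K-alpha-1 centroid quoted in the text
E_KA2_KEV = 1.48627   # Al K-alpha-2 centroid quoted in the text
R_MEASURED = 3900.0   # measured resolving power quoted in the text

lam1 = wavelength_angstrom(E_KA1_KEV)
lam2 = wavelength_angstrom(E_KA2_KEV)

doublet_separation = lam2 - lam1        # wavelength gap between the two lines
resolution_element = lam1 / R_MEASURED  # line width implied by R (assumed R = lambda / width)
r_to_split = lam1 / doublet_separation  # R at which the width equals the gap

print(f"K-alpha-1 wavelength: {lam1:.3f} Angstrom")
print(f"doublet separation:   {doublet_separation * 1e3:.2f} mAngstrom")
print(f"resolution element:   {resolution_element * 1e3:.2f} mAngstrom")
print(f"R needed to match the doublet gap: {r_to_split:.0f}")
```

Under these assumptions the resolution element is slightly smaller than the doublet separation, which is consistent with both lines contributing visibly to the measured profile.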


4. Grating Alignment 


Having demonstrated high diffraction efficiency and high resolving power at high 
order, an off-plane grating is then ready for incorporation into a spectrograph. As 
described in Sec. 1.3, this involves overlapping the spectra from many gratings, 
which requires precise alignment between all grating elements. The alignment tol- 
erances are dependent on the spectrometer’s specific physical characteristics. The 
tolerances for the six degree of freedom placement of one grating relative to another 
can be calculated.²⁰ This calculation tracks the position of a spectral line and 
equates deviations in a certain degree of freedom to displacement of the arc at 
the focal plane (yaw = rotation about y, pitch = rotation about x, roll = rotation 
about z). A tolerance can be constructed by requiring this displacement to be less 
than some value, such as the half-power diameter of the LSF in each of the two 
dimensions defining the focal plane. Once again using an 8.4 m telescope with ~8 m 
of grating throw, an example of the results of a tolerance calculation is shown in 
Table 1. This study assumes that the entire error budget when creating the LSF is 
due to grating misalignments, when in reality many other factors can contribute, 
such as telescope pointing errors, focal plane distortions, substrate figure, groove 
aberrations, etc. This is a slightly different exercise when considering contributions 
from all gratings within an aligned module of gratings, as well as alignment toler- 
ances between modules if multiple modules are required, which is often the case. 
Also included in Table 1, for comparison, is a list of alignment tolerances for the 
Off-plane Grating Rocket Experiment (OGRE), a suborbital X-ray spectrometer.²¹ 
The OGRE calculation differs from the 8.4 m case because the telescope has a 
3.5-m focal length, so that the throw is much different while the groove density is 
the same. The tolerances in this case are very similar though, since the gratings 
are most sensitive to angular deviations in the rotational degrees of freedom, which 
contribute equally to the LSF for any throw. In either case, the deviation of each 
grating from perfect alignment causes a shift in the dispersion direction and the 
cross-dispersion direction. Shifts in the dispersion direction limit spectral resolving 
power, while shifts in the cross-dispersion direction cause a spread of the line and 
possible loss of signal. Therefore, Table 1 also assesses which dimension worsens 
more quickly with misalignments per degree of freedom. For example, in the case of 
the 8 m system, deviations of ±21.6 arcseconds in roll between gratings will cause 
misalignments of the respective LSFs by one FWHM of the LSF, thus limiting 
spectral resolving power; however, deviations of ±7.9 arcseconds in yaw cause the 
respective lines to separate predominantly in the cross-dispersion, thus leaving the 
resolving power relatively constant, but spreading the line over more detector area. 

Once the tolerances are known, an alignment methodology needs to be 
constructed to achieve and measure each degree of freedom. This typically employs 
several metrology tools and precision stages. An example alignment setup 
is shown on an optical bench in Fig. 10.²² In this particular setup the grating is 
held by a hexapod for precise positioning. A wave-front sensor and its associated 
optics are used to measure the grating pitch and roll. An optical grating can be 
placed near the X-ray grating with the grooves in the same direction. In this way, 
an optical laser can diffract onto a camera to measure relative yaw between gratings. 
The translational tolerances are met through machine tolerances on the associated 
jigs and mounts holding the gratings. Each grating is placed within the module and 
each rotational degree of freedom is recorded. The module pitch is incremented by 
the correct amount to match the optical fan of the gratings (see Sec. 1.3) and the 
next grating is placed and measured. In this way, many off-plane gratings can be 
aligned to meet the alignment tolerances, and hence the spectral resolving power 
and effective area requirements of an X-ray spectrograph design. 


Table 1. Alignment tolerances for a representative 8.4-m focal length spectrometer 
and for the OGRE spectrometer. The range of each degree of freedom signifies an 
error of one FWHM. 

         8.4 m Tolerance²²    OGRE Tolerance      Limiting Effect 
Yaw      ±7.9 arcseconds      ±9.7 arcseconds     Effective area 
Pitch    ±4.3 arcseconds      ±4.9 arcseconds     Effective area 
Roll     ±21.6 arcseconds     ±18.7 arcseconds    Spectral resolving power 
x        ±317 µm              ±175 µm             Effective area 
y        ±170 µm              ±200 µm             Effective area 
z        ±1.51 mm             ±450 µm             Spectral resolving power 


Fig. 10. Picture of an example off-plane grating alignment and metrology setup. 


5. Summary 


Off-plane reflection gratings offer high quality performance in the soft X-ray band. 
The off-plane geometry has been studied for some time, but only recently have 
fabrication techniques been able to realize profiles capable of reaching the perfor- 
mance requirements of next-generation X-ray observatories currently under study. 
Spectrometers utilizing these gratings are capable of achieving high effective area as 
demonstrated by empirical diffraction efficiency measurements. Furthermore, empir- 
ical results have demonstrated high spectral resolving power at the high orders that 
are necessary. Finally, aligned modules of off-plane gratings can be constructed to 
populate large area spectrometers. 

Given these recent developments, off-plane grating spectrographs have been 
considered or baselined for many recent mission studies. The International X-ray 
Observatory (not selected for flight) included an Off-Plane X-ray Grating 
Spectrometer²³ and subsequent studies such as NASA’s Notional X-ray Grating 
Spectrometer²⁴ and the Warm-Hot Intergalactic Medium Explorer baselined off-plane grating 
spectrometers. Most recently, another Explorer-class mission, Arcus,²⁵ has baselined 
off-plane gratings, while the aforementioned OGRE sounding rocket mission is set to 
fly an off-plane module. The OGRE mission will be the first space-flight test of the 
latest generation of off-plane gratings fabricated with the methodologies described 
here. 


We review the basic principles of X-ray polarimetry and current detector technologies 
based on the photoelectric effect, Bragg reflection, and Compton scattering. 
Recent technological advances in high-spatial-resolution gas-filled X-ray detectors 
have enabled efficient polarimeters exploiting the photoelectric effect that hold 
great scientific promise for X-ray polarimetry in the 2-10 keV band. Advances 
in the fabrication of multilayer optics have made feasible the construction of 
broadband soft X-ray polarimeters based on Bragg reflection. Developments in 
scintillator and solid-state hard X-ray detectors facilitate construction of both 
modular, large area Compton scattering polarimeters and compact devices suit- 
able for use with focusing X-ray telescopes. 


1. Polarization 


The polarization of photons reflects their fundamental nature as electromagnetic 
waves. A photon is a discrete packet of electric and magnetic fields oriented trans- 
verse to the direction of motion. The fields evolve in time and position according 
to Maxwell’s equations. The polarization describes the configuration of the fields. 
Since the electric and magnetic fields are interrelated by Maxwell’s equations, the 
configuration of both fields is set by specification of the electric field alone. 

An electromagnetic plane wave propagating along the z-axis with angular 
frequency ω can be described as a sinusoidally varying electric field of the form 


\[
\mathbf{E} = \hat{\mathbf{x}}E_x + \hat{\mathbf{y}}E_y
           = \hat{\mathbf{x}}E_{0x}\cos(kz - \omega t) + \hat{\mathbf{y}}E_{0y}\cos(kz - \omega t + \xi), \tag{1}
\]


where ξ and the ratio of $E_{0x}$ to $E_{0y}$ set the polarization, and the wavenumber 
k = ω/c. Polarization is symmetric under a 180° rotation, since such a rotation can 
be produced by translation in time or space. The wave is linearly polarized if $E_x$ 
and $E_y$ are always proportional. This occurs when ξ = nπ, where n is an integer, 
so that $E_x$ and $E_y$ are exactly in phase or antiphase. The polarization angle is set 
by the ratio of $E_{0y}$ and $E_{0x}$. If ξ ≠ nπ, the electric field rotates as a function of 
time or position, which is elliptical polarization. The polarization is described as 
right or left handed according to whether E rotates clockwise or counterclockwise. 
Circular polarization is the special case of elliptical polarization with $E_{0x} = E_{0y}$ 
and ξ = ±π/2. 

Each individual photon is necessarily polarized. However, different photons from 
a particular source may have different polarizations. If the distribution of polariza- 
tion angles is uniform, then the source has zero net polarization. If not, then the 
source has a net polarization. Generation of non-zero net polarization requires a net 
deviation from spherical symmetry in either the physical geometry or the magnetic 
field configuration of the astrophysical system. 

The Stokes parameters provide a means to fully characterize the polarization 
of a source using four intensities: 


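\begin{align}
I &= \langle E_{0x}^2 \rangle + \langle E_{0y}^2 \rangle, \tag{2} \\
Q &= \langle E_{0x}^2 \rangle - \langle E_{0y}^2 \rangle, \tag{3} \\
U &= \langle 2E_{0x}E_{0y}\cos\xi \rangle, \tag{4} \\
V &= \langle 2E_{0x}E_{0y}\sin\xi \rangle. \tag{5}
\end{align}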


The averages are taken over the photons detected from the source. The 
fractional degree of polarization, also called the polarization fraction or the 
magnitude of polarization, is $P = \sqrt{Q^2 + U^2 + V^2}/I$. The polarization position angle φ₀ is given by 
tan(2φ₀) = U/Q. The Stokes parameter V describes elliptical polarization. Since 
most X-ray polarimeters are sensitive only to linear polarization, we will not consider 
elliptical polarization further. 
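
As an illustration of these definitions, the sketch below evaluates the Stokes parameters for a single, fully polarized wave (so the averages reduce to one term) and recovers the polarization fraction and position angle. The function names are hypothetical, and the phase symbol is written `xi` on the assumption that it denotes the phase offset between the two field components.

```python
import math

def stokes_from_wave(e0x, e0y, xi):
    """Stokes parameters (I, Q, U, V) for one fully polarized plane wave.

    e0x, e0y are the field amplitudes and xi is the phase offset between
    the y and x components, in radians.
    """
    i = e0x**2 + e0y**2
    q = e0x**2 - e0y**2
    u = 2.0 * e0x * e0y * math.cos(xi)
    v = 2.0 * e0x * e0y * math.sin(xi)
    return i, q, u, v

def polarization_fraction(i, q, u, v):
    """P = sqrt(Q^2 + U^2 + V^2) / I."""
    return math.sqrt(q * q + u * u + v * v) / i

def position_angle_deg(q, u):
    """Position angle from tan(2 * phi0) = U / Q; atan2 fixes the quadrant."""
    return 0.5 * math.degrees(math.atan2(u, q))

# Linear polarization at 30 degrees: components in phase (xi = 0).
angle = math.radians(30.0)
i_lin, q_lin, u_lin, v_lin = stokes_from_wave(math.cos(angle), math.sin(angle), 0.0)
p_lin = polarization_fraction(i_lin, q_lin, u_lin, v_lin)
phi0_lin = position_angle_deg(q_lin, u_lin)

# Circular polarization: equal amplitudes, quarter-wave phase offset.
i_circ, q_circ, u_circ, v_circ = stokes_from_wave(1.0, 1.0, math.pi / 2.0)

print(f"linear:   P = {p_lin:.3f}, phi0 = {phi0_lin:.1f} deg, V = {v_lin:.3f}")
print(f"circular: Q = {q_circ:.3f}, U = {u_circ:.3f}, V = {v_circ:.3f}")
```

For the linear case all of the polarized intensity appears in Q and U, while for the circular case it appears only in V, which is why a polarimeter sensitive only to linear polarization does not respond to it.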


2. Polarization Measurement 


Available X-ray instrumentation is able to measure the intensity of X-rays (the 
number of photons per unit time), the energies of X-rays (via conversion of that 
energy to charge or heat), and the positions of X-rays or, more precisely, the posi- 
tions at which an X-ray deposits charge via interactions. Since X-ray polarization 
cannot be measured directly, the X-rays must first undergo some interaction that 
converts the polarization information to a directly measurable quantity, typically 
intensity or position.+ 

In Fig. 1, we consider one rotating linear polarization analyzer. As the 
analyzer is rotated, the associated detector records the intensity of photons (counts) 
at each analyzer angle. The resulting histogram of counts versus rotation angle, 
or modulation curve, is shown in Fig. 2. In general, the modulation curve will 
have the form 

\[
S(\phi) = A + B\cos^2(\phi - \phi_0). \tag{6}
\]

The polarization position angle φ₀ is the angle at which the maximum intensity is 
recorded, A describes the unpolarized component of the intensity, and B describes 
the polarized intensity. The modulation amplitude is 
$a = (S_{\max} - S_{\min})/(S_{\max} + S_{\min}) = B/(2A + B)$. Given a modulation curve, a and φ₀ can be obtained by 
nonlinear regression. 

Fig. 1. Polarization analyzer. A linear polarization analyzer is rotated and the associated detector 
records the intensity of photons (counts) at each angle as a modulation curve as shown in Fig. 2. 

Fig. 2. Modulation curve: detector count rate versus rotation angle of a linear polarizing filter. 
The crosses indicate data points. The dashed curves are sums of the Stokes decomposition as 
indicated. The solid curve is the sum of the three Stokes components. The curve has a = 0.9 and 
φ₀ = 30°. See electronic edition for a color version of this figure. 

The modulation curve can also be written in terms of the Stokes parameters as 


\[
S(\phi) = I + Q\cos(2\phi) + U\sin(2\phi). \tag{7}
\]


The Stokes decomposition is equivalent to a Fourier series with one period. The 
Stokes parameters can be obtained directly from the modulation curve: I = ⟨S(φ)⟩, 
Q = S(0°) − I, and U = S(45°) − I. This can be visually verified in Fig. 2 as Q + I = 
S(0°), where the U sinusoid is zero, and U + I = S(45°), where the Q sinusoid is 
zero. This determination of the Stokes parameters is equivalent to measuring the 
source intensity through three filtersᵃ: unpolarized, polarized at 0°, polarized at 
45°. A key feature of the Stokes decomposition is that the modulation curve is 
linear in the Stokes parameters, thus they can be obtained via linear regression.” 
The magnitude, P, and angle, φ₀, of polarization can be recovered from the Stokes 
parameters using the equations in the previous section or fit for directly from the 
modulation curve. The Stokes parameters are particularly useful because they are 
additive if fitting multiple modulation curves. This is not true of the magnitude and 
angle of polarization. 
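
A minimal sketch of the linear approach follows. It builds a noise-free modulation curve of the form A + B cos²(φ − φ₀) and recovers I, Q, and U by projecting onto cos 2φ and sin 2φ. The projection equals the least-squares solution only under the assumption made here, namely angles uniformly spaced over a full rotation; unevenly sampled or weighted data would need a general linear least-squares fit. All function names are hypothetical.

```python
import math

def modulation_curve(a_unpol, b_pol, phi0_deg, angles_deg):
    """Noise-free curve S = A + B cos^2(phi - phi0) sampled at angles_deg."""
    return [a_unpol + b_pol * math.cos(math.radians(p - phi0_deg)) ** 2
            for p in angles_deg]

def stokes_from_curve(angles_deg, counts):
    """Recover I, Q, U of S = I + Q cos(2 phi) + U sin(2 phi).

    Assumes angles uniformly spaced over a full rotation, for which the
    Fourier projection below coincides with linear least squares.
    """
    n = len(counts)
    i = sum(counts) / n
    q = 2.0 * sum(s * math.cos(2.0 * math.radians(p))
                  for p, s in zip(angles_deg, counts)) / n
    u = 2.0 * sum(s * math.sin(2.0 * math.radians(p))
                  for p, s in zip(angles_deg, counts)) / n
    return i, q, u

# A = 1 and B = 18 give a modulation amplitude B / (2A + B) = 0.9.
angles = [10.0 * k for k in range(36)]  # 0, 10, ..., 350 degrees
curve = modulation_curve(1.0, 18.0, 30.0, angles)

i_fit, q_fit, u_fit = stokes_from_curve(angles, curve)
amplitude = math.hypot(q_fit, u_fit) / i_fit
phi0_fit = 0.5 * math.degrees(math.atan2(u_fit, q_fit))

print(f"I = {i_fit:.3f}, Q = {q_fit:.3f}, U = {u_fit:.3f}")
print(f"modulation amplitude = {amplitude:.3f}, phi0 = {phi0_fit:.1f} deg")
```

Because I, Q, and U enter linearly, values obtained from several modulation curves can simply be summed before the amplitude and angle are computed, which is the additivity property noted above.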

The discussion above assumes the use of an ideal polarization analyzer that 
passes no radiation if oriented perpendicular to a 100% polarized beam. In this case, 
the modulation amplitude is equal to the polarization fraction. The real world is less 
than perfect. Most actual polarization analyzers pass some fraction of the radiation 
even when oriented perpendicular to a 100% polarized beam. The “modulation 
factor”, μ, is defined as the modulation amplitude measured by a polarimeter for 
a 100% polarized beam. The modulation factor is a property of the polarimeter 
and may also depend on the energy or spatial distribution of the input photons. 
Background events (events not produced by photons from the source) also dilute the 
modulation curve. For a polarimeter with a measured μ and background count rate 
b independent of rotation angle, the polarization fraction of a source that produces 
a modulation amplitude a and an average count rate r is 


\[
P = \frac{a}{\mu}\,\frac{r + b}{r}. \tag{8}
\]


In designing an X-ray polarimeter, it is essential that the system (ana- 
lyzer/detector and telescope) can reach sufficient statistical accuracy for the mea- 
surements required. The traditional figure of merit is the “Minimum Detectable 
Polarization” (MDP).? The polarization fraction, P, is a non-negative quantity. 
Thus, due to statistical fluctuations, any particular measurement of P will produce 
a value greater than 0. The MDP is the largest fluctuation expected to occur with 
a probability of 1%. Equivalently, the MDP is the smallest polarization that can be 
detected at a 99% confidence level. The MDP for an observation of duration T is 


\[
\mathrm{MDP} = \frac{4.29}{\mu r}\sqrt{\frac{r + b}{T}}
             = \frac{4.29}{\mu\sqrt{N}}\sqrt{1 + \frac{b}{r}}, \tag{9}
\]


where N = rT is the total number of source counts. Reaching an MDP of 1% with 
an ideal polarimeter, μ = 1 and b = 0, requires ~200,000 counts. 
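
The count requirement can be checked numerically. The sketch below assumes Eq. (8) reads P = a(r + b)/(μr) and Eq. (9) reads MDP = 4.29/(μr) × sqrt((r + b)/T), with r the source count rate, b the background rate, and T the exposure; the function names are hypothetical.

```python
import math

def polarization_fraction(a, mu, r, b):
    """Source polarization fraction from a measured modulation amplitude (Eq. (8) as assumed)."""
    return a * (r + b) / (mu * r)

def mdp(mu, r, b, t):
    """Minimum detectable polarization at 99% confidence (Eq. (9) as assumed)."""
    return 4.29 / (mu * r) * math.sqrt((r + b) / t)

def counts_for_mdp(target_mdp, mu=1.0, b_over_r=0.0):
    """Source counts N = r * T needed to reach target_mdp, from inverting the MDP formula."""
    return (4.29 / (mu * target_mdp)) ** 2 * (1.0 + b_over_r)

n_ideal = counts_for_mdp(0.01)                 # ideal polarimeter, no background
n_half_mu = counts_for_mdp(0.01, mu=0.5)       # modulation factor of 0.5
mdp_check = mdp(mu=1.0, r=1.0, b=0.0, t=n_ideal)

print(f"ideal polarimeter: N ~ {n_ideal:,.0f} counts")
print(f"mu = 0.5:          N ~ {n_half_mu:,.0f} counts")
print(f"MDP recovered from N: {mdp_check:.4f}")
```

The ideal case gives roughly 1.8 × 10⁵ counts, in line with the ~200,000 quoted, and halving the modulation factor quadruples the required counts.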


ᵃPolarization measurements are frequently done with such sets of filters in the optical/IR. 

Usually, a scientifically useful polarization measurement entails determination 
of both P and ¢o. This is a joint measurement of two parameters and requires 
additional statistics beyond those suggested by the MDP.° For small polarization 
amplitudes and b/r < 1, an increase in counts by a factor ~2.2 is needed to maintain 
a 99% joint confidence interval for two parameters.”*° The factor decreases as the 
polarization amplitude increases. 

The X-ray polarization levels predicted for astronomical objects are often quite 
low, near 1%, thus instrumental or systematic errors are a serious concern. Accurate 
calibration, including with unpolarized beams, is essential for successful polarization 
measurements.® Also, rotation of the instrument is a powerful tool to understand 
and remove the effects of systematic errors. The fact that polarization is symmetric 
under a 180° rotation can also be used to check for systematic errors, even for 
polarimeters that require rotation to perform the measurement. Since most astro- 
nomical X-ray sources are time varying, the rotation period should either be shorter 
than the typical timescale of variability, or many rotations should be executed during 
each individual observation. 


3. Physical Processes for Polarization Measurement 


The mass attenuation coefficients for interaction of photons with neon are shown as 
a function of energy in Fig. 3. Photoelectric interactions dominate at low energies, 
Compton scattering dominates at intermediate energies, and pair production domi- 
nates at the highest energies. The mass attenuation coefficients are similar for other 
elements, but the transitions shift to higher energies for higher atomic number. The 
mass attenuation coefficient determines which interaction is most effective for 
polarization analysis in each band: photoelectric below a few tens of keV and Compton in 
the hard X-ray/soft gamma-ray band. Bragg reflection (coherent scattering from a 
crystal or multilayer) has been used for X-ray polarimetry in the “standard” X-ray 
band from 2–10 keV and demonstrates promise in the soft X-ray band. 

Fig. 3. Photon interactions: mass attenuation coefficients for photoelectric (solid line), coherent 
scattering (dotted line), Compton scattering (dashed line), and pair production (dash-dot line) 
interactions of photons in neon versus energy.* 

The design of an X-ray polarimeter depends strongly on the physical inter- 
action used to obtain polarization sensitivity. In the following sections, we review 
current work on X-ray polarimeters exploiting different physical processes used for 
polarization analysis. 


4. Photoelectric X-ray Polarimeters 


4.1. Photoelectric Interaction 


In a photoelectric interaction between an X-ray and an atom, an electron (the 
“photoelectron”) is ejected from an inner shell of an atom with a kinetic energy 
equal to the difference between the photon energy and the binding energy. The 
photoelectron direction is determined by the electric field of the photon. For a 
linearly polarized photon, the photoelectron angular distribution is given by 
do _ sin?(0) cos?() 7 
dQ (1 — Bcos(6))4’ am) 
where @ is the photoelectron azimuthal angle relative to the photon electric field 
vector, @ is the photoelectron emission angle relative to the photon momentum 
vector, and (3 is the photoelectron speed as a fraction of the speed of light (see Fig. 4). 
For low energy photons (up to tens of keV), leading to low energy electrons and 
GB <1, the photoelectron is emitted preferentially in the plane perpendicular to the 
photon momentum vector, 9 = 90°. For more energetic photons and photoelectrons, 
the distribution shifts toward the forward direction. 
The photoelectron is preferentially emitted parallel to the photon electric field,
i.e. the distribution peaks at $\phi = 0^\circ$. Thus, it is possible to determine the linear
polarization of the incident photon by measuring the initial direction of the photo-
electron. The photoelectric effect is an ideal polarization analyzer: the probability
of ejecting a photoelectron perpendicular to the electric field vector is zero.

Fig. 4. Angular distribution of the photoelectron emitted by interaction of a linearly polarized
photon with an atom. The photoelectron is emitted preferentially along the photon electric field,
but not necessarily exactly parallel to the electric field. The direction of emission is described by
two angles: $\phi$ is the azimuthal angle relative to the photon electric field vector, $\theta$ is the emission
angle relative to the photon momentum vector.
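As a rough numerical illustration of the angular distribution above, the following Python sketch evaluates the unnormalized distribution and locates the emission angle at which it peaks for a given photoelectron kinetic energy. The function names and the simple grid search are illustrative assumptions, not part of any instrument software; the electron speed is computed from the standard relativistic relation.

```python
import math

M_E_C2_KEV = 510.999  # electron rest energy in keV


def beta_from_kinetic_energy(t_kev):
    """Photoelectron speed as a fraction of c for kinetic energy t_kev (keV)."""
    gamma = 1.0 + t_kev / M_E_C2_KEV
    return math.sqrt(1.0 - 1.0 / gamma**2)


def angular_distribution(theta, phi, beta):
    """Unnormalized photoelectron angular distribution; angles in radians."""
    return (math.sin(theta) ** 2 * math.cos(phi) ** 2) / (1.0 - beta * math.cos(theta)) ** 4


def peak_emission_angle_deg(t_kev, steps=18000):
    """Emission angle theta (degrees) maximizing the distribution at phi = 0 (grid search)."""
    beta = beta_from_kinetic_energy(t_kev)
    best = max(range(1, steps),
               key=lambda i: angular_distribution(math.pi * i / steps, 0.0, beta))
    return 180.0 * best / steps


if __name__ == "__main__":
    for t in (0.1, 1.0, 10.0, 50.0):
        print(f"T = {t:5.1f} keV  beta = {beta_from_kinetic_energy(t):.3f}  "
              f"peak theta = {peak_emission_angle_deg(t):.1f} deg")
```

The peak sits close to 90° when the speed is a small fraction of the speed of light and moves toward the forward direction as the photoelectron energy grows, while the distribution vanishes for emission perpendicular to the electric field at any energy.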


4.2. Photoelectron Track 


Once the photoelectron is emitted, it interacts with the surrounding matter. The 
photoelectron ionizes atoms, producing electron-ion pairs and changing its own 
direction and losing energy. It also scatters off atomic nuclei, changing its direction 
but with no significant energy loss. The photoelectron leaves a trail of electron-ion 
pairs marking its path from initial ejection to final stopping point. This trail is 
referred to as the photoelectron “track” .° 

A photoelectron track is shown in Fig. 5. Since the photoelectron is emitted 
preferentially in the plane perpendicular to the photon momentum vector, it is usu- 
ally sufficient to reconstruct the photoelectron track only in that plane. To extract 
the initial direction of the photoelectron one must: (1) determine which is the start- 
ing end of the track, (2) measure the angle of the track near its start. The energy 
loss rate (per distance traveled) of the photoelectron is inversely proportional to its 
instantaneous energy.? Thus, the energy loss is lowest near the initial part of the 
track and highest at the end. The concentrated energy loss near the end of the track 
is the “Bragg peak”. This asymmetry in energy loss provides a means to identify 
the start versus end of the track. Once the start of the track is identified, one must 
then fit some portion of the track profile to reconstruct the initial photoelectron 
direction. Because the photoelectron scatters as it moves through the gas, the track 
is not straight. Minimizing the track length used for the initial direction fitting
minimizes the effect of scattering. However, a sufficient track length must be used
to obtain an accurate measurement of the initial direction, since the track has a
non-zero width due to electron diffusion and detector resolution and also since the
statistical accuracy improves with the number of secondary electrons used. Thus,
the track reconstruction algorithm must balance these factors. !°

Fig. 5. Photoelectron track. The image on the right shows a relatively straight track with the
interaction point, endpoint, and initial photoelectron direction marked. The image on the left
shows a track where the photoelectron has suffered substantial scattering, from Ref. 7.
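The two reconstruction steps described above (identify the starting end, then estimate the direction) can be sketched with a simple moment analysis of the track image. This is a minimal sketch under stated assumptions, not the algorithm of any particular instrument: it uses the whole track rather than only the portion near the start, it takes the principal axis of the charge distribution as the direction estimate, and it uses the sign of the third moment along that axis to decide which end carries the Bragg peak. The function name and input layout are hypothetical.

```python
import math


def initial_direction(xs, ys, qs):
    """Estimate the initial photoelectron direction (radians) from a track image.

    xs, ys: pixel coordinates; qs: charge collected in each pixel.
    Returns the angle of the unit vector pointing from the start of the
    track toward its end (the Bragg peak).
    """
    total = sum(qs)
    xb = sum(q * x for x, q in zip(xs, qs)) / total   # charge barycenter
    yb = sum(q * y for y, q in zip(ys, qs)) / total
    mxx = sum(q * (x - xb) ** 2 for x, q in zip(xs, qs)) / total
    myy = sum(q * (y - yb) ** 2 for y, q in zip(ys, qs)) / total
    mxy = sum(q * (x - xb) * (y - yb) for x, y, q in zip(xs, ys, qs)) / total
    # Principal axis: orientation that maximizes the second moment.
    axis = 0.5 * math.atan2(2.0 * mxy, mxx - myy)
    ux, uy = math.cos(axis), math.sin(axis)
    # Third moment along the axis. The faint, extended part of the track is
    # the start and the dense Bragg peak is the end, so the skew points
    # from the end toward the start.
    m3 = sum(q * ((x - xb) * ux + (y - yb) * uy) ** 3
             for x, y, q in zip(xs, ys, qs)) / total
    if m3 > 0:
        ux, uy = -ux, -uy
    return math.atan2(uy, ux)


if __name__ == "__main__":
    # Synthetic straight track along +x with charge piling up at the end.
    xs = list(range(11))
    ys = [0.0] * 11
    qs = [1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 8]
    print(math.degrees(initial_direction(xs, ys, qs)))
```

A practical algorithm would then refit only the pixels near the identified start, trading scattering against statistics as discussed above.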

Another complication in track fitting arises from “Auger electrons”. The ejec- 
tion of a photoelectron leaves the atom with an unfilled orbital, often in a core 
shell. The orbital is refilled by an outer shell electron, accompanied by emission
of a photon or an electron, as required for energy conservation. Emission of a fluo-
rescence photon usually does not affect the photoelectron track, since the photon
absorption length is long compared to the track length. However, emission of an elec- 
tron complicates the photoelectron track, leading to a reduction in the modulation 
factor. Electrons are emitted via the Auger process in which one outer shell elec- 
tron fills the core orbital, while a second outer shell electron is emitted, leaving the 
atom doubly ionized. The Auger electron energy is equal to the difference between 
the binding energy of the core orbital and the sum of energies of the two outer 
orbitals. The probability for Auger emission is high for elements with low atomic 
number. However, use of low atomic number elements also lowers the Auger electron 
energy. 

Photoelectric polarimetry can be performed in any detection medium. However, 
good modulation factors have been achieved for photoelectric polarimeters only 
using gas detectors. The reason is the electron track length. In silicon, the range of
a 1 keV electron is 0.03 µm, while that of a 10 keV electron is 1 µm.'! Resolving the
photoelectron track requires pixels that are a small fraction of the electron track length,
while solid-state X-ray detectors to date have minimum pixel sizes on the order of
10 µm. The modulation factors reported for solid-state photoelectric polarimeters
are all below 10%.'8 Increasing the modulation factor would require a decrease in
pixel size. In contrast, the electron range in neon at 1 atm and 0°C is 0.08 mm at
1 keV and 3.0 mm at 10 keV, see Fig. 6.14 While position resolution on the order of
100 µm is feasible in gas detectors, it is quite challenging.

There are two key issues in photoelectric X-ray polarimetry with gas detectors:
the ratio of photon absorption length to electron track length and the diffusion of 
the charge carriers in the gas. Figure 7 shows a conceptual view of a gas-filled pho- 
toelectric X-ray polarimeter. X-rays enter at the top of the figure. To be detected, 
an X-ray must interact at some point within the gas volume and produce a photo- 
electron. The gas volume must be sufficiently deep so that a significant fraction of 
the X-rays undergo photoelectric interactions. If the gas layer is too thin, then the 
detector will have poor quantum efficiency. The required depth is set by the X-ray 
attenuation length, the distance at which 1/e of the original X-rays remain. In
neon at STP, the attenuation length is 1.4 mm at 1 keV and 972 mm at 10 keV.
These lengths are much longer than the corresponding electron track lengths, see 
Fig. 6. 
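To make the link between gas depth and quantum efficiency concrete, the short Python sketch below applies simple exponential attenuation to the two neon attenuation lengths quoted above. It is an idealized estimate that ignores window transmission and any dead regions; the function names and the 10 mm example depth are illustrative assumptions.

```python
import math

# Attenuation lengths in neon at STP quoted in the text, keyed by energy in keV (values in mm).
ATTENUATION_LENGTH_MM = {1.0: 1.4, 10.0: 972.0}


def absorbed_fraction(depth_mm, attenuation_length_mm):
    """Fraction of X-rays interacting within a gas layer of the given depth."""
    return 1.0 - math.exp(-depth_mm / attenuation_length_mm)


def depth_for_fraction(fraction, attenuation_length_mm):
    """Gas depth (mm) needed to absorb the requested fraction of X-rays."""
    return -attenuation_length_mm * math.log(1.0 - fraction)


if __name__ == "__main__":
    depth = 10.0  # mm, illustrative
    for energy, length in ATTENUATION_LENGTH_MM.items():
        print(f"{energy:4.1f} keV: absorbed fraction in {depth} mm = "
              f"{absorbed_fraction(depth, length):.4f}")
```

With these inputs a 10 mm layer absorbs essentially all 1 keV photons but only about 1% of 10 keV photons, which is why the choice of gas, pressure, and depth dominates the efficiency at higher energies.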
Fig. 6. Electron range (dashed line) and X-ray absorption length (solid line) in neon at 1 atm
and 0°C.

Fig. 7. Photoelectric polarimeter with the “Costa” or “parallel-drift” geometry, from Ref. 8. See
electronic edition for a color version of this figure.


The primary photoelectron produces a track of electron-ion pairs. The electrons 
in the track must be brought to readout electrodes at the edge of the detector. The 
electrons can be drifted through the gas by application of a uniform electric field. 
The drift field can be applied either parallel to the direction of the incident photon, 
the “Costa geometry” (Fig. 7), or perpendicular, the “Black geometry” (Fig. 8). 
As the secondary electrons drift, they scatter on the gas atoms. Thus, localized 
concentrations of secondary electrons diffuse as they drift. Diffusion degrades the 
track image, reducing the accuracy with which the initial track direction can be 
measured, and reducing the modulation factor. 
Fig. 8. Photoelectric polarimeter with the “Black” or “perpendicular-drift” geometry, from
Ref. 7.


4.3. Costa Geometry Photoelectric Polarimeters 


In the Costa geometry (Fig. 7), the drift field is applied along the direction of the 
incident photon.® The photoelectron track is drifted onto a gas electron multiplier 
(GEM) where it is amplified and then imaged with a two-dimensional array of sen- 
sors. The realization of instruments using the Costa geometry was made possible by 
the development of the Gas Pixel Detector (GPD) by Bellazzini employing custom 
CMOS readout electronics fabricated in deep sub-micron VLSI technology.'?: !3 The 
latest devices have ~100,000 pixels with 50 µm pitch covering a 15 mm² area.!° The
modulation factor for a detector using this readout device with a 1 cm deep absorp-
tion region with 1 atm of 20% He/80% dimethyl ether (DME, chemical formula
CH3OCH3) has been measured to be 21% at 2.6 keV, rising to 47% at 5.2 keV.1°
We note that these modulation factors are quoted with no rejection of events. Removal of events
that are close to circularly symmetric increases the modulation factor at a cost in
efficiency. Allowing the efficiency to decrease to 78% increases the modulation factor to 28% and
54%, respectively.
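Whether such an event cut is worthwhile can be judged with a simple figure of merit. The sketch below assumes the usual source-dominated scaling in which the minimum detectable polarization varies as 1/(µ√N), so that the product of the modulation factor and the square root of the retained efficiency measures sensitivity; that scaling is an assumption stated here, and the function name is illustrative.

```python
import math


def quality_factor(modulation, efficiency=1.0):
    """Figure of merit mu * sqrt(efficiency); larger means better sensitivity."""
    return modulation * math.sqrt(efficiency)


if __name__ == "__main__":
    # (energy in keV): (modulation with no cut, modulation with cut at 78% efficiency)
    cases = {2.6: (0.21, 0.28), 5.2: (0.47, 0.54)}
    for energy, (mu_all, mu_cut) in cases.items():
        before = quality_factor(mu_all)
        after = quality_factor(mu_cut, 0.78)
        print(f"{energy} keV: no cut {before:.3f}, with cut {after:.3f}, "
              f"ratio {after / before:.3f}")
```

Under this assumption the cut helps noticeably at 2.6 keV and is close to neutral at 5.2 keV, since the gain in modulation factor is nearly offset by the loss of events.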

A key advantage of Costa geometry detectors is that they are symmetric 
under rotation (through multiples of 60° for hexagonal pixels) around the incident 
photon direction. Measurements using unpolarized X-rays show very low residual 
modulation, 0.18% ± 0.14%.'8 It has been suggested that they can produce accu-
rate polarization measurements without use of rotation. Another advantage of the
Costa geometry is that it provides for true two-dimensional imaging, in addition 
to polarimetry. Imaging can be used to lower the instrumental X-ray background 
for point-like sources and to provide spatially-resolved polarimetry for extended 
sources. 

A disadvantage of the Costa geometry is that the maximum electron drift dis- 
tance is the same as the maximum X-ray absorption depth. Since both diffusion 
and quantum efficiency increase with drift/absorption distance, the Costa geometry 
requires a trade-off between minimizing diffusion, thus increasing modulation factor, 
and maximizing quantum efficiency. The product of quantum efficiency multiplied
by modulation factor tends to peak in a relatively narrow band for any specific
polarimeter design.

Missions based on Costa geometry polarimeters have been proposed several 
times. Recent examples include XIPE (X-ray Imaging Polarimetry Explorer),!” 
which was proposed to the European Space Agency, and IXPE (Imaging X-ray 
Polarimetry Explorer),4’ which was accepted by NASA for launch in 2021. 


4.4. Black Geometry Photoelectric Polarimeters 


In the Black geometry, the drift field is applied perpendicular to the incident photon 
direction.’ The photoelectron track is drifted onto a gas electron multiplier where 
it is amplified and then imaged with a one-dimensional array of sensors. The second 
dimension of imaging information is obtained from the time development of the 
signal on each sensor, thus the detector is a “time projection chamber” (TPC). 
This necessitates the use of gases with relatively slow electron drift speeds. DME 
has the slowest known electron drift speed. 

Since the electrons drift perpendicular to the incident photons, the absorp- 
tion depth is decoupled from the electron drift and large absorption depths can be 
used. The absorption depth for the detectors built for the Gravity and Extreme
Magnetism Small Explorer (GEMS) mission is 31.2 cm.'® The GEMS detectors were
filled with 190 Torr of DME. The readout strips had a pitch of 121 µm and 120 active
strips were sampled at a rate of 20 MHz. The electric field in the gas volume was
adjusted to produce a pixel size of 121 µm on the time axis. The modulation factor
in these detectors was measured to be 29% at 2.7 keV, rising to 43% at 4.5 keV.1®

Use of the Black geometry comes with two costs. First, the Black geometry uses 
different techniques to image the two dimensions of the photoelectron track, time 
versus space. As noted above, systematic measurement errors are a serious concern 
in polarimetry and the Black geometry has an intrinsic asymmetry between the two 
dimensions. This requires either careful design and operationᵇ of the polarimeter to
minimize the asymmetry, rotation of the polarimeter to zero out any net asymmetry,
or both. Measurements using unpolarized X-rays on the GEMS polarimeters showed
a residual modulation of 0.21% ± 0.28%.'®

Second, while Costa geometry detectors can image the sky in two dimensions, 
only one-dimensional imaging of the sky is possible in the Black geometry. The 
track image along the time coordinate provides only relative positions of electrons 
in the track because the overall drift time is unknown.ᶜ The imaging quality of the
Black geometry is further degraded if a deep absorption volume is used since the 
X-rays will be in focus only at one depth and out of focus at all other depths. 


ᵇ Specifically, careful monitoring and control of the electron drift speed.
ᶜ If additional instrumentation were added to precisely record the X-ray arrival time, via detection
of scintillation photons produced in the initial interaction, then two-dimensional imaging of the
sky would be possible. However, no feasible implementation has been demonstrated.
While the discussion of photoelectric polarimeters to this point has assumed
drift of free electrons, in some gases charge transport occurs via negatively charged
ions. Negative ions offer reduced diffusion and drift speeds compared to electrons. !% 7°
This allows larger drift regions and slower electronics (when used in the Black or
TPC geometry).?! An X-ray polarimeter has been operated using low concentrations
of nitromethane (CH3NO2) as the electron capture agent with CO2 providing the
balance of the gas.?? The readout used 120 µm strips sampled at an effective rate
of 167 kHz to produce square pixels with a measured drift velocity of 20 m/s. The
modulation factor was measured to be 38% between 3.5 and 6.4 keV.7%
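In a time projection chamber the pixel size along the time axis is the drift speed divided by the sampling rate, so square pixels require the drift speed to equal the strip pitch multiplied by the sampling rate. The following Python sketch applies that relation to the numbers quoted above for the GEMS detectors and for the negative-ion polarimeter; the function names are illustrative.

```python
def drift_speed_for_square_pixels(pitch_m, sample_rate_hz):
    """Drift speed (m/s) at which one time sample spans one strip pitch."""
    return pitch_m * sample_rate_hz


def sample_rate_for_square_pixels(pitch_m, drift_speed_m_s):
    """Sampling rate (Hz) at which one time sample spans one strip pitch."""
    return drift_speed_m_s / pitch_m


if __name__ == "__main__":
    # GEMS: 121 micron strips sampled at 20 MHz.
    v_gems = drift_speed_for_square_pixels(121e-6, 20e6)
    print(f"GEMS electron drift speed: {v_gems:.0f} m/s")

    # Negative-ion TPC: 120 micron strips, 20 m/s drift velocity.
    f_neg = sample_rate_for_square_pixels(120e-6, 20.0)
    print(f"Negative-ion sampling rate: {f_neg / 1e3:.0f} kHz")
```

The GEMS numbers imply an electron drift speed of roughly 2.4 km/s, while the negative-ion case reproduces the quoted effective rate of about 167 kHz, about two orders of magnitude slower.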


5. Compton/Thomson Scattering Polarimeters 


5.1. Scattering 


At energies above a few tens of keV, Compton scattering is the dominant interaction 
of X-rays with matter. When the X-ray energy is an appreciable fraction of the rest 
mass energy of an electron, the electron will recoil during the interaction, taking 
energy from the photon. The cross-section is 


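$$\frac{d\sigma}{d\Omega} = \frac{r_e^2}{2}\left(\frac{E'}{E}\right)^2\left(\frac{E'}{E} + \frac{E}{E'} - 2\sin^2\theta\,\cos^2\phi\right), \tag{11}$$
where $r_e$ is the classical electron radius, $E$ is the initial photon energy, $E'$ is the final
photon energy, $\phi$ is the azimuthal scattering angle measured from the electric field vector
of the incident photon, and we have averaged over the polarization of the final photon.*4
The photon energies are related to the scattering angle, $\theta$, as

$$E' = E\left[1 + \frac{E}{m_e c^2}\,(1 - \cos\theta)\right]^{-1}. \tag{12}$$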

For scattering angles near 90°, the azimuthal distribution of the scattered pho- 
ton is strongly dependent on the X-ray polarization, thus Compton scattering is 
effective for polarization analysis. At low X-ray energies, the electron recoil becomes 
negligible. In this limit, known as Thomson scattering, modulation reaches 100% 
for 90° scattering. 
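The energy dependence of the analyzing power follows directly from the cross-section above. Taking the maximum and minimum of the cross-section over azimuth gives a modulation of sin²θ / (E′/E + E/E′ − sin²θ), which the Python sketch below evaluates for an ideal scatterer. This is a sketch under the assumption of a point-like target and a perfectly defined scattering angle; the function names are illustrative.

```python
import math

M_E_C2_KEV = 510.999  # electron rest energy in keV


def scattered_energy_kev(e_kev, theta):
    """Energy of the scattered photon for scattering angle theta (radians)."""
    return e_kev / (1.0 + (e_kev / M_E_C2_KEV) * (1.0 - math.cos(theta)))


def compton_modulation(e_kev, theta):
    """Ideal azimuthal modulation (max - min) / (max + min) of the cross-section."""
    ratio = scattered_energy_kev(e_kev, theta) / e_kev
    s2 = math.sin(theta) ** 2
    return s2 / (ratio + 1.0 / ratio - s2)


if __name__ == "__main__":
    for e in (1.0, 30.0, 100.0, 300.0, 511.0):
        print(f"E = {e:6.1f} keV: modulation at 90 deg = "
              f"{compton_modulation(e, math.pi / 2):.3f}")
```

In the low energy limit the modulation at 90° approaches 100%, and it falls to about two thirds when the photon energy equals the electron rest energy.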


5.2. Measurement Technique 


The basic principle of all Compton/Thomson polarimeters is shown in Fig. 9. An 
X-ray scatters on a target. The scattered X-ray is then detected. At low X-ray
energies, in the Thomson limit, only the scattered photon is detected. The tar-
get/detector geometry is typically arranged to maximize scatterings through polar
angles of 90° and the detector records the azimuthal distribution of scattered pho-
tons. The target is usually chosen to be a low atomic number material to maximize
the ratio of the Thomson to photoelectric cross sections.

If the X-ray is sufficiently energetic, in the Compton regime, it produces a
recoil electron. Thus it is possible to detect both the initial interaction point and
the scattered photon. Compton polarimeters do not require a distinction between
target and detector. Hence, Compton polarimetry is possible in uniform detector
arrays. However, the polarization sensitivity can be improved with the use of low
Z targets, since such targets can increase the path length traveled by the scattered
photons and also increase the fraction of photons that are Compton scattered rather
than photoelectrically absorbed. In such polarimeters, the target is referred to as
an “active target” if the recoil electron can be detected and the detector recording the
scattered photon is sometimes called a “calorimeter” (since it absorbs the majority
of the photon energy).

Fig. 9. Compton/Thomson polarimeter.


5.3. Instruments 


The first dedicated extra-solar X-ray polarimeter was a Thomson scattering 
polarimeter flown on a sounding rocket.” A similar instrument flown later together 
with a Bragg reflection polarimeter (discussed further below) provided the first 
successful measurement of the X-ray polarization of an extra-solar object.7° 
Recently, the Gamma-Ray Burst Polarimeter (GAP) flew aboard the Japanese 
IKAROS mission. GAP was designed to measure the polarization of gamma-ray 
bursts in the 50-300 keV band. It consists of a single plastic scintillator target
(a low Z material) with a diameter of 140 mm surrounded by a cylinder of 12 CsI
scintillators.2” The modulation factor was measured to be 52% using an 80 keV
pencil beam with 0.8 mm diameter illuminating the center of the target. Monte
Carlo simulations suggest that the modulation factor for astrophysical sources that
illuminate the whole target is lower, near 30% on axis and decreasing off axis.
Uniform response in the CsI scintillators is essential to accurate polarimetry; in-
flight calibrations established uniformity at the 2% level. GAP detected polarization
from three gamma-ray bursts, reporting high average polarizations, 27 ± 11% to
$84^{+16}_{-28}$%, at significances ranging from 2.9σ to 3.7σ and the detection of variable
position angle (at 3.5σ confidence) in one GRB.?° The systematic uncertainty is
dominated by the off-axis response and was estimated to be near 2% (1σ).?9
There are currently several Compton/Thomson polarimeters in various stages
of development.°° Several of them use low Z active targets surrounded by high
Z calorimeters, specifically the Gamma-RAy Polarimetry Experiment (GRAPE)
and the Polarimetry of High ENErgy X-rays (PHENEX) experiment. PHENEX is
a collimated instrument designed to observe known astrophysical sources. GRAPE’s
primary science goal is GRBs, but the initial balloon flights will use a collimator
and point at bright X-ray sources.

A key issue in these Compton polarimeters is the relatively high background 
counting rate, which limits the polarization sensitivity. The lightweight Polarized 
Gamma-ray Observer (PoGOLite) uses plastic scintillators in the detector and has 
a large active shield to reduce background. An even greater reduction in background 
can be achieved using hard X-ray focusing optics, as demonstrated by the recent 
success of the NuSTAR mission. Focusing optics allow use of targets and detectors 
with greatly reduced volume and a corresponding reduction in background, which 
can be reduced further via an active target.?4 X-Calibur uses a low Z target sur- 
rounded by a CZT detector assembly placed at the focus of a grazing-incidence hard 
X-ray telescope to do polarimetry in the 15-80 keV band.*? The detector and shield 
will rotate around the telescope axis at 10 rpm to minimize systematic effects.

The wide fields of view needed to catch GRBs preclude use of focusing optics, 
so hard X-ray GRB polarimeters will necessarily use large detector arrays. Progress 
will likely require a dedicated, although potentially small, mission to achieve the 
total detector volume and mission duration needed to perform polarimetry on a 
significant sample of GRBs. 


5.4. Measurements with Non-polarimeters 


Recently, there have been several polarization measurements using the Compton 
technique with instruments not designed for polarimetry. The International Gamma- 
Ray Astrophysics Laboratory (INTEGRAL) observatory carries the Spectrometer 
on INTEGRAL (SPI) instrument, which was designed to provide high resolution 
spectroscopy in the 18 keV to 8 MeV band. SPI consists of 19 hexagonal Germanium 
solid-state detectors, surrounded by an anti-coincidence shield, that view the sky 
through a coded-aperture mask. To do polarimetry, one selects events in which a 
gamma-ray deposits energy in two detectors (within a 350 ns coincidence window)
and then searches for an azimuthal asymmetry in those detector pairs. However, 
other factors, such as the coded aperture shadow pattern and dead detectors within 
SPI, also affect the pattern of detector pair hits and can produce spurious polariza- 
tion signatures. A Monte Carlo simulation of the instrument can be used to model 
all of these effects. Simulations performed with various polarization amplitudes and 
position angles (varied in addition to the non-polarimetric source parameters such as 
position on the sky and spectral shape) can then be compared with the observational 
data obtained on a source and used to estimate the source polarization.*? A sample
of 5 x 10° double events from the Crab nebula was analyzed via this technique using
7 x 10° simulated events. The result was a significant detection of polarization in
the 0.1-1 MeV band at a level of 46 ± 10% at a position angle of 123° ± 11°.%4

The measurement has been confirmed using the Imager on Board the 
INTEGRAL Satellite (IBIS) instrument. IBIS has two planes of detectors. Events 
that trigger one detector in each plane are identified as “Compton events”, but 
only 2% arise from a true Compton scattering. IBIS measured a polarization in the 
200-800 keV band with a position angle consistent with SPI, but a somewhat higher 
amplitude.®° 

A number of other measurements have been reported using SPI, IBIS, and the 
Ramaty High-Energy Solar Spectroscopic Imager (RHESSI), primarily of gamma- 
ray bursts.°° However, these are of lower significance, for both statistical and instru- 
mental reasons. Several instruments likely to fly in the next several years, notably 
the soft gamma detector (SGD) on the Japanese Astro-H mission, will be able to 
exploit the polarization sensitivity of Compton scattering. However, instruments not
specifically designed and operated for polarimetry tend to suffer from instrumental
effects that limit their ultimate sensitivity, typically to minimum detectable polar- 
izations (MDPs) on the order of tens of percent. 


6. Bragg Reflection Polarimeters 


At energies below a few tens of keV, X-rays interact more strongly via the photo-
electric process than via scattering. However, superposition of coherent scatterings
off a periodic medium, such as an atomic crystal or multilayer, can produce efficient
reflection. This process is known as Bragg reflectionᵈ and occurs when the difference
in path length for scattering from two adjacent crystal planes, $2d\sin\theta$, where $d$ is the
crystal plane spacing and $\theta$ is the angle between the incident ray and the scattering
planes, is an integer multiple, $n$, of the photon wavelength, $\lambda$ (see Fig. 10). This
condition is known as Bragg’s law, $n\lambda = 2d\sin\theta$ or $nhc/E = 2d\sin\theta$, where $E$ is
the photon energy.
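Bragg's law fixes the energies that a given crystal reflects at a given angle. The Python sketch below evaluates it for graphite at 45° incidence; the plane spacing of 3.354 Å for the graphite (002) planes is an assumed textbook value that is not quoted in this chapter, and the function names are illustrative.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # h*c in keV * Angstrom
GRAPHITE_002_D_ANGSTROM = 3.354  # assumed plane spacing, not taken from the text


def bragg_energy_kev(d_angstrom, theta_deg, order=1):
    """Photon energy satisfying n*h*c/E = 2*d*sin(theta)."""
    return order * HC_KEV_ANGSTROM / (2.0 * d_angstrom * math.sin(math.radians(theta_deg)))


def bragg_angle_deg(d_angstrom, energy_kev, order=1):
    """Angle between the incident ray and the planes that satisfies Bragg's law."""
    s = order * HC_KEV_ANGSTROM / (2.0 * d_angstrom * energy_kev)
    if s > 1.0:
        raise ValueError("no Bragg reflection: wavelength exceeds 2d/n")
    return math.degrees(math.asin(s))


if __name__ == "__main__":
    for n in (1, 2):
        e = bragg_energy_kev(GRAPHITE_002_D_ANGSTROM, 45.0, order=n)
        print(f"order {n}: Bragg energy at 45 deg = {e:.2f} keV")
```

With this spacing the first and second order reflections at 45° fall near 2.6 and 5.2 keV.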


Fig. 10. Bragg reflection. The two outgoing waves are in phase if the difference in path length
for scattering from two adjacent crystal planes, $2d\sin\theta$, where $d$ is the crystal plane spacing and
$\theta$ is the angle of incidence, is an integer multiple of the photon wavelength, $\lambda$.


ᵈ The terms Bragg scattering and Bragg diffraction are also used.

Fig. 11. Polarization of 2.6 keV X-rays Bragg reflected off a graphite crystal as a function of
incidence angle.


Bragg reflection can be used for polarization analysis because the reflectivity 
for radiation polarized parallel to the incidence plane is close to zero for incidence 
angles close to the Brewster angle, which is (very) near 45° for X-rays. The degree 
of polarization versus incidence angle for 2.6keV X-rays reflected off a graphite 
crystal is shown in Fig. 11.9°°7 The modulation factors for Bragg polarimeters are 
typically very high and can exceed 99%. Bragg reflection polarimeters must either 
rotate, to produce a modulation curve as shown in Figs. 1 and 2, or at least three 
crystals must be used with different position angles (preferably at increments of 
45°) to instantaneously measure the Stokes parameters. 
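The three-crystal option can be illustrated with a small inversion. The Python sketch below assumes that a reflector at position angle ψ records a rate R(ψ) = ½[I + µ(Q cos 2ψ + U sin 2ψ)], with crystals at 0°, 45°, and 90° and a common modulation factor µ. The sign and angle conventions, the function names, and the numerical inputs are illustrative assumptions; a real instrument must also account for differing crystal efficiencies.

```python
import math


def rates_from_stokes(i, q, u, mu, angles_deg=(0.0, 45.0, 90.0)):
    """Rates seen by ideal analyzers at the given position angles."""
    return [0.5 * (i + mu * (q * math.cos(2.0 * math.radians(a))
                             + u * math.sin(2.0 * math.radians(a))))
            for a in angles_deg]


def stokes_from_rates(r0, r45, r90, mu):
    """Invert three analyzer rates (0, 45, 90 degrees) to Stokes I, Q, U."""
    i = r0 + r90
    q = (r0 - r90) / mu
    u = (2.0 * r45 - i) / mu
    return i, q, u


def polarization(i, q, u):
    """Return (polarization fraction, position angle in degrees)."""
    return math.hypot(q, u) / i, 0.5 * math.degrees(math.atan2(u, q))


if __name__ == "__main__":
    mu = 0.93  # illustrative modulation factor
    r0, r45, r90 = rates_from_stokes(100.0, 10.0, -5.0, mu)
    i, q, u = stokes_from_rates(r0, r45, r90, mu)
    print(i, q, u, polarization(i, q, u))
```

Three angles are the minimum because three unknowns (I, Q, U) must be determined; 45° spacing makes the Q and U estimates decouple as shown.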

Efficient reflection can be obtained for X-rays exactly satisfying the Bragg con- 
dition, but the efficiency drops off rapidly as the photon wavelength or incidence 
angle changes. The “integrated reflectivity” is the integral of the reflectivity, at fixed
energy, over all angles, $\Delta\Theta = \int R(E,\theta)\,d\theta$.*8 The effective width is the integral of
reflectivity over all energies at fixed angle, $\Delta E(\theta) = \int R(E,\theta)\,dE$. The two are
related as $\Delta E(\theta_B) = E_B \cot(\theta_B)\,\Delta\Theta$, where $\theta_B$ is the Bragg angle, usually 45° for
X-ray polarimeters, and $E_B$ is the corresponding Bragg energy. The effective width
indicates the efficiency of a Bragg polarimeter for an astrophysical source with a 
broad spectrum. 
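The relation between the effective width and the integrated reflectivity follows from differentiating Bragg's law. The short derivation below is a sketch that assumes the reflectivity near the peak depends only on the departure from the Bragg condition, so that a small offset in angle at fixed energy is equivalent to a small offset in energy at fixed angle; $\delta E$ and $\delta\theta$ denote those small offsets.

```latex
E = \frac{nhc}{2d\sin\theta}
\quad\Longrightarrow\quad
\frac{dE}{d\theta} = -\frac{nhc}{2d}\,\frac{\cos\theta}{\sin^{2}\theta} = -E\cot\theta ,
\qquad\text{so}\qquad
|\delta E| \simeq E_B \cot(\theta_B)\,|\delta\theta| .
```

Integrating the reflectivity over either variable then gives the stated relation between the effective width and the integrated reflectivity. At a Bragg angle of 45° the cotangent is unity, so the effective width is simply the Bragg energy multiplied by the integrated reflectivity expressed in radians.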


6.1. Bragg Polarimeters with Atomic Crystals 


The effective widths of perfect atomic crystals are typically a small fraction of an 
eV. Many different crystals! are used for Bragg reflection in laboratory and syn- 
chrotron beam experiments, but their effective widths are too small for astronomical 
applications. The best effective widths come from ideally imperfect crystals that are 
a mosaic of small crystal domains with random orientations. The crystal domains 
are thin compared with the X-ray absorption length, so an X-ray may pass through 
multiple domains until it finds one oriented to satisfy the Bragg condition. 
The X-ray polarimeter on the OSO-8 satellite used graphite crystals with a
mosaic spread of 0.8° and an effective width of 3 eV.°° Mosaic graphite provides the
best effective width in the standard X-ray band (2-10 keV) of any natural crystal.
The OSO-8 polarimeter used a parabolic reflector geometry to focus X-rays onto
a small detector in order to minimize the background counting rate.! The range
of Bragg angles and the azimuthal extent of each reflector reduced the modulation
factor to 0.93. The OSO-8 instrument contained two orthogonal polarimeters and
rotated at a rate of 6 rpm. Its builders obtained the most precise measurement of
X-ray polarization of an astrophysical source to date, showing that the polarization
of the Crab nebula at 2.6 keV is 19.2% ± 1.0% at a position angle of 156.4° ± 1.3°.°9

The Stellar X-Ray Polarimeter (SXRP), built for the Soviet Spectrum 
Roentgen-Gamma mission but never flown, included a Bragg reflection polarimeter 
using a mosaic graphite crystal in the beam of an X-ray telescope. The modula-
tion factor was measured to be 99.75% ± 0.11%.*° The Astrophysical Polarimetric
Explorer (APEX) has been proposed to use parabolic graphite crystal arrays pro- 
viding a factor of 30 increase in collecting area relative to the OSO-8 polarimeter. 
The design has the advantage of a high modulation factor (92.5%) and the resulting 
(relative) insensitivity to instrumental effects, but provides measurements only in 
two narrow bands around 2.6 and 5.2 keV.** 


6.2. Bragg Polarimeters with Multilayers 


It is possible to deposit layers of atoms or molecules with thicknesses on the order 
of nanometers using sputtering or evaporation.*! By depositing alternating layers of
high and low atomic number materials (a single high/low Z pair is a “bi-layer”), one
can manufacture a multilayered structure, or “multilayer”, that Bragg reflects. The
Bragg energy is set by the bi-layer thickness and multilayer reflectors are usually
best suited for the soft X-ray (below 1 keV) and extreme ultraviolet (EUV) bands.
The reflection efficiency is set by the choice of materials, the number of bi-layers
(typically tens to hundreds of layers are needed), and the roughness of both the
deposition substrate and of the interface between adjacent layers. Peak reflectivities
above 70% near normal incidence have been measured for energies near 100 eV, drop-
ping to ~10% near 500 eV.*! An extensive database of measured X-ray reflectances
for various multilayers is maintained by the Center for X-ray Optics at Lawrence
Berkeley National Laboratory.ᵉ The reflectance of multilayers can also be accurately
calculated.*?

Multilayer Bragg polarimeters use the same geometries discussed above for 
crystal polarimeters. The Polarimeter for Low Energy X-ray Astrophysical Sources 
(PLEXAS) concept used a parabolic geometry similar to that of the OSO-8 
polarimeter, but with a Bragg energy near 250 eV.** The Bragg Reflection Polarime-
ter (BRP), which was designed as part of the GEMS mission, used a flat multilayer
optic in the beam of one of the GEMS telescopes to provide polarization sensitivity
in a narrow band around 500 eV.*4


ᵉ http://henke.lbl.gov/multilayer/survey.html

Multilayers offer more flexibility than atomic crystals. In particular, “graded” 
multilayers have a varying bi-layer thickness so that the Bragg energy varies across 
the multilayer surface. Use of a graded multilayer in a parabolic reflector can com- 
pensate for the varying angle of incidence to produce a narrow energy response. 
This offers improved background rejection since events outside the energy band can 
be rejected. 

A broadband soft X-ray polarimeter can be constructed by combining an 
energy-dispersive grating with a graded multilayer polarization analyzer.” Grat- 
ings, as described in Part 3 of this volume, diffract X-rays of different energies 
through different angles. A Bragg reflector is highly efficient only at the Bragg 
energy corresponding to the layer spacing. By using a graded multilayer, the Bragg 
energy can be tuned to vary with position to exactly match the energy versus posi-
tion dispersion of a grating, achieving high efficiency across a broad energy range.
If the Bragg reflector is placed at an angle close to 45°, then it will be a sensi- 
tive polarization analyzer. To obtain a polarization measurement, either the full 
instrument must rotate to produce a modulation curve or at least three different 
Bragg reflectors must be used with different position angles. Calculations based 
on realistic geometries and measured multilayer reflectivities show that modulation 
factors above 50% and significant effective area can be achieved across a relatively 
broad energy band, 200-800 eV.*°


7. Outlook 


Development of new detector and optics technologies has enabled construction of 
a new generation of astrophysical X-ray polarimeters. The most exciting advance 
is the development of high-spatial-resolution gas-filled X-ray detectors and their 
demonstration as polarimeters exploiting the photoelectric effect. This technology 
offers a tremendous increase in efficiency relative to previous devices and should 
enable polarimetry of a broad range of astrophysical sources. Broadband soft X-ray 
polarimeters based on Bragg reflection are now possible due to advances in the 
fabrication of multilayer optics via deposition of nanometer-thick layers of atoms.
Developments in scintillator hard X-ray detectors have enabled construction of mod-
ular, large area Compton scattering instruments suitable for the polarimetry of
transient sources requiring large fields of view, while development of pixelated solid- 
state detectors allows construction of compact hard X-ray polarimeters suitable for 
use with focusing X-ray telescopes. 
Chapter 1


Laue Lenses in Hard X-ray Astronomy 


Enrico Virgilli 


INAF/OAS, Via Piero Gobetti, 98/3, 40129 Bologna BO, Italy 


enrico.virgilli@inaf.it


The hard X-ray/soft γ-ray domain represents a very important window to study
the Universe. This energy band allows us to observe the most violent objects and 
matter under extreme physical conditions impossible to reproduce in the labora- 
tory. Nevertheless, observing nuclear gamma-ray lines and the continuum emission 
from celestial sources remains very challenging if done with the instrumentation 
currently in orbit. After the launch of different missions like INTEGRAL or Swift, 
to keep improving the sensitivity of the future gamma-ray astronomy telescopes, it 
seemed mandatory to change the operational principles of the telescopes operating 
in the X-/gamma-ray energy band. Current direct view telescopes are penalized 
by their modest sensitivities, which can be improved only by enormously increasing
their collection area. Laue lenses, based on diffraction from crystals in a trans-
mission configuration, appear to be a viable method to focus the radiation from
a few tens of keV up to ~1 MeV. Laue lenses are particularly suitable for on-axis 
sources, with the possibility to increase their fields of view with particular config- 
urations that keep their angular resolution unchanged (~10-20 arcsec) for sources 
a few arcminutes off-axis. In this Chapter, we will introduce the basic principles 
of Laue lenses, their main properties and the feasibility studies of such focusing 
instruments. We will describe the experiments realized and the developments 
obtained in the recent years. We will discuss the prospects of this methodology 
that is rapidly increasing its technology readiness level, thanks to the new progress 
made in the fields of crystallography, material science and aerospace engineering. 


1. Introduction 


X-ray telescopes can be divided into two categories according to whether or not they
can focus the radiation. Direct-view telescopes are mainly built as position-sensitive
detectors placed behind mechanical collimators. One of the most common methods
of imaging the hard X-ray/soft γ-ray sky is that of coded apertures. A coded
mask consists of a pattern of opaque elements that block some of the radiation 
coming to a position-sensitive detector from the observed source. Such a pattern 
projects a shadow onto the detector, and through a deconvolution algorithm the 
direction of the X-ray source can be deduced. Although through the years the 
sensitivity of direct-view telescopes has been continuously improved, allowing us 
to study the deepest X-ray Universe, the development of this kind of instrument 
appears to be close to its limits. On the one hand, to increase the telescope sensitivity 
a large number of collected photons are required and, ultimately, the telescope 
collecting area must be increased. On the other hand, bigger instrumentation is 
not automatically better, since the detector noise is roughly proportional to the 
detector volume, with a consequent degradation of the signal-to-noise ratio as the 
size is increased. A new concept of telescopes is then required to overcome this 
impasse. 

Focusing optics in soft X-rays (E ~ a few keV) has been a revolutionary tool
for exploring new astrophysical phenomena since the late 20th century. In focusing
optics the method is essentially based on total external reflection and the radiation
is collected at a given distance from the collecting surface.ᵃ This distance is called
the focal length (FL).

Similarly to the case of the optical telescopes, the reflection of X-rays is not 
dispersive. X-ray mirrors functioning at grazing incidence can focus X-rays over a 
broad energy band, limited only by the critical angle of incidence beyond which the 
reflectivity drops significantly. The coating on the mirrors and the telescope’s focal 
length define the energy band limits of the instrument for a given dimension of the 
Wolter I configuration. In the case of Chandra, for instance, the mirrors were coated 
with Iridium and the FL is 10 m, which sets the high energy threshold to ~10 keV. 
Unfortunately, above 20 keV grazing incidence optics lose their effectiveness and the 
single layer reflection cannot be exploited. The employment of multilayers makes 
it possible to focus photons up to 80 keV. Multilayer coatings (supermirrors)¹ are
composed of multiple layers of alternating high-Z and low-Z materials with graded
spacing.ᵇ As an example, the NuSTAR mission² operates in the 3-80 keV energy
band even though the FL is the same as that of the Chandra satellite. The difference
in energy range is due to the type of coating over the mirrors. While for low
and medium energy telescopes the focusing method has been well established and
exploited, instrumentation for focusing hard X-rays/soft γ-rays (> 80 keV) has not
reached such a high level of development.

ᵃSee Volume 4, Chapter 1.
ᵇSee Volume 4, Chapter 7.

Direct-view telescopes and focusing optics differ substantially in their concepts
of collecting and detecting areas. While in the first class of instruments the photon
collection and radiation detection occur over a common surface, in the latter case
the collecting surface (the optics) is decoupled from the sensitive area (the detector).
Mainly based on this distinction, the expressions that describe the continuum
sensitivity for the two classes of instruments are different. The following equations
describe the minimum detectable flux for direct-view ($I_{dv}^{min}$) and focusing
telescopes ($I_{f}^{min}$), respectively:

$$I_{dv}^{min}(E) = \frac{n_\sigma}{\eta_d} \sqrt{\frac{2B(E)}{A_d\, t_{obs}\, \Delta E}}, \qquad (1)$$

$$I_{f}^{min}(E) = \frac{n_\sigma}{\eta_d\, f_\epsilon\, A_{eff}} \sqrt{\frac{2B(E)\, A_d}{t_{obs}\, \Delta E}}, \qquad (2)$$

where $n_\sigma$ is the significance of the detection in units of the standard deviation of the
background level, $B(E)$ is the intensity of the background at the energy $E$, $t_{obs}$ is
the observation length, $\Delta E$ is the energy resolution, $\eta_d$ is the detector efficiency, $f_\epsilon$ is
the fraction of photons that are focused on the detector within the source extraction
region, $A_{eff}$ is the effective area of the telescope optics at energy $E$ (which is given by
the product of the geometrical area of the optics times the efficiency), and $A_d$ is the
detector area (or the area of the source extraction region, in the case of a position-sensitive
detector). A focusing telescope is more sensitive than a direct-viewing
instrument of the same detector size by a ratio of $\sqrt{2} f_\epsilon A_{eff}/A_d$. Alternatively, for
the same collecting area (i.e. if $A_d$ in Eq. (1) is equal to $A_{eff}$ in Eq. (2)), the
improvement in sensitivity for focusing optics is $f_\epsilon \sqrt{2 A_{eff}/A_d}$. This improvement
in sensitivity can easily exceed an order of magnitude. The main limit of a direct-view
telescope is due to the fact that its minimum detectable flux is inversely proportional to
the square root of the detector area: to increase its sensitivity by a factor ~3 the
detector area must be increased tenfold. Moreover, the detector surface cannot be
endlessly enlarged for practical reasons. By contrast, the sensitivity of a focusing
telescope can be enhanced by increasing the collecting area and by reducing the size
of the focal spot.
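
The area scaling of Eq. (1) can be checked with a few lines of Python. This is only a sketch: the function name and every numerical value below (background level, efficiency, exposure, bandwidth, detector areas) are illustrative assumptions, not parameters of any real instrument.

```python
import math

def direct_view_min_flux(n_sigma, eta_d, background, area_d, t_obs, delta_e):
    """Minimum detectable continuum flux of a direct-view telescope, Eq. (1)."""
    return (n_sigma / eta_d) * math.sqrt(
        2.0 * background / (area_d * t_obs * delta_e)
    )

# Illustrative values only (assumptions): 3-sigma detection, 80% efficiency,
# background in counts/(cm^2 s keV), exposure in s, bandwidth in keV.
params = dict(n_sigma=3.0, eta_d=0.8, background=1e-4, t_obs=1e5, delta_e=50.0)

flux_small = direct_view_min_flux(area_d=1000.0, **params)   # 1000 cm^2 detector
flux_large = direct_view_min_flux(area_d=10000.0, **params)  # ten times larger

# A lower minimum detectable flux means a better sensitivity.
sensitivity_gain = flux_small / flux_large
print(f"sensitivity gain for a tenfold detector area: {sensitivity_gain:.2f}")
```

The gain equals the square root of the area ratio, which is the "factor ~3 for a tenfold area" quoted in the text.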

A definite possibility to focus the radiation above 80 keV is through diffractive
optics. Given that the wavelength of X-ray radiation is comparable to the distance
between the atomic planes of crystalline materials, by exploiting the principles of
Bragg diffraction³,⁴ it is possible to point a fraction of the incoming radiation
to a given direction. If a number of crystals are properly arranged, each of them
behaves like a reflector and the common point where the radiation converges is
the focus of the lens. Such a configuration will be extensively discussed in this
Chapter and is called a Laue lens. Thanks to the progress made in crystal production
and in aerospace engineering, Laue lenses appear to be a key tool to fulfill the
need for focusing instruments in hard X-ray/soft γ-ray astrophysics, a field
rich in key questions that remain unanswered.⁵ Potentially employable over a wide
energy range (~50-600 keV), Laue lenses are expected to replace the direct-view
instrumentation currently operating in the same energy band with an enhancement
by a factor 10-100 in terms of sensitivity.


2. Laue Lens Basic Concepts 


The periodic arrangement of a perfect crystalline structure represents the basic ele- 
ment for the X-ray diffraction exploited in Laue lenses. The typical dimension of the 
atomic structure is a few Angstroms, which is comparable to the X-ray wavelength. 
Thanks to the crystal periodic structure, a mathematical-geometrical theory can be 
constructed to describe the main properties of the interaction between the crystal 
lattice and the X-rays. The first scientist who demonstrated with experiments the
interference of X-rays passing through crystals was Max von Laue. He also worked
out the mathematical formulation and published the theory in 1913.⁶ᶜ His study
established the electromagnetic nature of X-rays and opened the way to the later
work of Sir William Henry and William Lawrence Bragg, who studied crystal models
that improved the knowledge of both X-rays and crystal structure.⁷

The lattice periodicity introduces a phase relationship between the scattered 
waves from different atoms placed in the crystal lattice. X-rays scattered by different 
atoms travel different optical paths to reach any given point in space, where they 
interfere with a phase relation that can be calculated from the position of the atoms 
in the crystal. If the optical paths differ by an integer number of wavelengths, the 
interference is constructive and diffraction occurs. This condition is analytically 
expressed by the Bragg equation: 

$$2 d_{hkl} \sin\theta_B = n\lambda = n\,\frac{hc}{E}, \qquad (3)$$


where

$$d_{hkl} = \frac{V_0}{\sqrt{h^2 + k^2 + l^2}} \qquad (4)$$

is the distance between the lattice planes with Miller indices $hkl$, $V_0$ is the smallest
unit of volume that contains all of the structural and lattice information, the Bragg
angle $\theta_B$ is the scattering angle for diffraction, $n$ is the diffraction order, $\lambda$ is the
wavelength of the radiation and $E$ is the energy of the diffracted X-ray. According
to the Bragg law, by setting an angle $\theta_i$ between the incoming radiation and the
lattice planes, the energy $E_i$ will be diffracted, depending on the particular
crystal lattice spacing ($d_{hkl}$). Photons having other energies will pass through the
crystal structure without undergoing diffraction.
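
As a numerical illustration of the Bragg law of Eq. (3), the sketch below computes the first-order Bragg angle at a few energies. The helper name is hypothetical, and the lattice spacing used for Cu(111) (about 2.087 Å) is an assumed approximate value introduced only for this example.

```python
import math

HC_KEV_ANGSTROM = 12.398  # h*c expressed in keV * Angstrom

def bragg_angle_deg(energy_kev, d_hkl_angstrom, order=1):
    """Bragg angle (degrees) from 2 d sin(theta) = n h c / E, Eq. (3)."""
    sin_theta = order * HC_KEV_ANGSTROM / (2.0 * d_hkl_angstrom * energy_kev)
    if sin_theta > 1.0:
        raise ValueError("no diffraction: n*lambda exceeds 2*d")
    return math.degrees(math.asin(sin_theta))

D_CU_111 = 2.087  # assumed approximate spacing of the Cu(111) planes, Angstrom

for energy in (100.0, 300.0, 600.0):
    theta = bragg_angle_deg(energy, D_CU_111)
    print(f"E = {energy:5.0f} keV  ->  theta_B = {theta:.3f} deg")
```

The angles are of the order of a degree or less and decrease with energy, which is why the small angle approximation is adequate at these energies.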


ᶜMax von Laue was awarded the Nobel Prize in 1914 for the discovery of X-ray diffraction by
crystals. The following year, Sir William Henry Bragg and William Lawrence Bragg were also
honored by the Royal Swedish Academy for their analysis of crystal structure by means of X-rays.
The fact that for two consecutive years research on the same topic was awarded in Stockholm
demonstrates the extreme importance of these discoveries in the X-ray field.

Fig. 1. The Bragg condition for constructive interference of a γ-ray photon beam with the atoms
of a given crystalline plane. Left: Bragg diffraction in reflection configuration (Bragg geometry).
Right: Bragg diffraction in transmission configuration (Laue geometry).


Two diffraction geometries are possible. In the Bragg configuration (reflection) 
the incident beam enters on one side of the crystal, undergoes processes of absorp- 
tion or diffraction, and the diffracted beam emerges from the same side of the 
crystal as the incident beam (Fig. 1, left). The Laue (transmission) configuration 
instead considers the incident beam entering from one side of the crystal and the 
diffracted beam emerging from the other side (Fig. 1, right). In both configurations 
the X-ray beam is partially absorbed along its path, and the absorption depends on 
the nature of the diffracting material and on the amount of material to be crossed. 
In particular, in the Laue configuration, which is the one of major interest for Laue 
lenses, the crystal thickness plays a crucial role in the optimization/maximization 
of the outgoing diffracted X-ray beam intensity (see Sec. 7). 


3. Laue Lens Geometries 


The geometry of a Laue lens must resemble that of a focusing optic. For a parallel 
photon beam, as that coming from a celestial X-ray source, and for crystals with 
mean lattice planes perpendicular to the crystal surface, a spherical surface or a 
paraboloid of revolution are the best geometrical solutions. The crystals must be 
set tangent to this surface. The positioned crystals form the Laue lens and the 
algorithm employed to geometrically arrange all the crystals in a systematic way 
is called the geometry of the lens. The right panel of Fig. 2 shows a top view of 
a simplified Laue lens where a number of crystals are set in concentric rings. In 
the profile view of the lateral section (left panel of Fig. 2) only four crystals are 
drawn for clarity, corresponding to two concentric rings. All the crystals within a 
given ring are set to the same Bragg angle with respect to the incoming radiation; 
thus all of them will diffract the same energy. Crystals corresponding to different
rings will have different Bragg angles, and the diffracted energy will be different for
each ring.

Fig. 2. Illustration of a Laue lens, which is based on the diffraction principle. Left: Lateral view of
the Laue lens. For clarity only four crystals have been drawn, belonging to two different concentric
rings with Bragg angles of $\theta_{B1}$ and $\theta_{B2}$, respectively. Right: Top view of a Laue lens made with
seven concentric rings of crystals, with two rings highlighted of radii $r_1$ and $r_2$, corresponding
to the diffractive angles $\theta_{B1}$ and $\theta_{B2}$. For symmetry, all the crystals in each ring are set to the
same Bragg angle with respect to the incoming radiation, and therefore diffract the same energy.
Different rings of crystals diffract different energies (see text for details).


Following the concentric rings strategy, using crystals with size of a few cm², a
complete Laue lens contains several thousands of crystals. Consequently, an impor- 
tant requirement is that each crystal must be set in a relatively short time to 
guarantee a moderate mounting time for the entire lens. On the other hand, each 
crystal should be oriented with high accuracy in order to keep the total Point 
Spread Function (PSF) as small as possible. As the effective area is proportional 
to the geometric/collecting area (and thus to the number of crystals), the crystals 
should be packed as densely as possible. 

In the common and more practical case of a spherical geometry, the crystals have
to be oriented on a frame that is a section of a sphere (a spherical cup), with radius
$R$, so that the lattice planes of all the crystals are perpendicular to the spherical
surface (see Fig. 3; $R$ is not shown in this figure). According to this geometry, the
diffracted photons will be concentrated at the focal distance $F = R/2$.

The focal length plays a decisive role for Laue lenses, more so than for
traditional focusing telescopes. A relation between the focal length $F$ of a Laue lens,
the radius $r_i$ from the optical axis of a ring of crystals lying on the spherical surface,
and the diffracted energy $E_i$ for that ring can be determined from geometrical
considerations and from Eq. (3). The Bragg angle for a given ring can be derived
as a function of $F$ and $r_i$ from:

$$\tan(2\theta_{Bi}) = \frac{r_i}{F} \quad\Longrightarrow\quad \theta_{Bi} \approx \frac{r_i}{2F}, \qquad (5)$$

where the small angle approximation is valid since $(r_i/F) \sim (1-5) \times 10^{-2}$ for
astrophysical applications. By using Eq. (5) in Eq. (3) for first-order diffraction,
with some approximation ($\sin\theta_{Bi} \approx \theta_{Bi}$) we obtain

$$r_i \approx \frac{hc\,F}{d_{hkl}\,E_i}. \qquad (6)$$

From Eq. (6) it is clear that the radius of the ring of crystals is inversely proportional
to the energy to be diffracted; therefore the arrangement of the crystals is such that
crystals placed in inner rings are responsible for the upper energy threshold of the
lens while the outer radius limits the minimum detectable energy.

Fig. 3. Illustration of the disposition of a single crystal set at a given radius $r_i$ from the focal
axis. The diffraction process occurs at the angle $\theta_B$, which is the Bragg angle for diffraction of the
energy $E_i$. The emerging radiation at this energy is directed towards the lens focus and the angle
between the incoming radiation and the diffracted beam is $2\theta_B$.
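
The relation between ring radius and diffracted energy can be made concrete with a short Python sketch comparing the approximate form of Eq. (6) with the exact geometry, tan(2θ_B) = r/F. The function names, the 20 m focal length and the Cu(111) spacing of about 2.087 Å are assumptions chosen only for illustration.

```python
import math

HC_KEV_ANGSTROM = 12.398  # h*c in keV * Angstrom

def ring_radius_approx_m(energy_kev, focal_m, d_hkl_angstrom):
    """Ring radius from the small-angle relation r = h c F / (d E), Eq. (6)."""
    return HC_KEV_ANGSTROM * focal_m / (d_hkl_angstrom * energy_kev)

def ring_radius_exact_m(energy_kev, focal_m, d_hkl_angstrom):
    """Ring radius from the exact geometry r = F tan(2 theta_B)."""
    theta_b = math.asin(HC_KEV_ANGSTROM / (2.0 * d_hkl_angstrom * energy_kev))
    return focal_m * math.tan(2.0 * theta_b)

FOCAL_M = 20.0    # assumed focal length, m
D_CU_111 = 2.087  # assumed approximate Cu(111) spacing, Angstrom

for energy in (100.0, 200.0, 400.0):
    r_approx = ring_radius_approx_m(energy, FOCAL_M, D_CU_111)
    r_exact = ring_radius_exact_m(energy, FOCAL_M, D_CU_111)
    print(f"E = {energy:5.0f} keV  r_approx = {r_approx:.4f} m  r_exact = {r_exact:.4f} m")
```

Doubling the energy halves the radius, so under these assumptions the highest energies are diffracted by the innermost rings, as stated in the text.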
4. Mosaic Crystals


Perfect crystals are not suitable for Laue lens applications because their response to 
a polychromatic beam is a delta function. In other words, a perfect crystal precisely 
follows the Bragg law, and behaves as a monochromator. By contrast, the goal of 
a Laue lens is to converge as much radiation as possible into the lens focal point. 
Even for the case of a Laue lens dedicated to the observation of a given nuclear line 
(see Sec. 8), perfect crystals do not represent a feasible solution. 

Experiments confirm the X-ray diffraction theory of perfect crystals as far as the
geometrical description of the process is concerned. However, there is a clear
disagreement between theory and experiment when the theory is used
to describe the diffracted beam intensity profile. Almost all the deviations from the
theory are justified by physical processes not considered in modeling the perfect 
crystals. For instance, phenomena intrinsically related to the structure of a real 
crystal, like thermal motions or lattice defects with respect to a perfect structure, 
can strongly modify the intensity profile of the diffracted beam without affecting 
its direction of propagation. Real defects inside a crystalline structure can be due 
to atoms of different material present in the lattice in place of the main element, or 
atoms outside of the crystal lattice. Thermal motion introduces different relation- 
ships in the interference between the diffracted rays, creating local phase variations. 

Different models have been proposed to explain the discrepancies between theory
and experiments. The model that most successfully described a particular class
of crystals is the one proposed by C.G. Darwin⁸,⁹ and concerns a particular kind of
positional defects that are present in a real crystal. The crystals that he described
are known as mosaic crystals. The main property of a mosaic crystal is that it is
composed of small blocks, called crystallites or micro-blocks (Fig. 4). Each micro-block
can be considered as a perfect crystal; therefore a mosaic crystal can be considered
locally to be a perfect crystal. The crystallites have size approximately varying from
1 to 100 µm. Such a dimension is large compared to the X-ray wavelength of the
energy of interest (~100-500 keV), which is ~0.1 Å. Therefore, each micro-block
appears to the X-ray radiation as a perfect crystal. Under these assumptions the
theory of perfect crystals can be applied for each micro-block, and there is no phase
relationship between the radiation coming from different micro-blocks.

In crystals there are two kinds of processes that subtract energy from the incident
beam. The first is inelastic scattering between the photons and the
matter. The photon can give part of its energy to the electrons, and the resulting
photon has a lower frequency. This is an incoherent process, as there is no relation
between the frequency of the incoming photon and that of the resulting photon. It
is mainly caused by the photoelectric or Compton effects, and results in a beam
intensity reduction.

The diffraction process is instead a coherent elastic scattering of X-rays by
atoms of the crystal and there is a precise relation between incoming and diffracted
radiation. Depending on the properties of the lattice structure of the crystal, the
diffraction process itself can reduce the intensity of the diffracted beam through the
so-called extinction processes that can be categorized as primary or secondary.

Fig. 4. Left: Illustration of the primary extinction effect occurring in a perfect crystal when there
is destructive interference between the incident beam (or transmitted beam) and the double (or
multiple) diffracted beams, which have opposite phases Φ. Right: Illustration of a mosaic crystal,
which is composed of misaligned perfect crystals called micro-blocks or crystallites.

Primary extinction occurs in crystals with a perfect lattice structure. In an 
ideally perfect crystal a diffracted beam strikes a neighboring lattice plane at the 
proper Bragg angle such that a further diffraction occurs. Each reflection generates a 
phase change of π/2, so that after two diffractions the primary beam and the
double-reflected beam propagate in the same direction but with opposite phases, resulting
in destructive interference. A triply reflected ray travels in the same direction as the 
single-diffraction ray but is again of opposite phase and therefore tends to reduce 
its intensity. The same effect is present in mosaic crystals within each micro-block, 
as they behave as perfect crystals. The larger the dimension of the perfect crystal, 
the more intense is the primary extinction. Alternatively, the smaller the micro- 
block size, the more negligible is the primary extinction, since double or multiple 
diffractions are infrequent in small micro-blocks. The primary extinction does not 
contribute to the diffraction process but, instead, causes a decrease in the intensity 
of both the incident and diffracted beam. 

On the other hand, in a mosaic crystal each micro-block diffracts locally depend- 
ing on its diffraction angle. In mosaic crystals multiple reflections may occur, but the 
phase relationships which led to destructive interference for both transmitted and 
reflected beams in the case of a large perfect crystal no longer occur, because of the 
breaks in the periodic structure. Nevertheless, if the micro-block misorientation is 
not sufficiently large, the incident beam interacting with the crystallites positioned 
deeper in the crystal is weakened due to the diffraction by the shallower micro- 
blocks. This effect, which is called secondary extinction, becomes negligibly small 
when the misorientation between the micro-blocks is sufficiently large.

Therefore, the smaller the primary and secondary extinction are, the more efficient
is the diffraction process, maximizing the crystal reflectivity. This condition is
verified for the so-called ideally imperfect crystals, where the micro-block dimension
is very small (~µm) and the spread of the micro-blocks' misorientation is relatively
large.

The function that describes the misorientation of the crystallites around a mean
orientation is usually known. The distribution function commonly adopted
to describe the misalignment of the crystallites around the mean orientation is a
Gaussian function:

$$W(x) = \frac{1}{\sqrt{2\pi}\,\eta} \exp\left[-\frac{(x - x_0)^2}{2\eta^2}\right], \qquad (7)$$

where $x_0$ is the mean direction of the crystal orientation and $\eta$ represents the spread
of the misalignment around $x_0$. A common quantity that is typically used to
characterize the spread of orientations for the micro-blocks is the Full Width at Half
Maximum of the Gaussian (FWHM):

$$\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\eta \approx 2.35\,\eta, \qquad (8)$$


which is also known as the mosaic spread of the crystal, or mosaicity $\beta$. The
integrated reflectivity of a mosaic crystal can be determined in terms of the secondary
extinction coefficient and absorption coefficient as

$$R_{hkl}(E) = \frac{1}{2}\left(1 - e^{-2\sigma T}\right) e^{-\mu(E)\,T/\cos\theta}, \qquad (9)$$

where $\sigma$ is a function that depends on the micro-block misalignment distribution
and on the parameters of the crystal lattice, $\mu(E)$ is the absorption coefficient at
energy $E$, $T$ is the crystal thickness and $\theta$ is the angle between the direction of the
photons and the normal to the crystal surface.

Depending on the particular energy to be diffracted by a crystal, the crystal
passband changes. By differentiating the Bragg law, the passband of a crystal is
found to be proportional to the square of the diffracted energy:

$$dE \approx \frac{2 d_{hkl} E^2}{hc}\, d\theta, \qquad (10)$$

where $dE$ is the energy passband of the crystal and $d\theta$ is the angular acceptance of
the crystal, which for a mosaic crystal corresponds to the mosaicity, $\beta$.
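
To give a feeling for the numbers in Eq. (10), the following Python sketch evaluates the passband of a mosaic crystal at two energies. The function name, the 30 arcsecond mosaicity and the Cu(111) spacing of about 2.087 Å are assumptions made only for this example.

```python
import math

HC_KEV_ANGSTROM = 12.398              # h*c in keV * Angstrom
ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)

def passband_kev(energy_kev, d_hkl_angstrom, acceptance_arcsec):
    """Energy passband dE = 2 d E^2 dtheta / (h c), Eq. (10)."""
    d_theta = acceptance_arcsec * ARCSEC_TO_RAD
    return 2.0 * d_hkl_angstrom * energy_kev**2 * d_theta / HC_KEV_ANGSTROM

D_CU_111 = 2.087        # assumed approximate Cu(111) spacing, Angstrom
MOSAICITY_ARCSEC = 30.0 # assumed mosaicity (FWHM), arcseconds

band_100 = passband_kev(100.0, D_CU_111, MOSAICITY_ARCSEC)
band_500 = passband_kev(500.0, D_CU_111, MOSAICITY_ARCSEC)
print(f"passband at 100 keV: {band_100:.2f} keV")
print(f"passband at 500 keV: {band_500:.2f} keV")
```

Going from 100 keV to 500 keV widens the passband by a factor of 25, reflecting the quadratic dependence on energy.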


5. Bent Crystals 


Crystals with curved planes, similarly to mosaic crystals, can be employed for Laue
lenses. Two conceptually different kinds of curvature can be involved
in a bent crystal. A curvature of the external surface can be induced through an
external strain (also named focusing curvature). The external curvature of a bent
crystal provides two advantages. First, it enlarges the passband of a crystal, since the
diffraction plane directions continuously change along the crystal surface. Second, if
the external curvature radius of the crystal is two times the focal length, the dimensions
of the diffracted beam are smaller than those of the incoming beam. External
curvature can also be impressed onto mosaic crystals, with the interesting result 
of having a Gaussian distribution around the main direction that is continuously 
varying along the crystal direction of curvature. This is similar to the case of bent 
perfect crystals but differs from the flat mosaic crystals (see Fig. 5). 

In addition to the primary external curvature, a secondary curvature can be 
present if the diffraction planes are also bent (such a secondary curvature can be 
induced by the external curvature). Unlike the primary curvature, the secondary 
curvature increases the angular acceptance of the crystal locally. For some classes 
of bent crystals the concept of quasi-mosaicity has been introduced, similarly to the 
mosaicity of mosaic crystals.¹⁰

Bent crystals can be produced by a variety of methods.¹¹ Curved diffractive
planes are obtained by growing a two-component crystal, where the relative
concentration of the two components changes as the crystal is grown. Examples of
such crystals are produced at the Institut für Kristallzüchtung (IKZ) of Berlin,
where Si-Ge crystals were grown with starting concentrations of 3-7% of
germanium.¹² Perfect Si crystals were bent by applying a thermal gradient to a




Fig. 5. A crystal in which an external curvature Rext (also called primary curvature) has been 
applied. Such an external bending produces a passband enlargement of a crystal. In some diffraction 
configurations the primary curvature induces a secondary curvature (the curvature of the diffractive 
planes) that results in an increasing angular acceptance of the crystal, similar to the mosaic spread 
of flat mosaic crystals.

single structure crystal.!t Mechanical bending can be more practical for astrophysical
applications,!? given that a self-standing curvature does not require additional
power. Self-standing bent Si and Ge perfect crystals were produced and character- 
ized at the Sensor and Semiconductor Laboratory (Ferrara, Italy) by mechanical 
grooving of one of their surfaces.¹⁴ Although a good uniformity was achieved, the
grooving process caused damage in the crystal. As a consequence, the crystal itself
suffered from a high mechanical fragility and its diffraction efficiency is limited
to 50% if the diffraction occurs in the layers beside and beneath the grooves.¹⁵
A self-standing curvature obtained by the Istituto Materiali per Elettronica e Magnetismo
(Institute of Materials for Electronics and Magnetism, IMEM-CNR) in
Parma¹⁶ relies on controlled mechanical damage on one surface of the sample.
The procedure introduces defects in a superficial layer of a few microns, provid- 
ing a highly compressive strain responsible for the convexity appearing on the 
worked side. 

Another very promising family of crystals that overcomes the limitations of
flat crystals is based on the so-called Silicon Pore Optics (SPO),¹⁷ a technology
selected for the Athena mission being developed by the European Space Agency.
The crystals (called Silicon Laue Components, SiLCs¹⁸), which are being developed
by the COSINE company (The Netherlands) in collaboration with the University of
California at Berkeley, result in self-standing diffracting elements that can focus in
both the radial and azimuthal directions.


6. Flat Crystals vs. Curved Crystals 


Flat crystals have the advantage of relatively easy production in large quantities 
with good reproducibility in terms of dimensions and mosaic spread. Unfortunately, 
a flat crystal (either mosaic or perfect) diffracts the radiation to a focal spot that 
is comparable in size to its crystal cross-section. Indeed, each point of the crystal 
surface diffracts the radiation with the same Bragg angle. Moreover, the mosaic 
spread of a mosaic crystal results in a defocusing effect¹⁹ that further enlarges the
PSF. The sensitivity of a Laue lens strongly depends on the photon distribution on 
the focal plane; therefore, crystals capable of focusing the photons into a small PSF 
are preferable. Bent crystals can provide such a focusing effect along the direction of 
the primary curvature. Indeed, the external curvature makes bent crystals focus the 
radiation into a region at the focal plane detector that is smaller than the crystal 
length. Such a focusing effect has been observed and studied for bent crystals with 
both perfect and mosaic structure.²⁰

Monte Carlo simulations of Laue lenses made with flat or bent crystals have
been made and compared to experimental results. With bent crystals the total PSF
is very sharp, with a great advantage in terms of angular resolution and sensitivity.
Assuming a lens with a focal length of 20 m made of crystals with a mosaic spread of
10 arcseconds, the PSF obtained in the case of 30 × 10 mm² flat mosaic crystals and
that obtained in the case of curved crystals with a curvature radius of 40 m (2 times
the focal length) are shown in Fig. 6. There is a remarkable difference between the
two PSFs. In the case of curved crystals the expected angular resolution is a few
tens of arcseconds and the sensitivity is at least 10 times better than that of a Laue
lens made of flat crystals.

Fig. 6. Simulated on-axis PSF of a lens with a 20-m focal length made of mosaic crystals with
mosaicity of 20 arcseconds and cross-section of 30 × 10 mm². Left: Flat crystals. Right: Bent crystals
with 40 m curvature radius. (See electronic edition for a color version of this figure.)

Another limit of flat crystals is that their maximum reflectivity is limited to
50% of the incoming flux (see Eq. (9)), which is the result of a radiation equilibrium
between direct and diffracted beams from the crystalline planes.²¹ Such a value
is a theoretical limit even without taking into account the incoherent absorption
inside the crystal itself. For bent perfect crystals, on the other hand, it has been
demonstrated that the secondary curvature increases their efficiency to values higher
than 50% of the incident radiation.²² For bent mosaic crystals there is also evidence
that the external curvature enhances their efficiency with respect to the equivalent
flat mosaic crystals.²³


7. Crystal Production and Material Selection 


For a successful Laue lens the main requirement is to find the best crystals in terms 
of efficiency of diffraction. Secondly, an attractive feature is the capability to focus 
the radiation into a detector area possibly smaller than the crystal cross-section 
of the basic diffractive element. In fact, for the optimization of a Laue lens the 
selection criteria of the crystal material and lattice planes can change depending on 
the requested lens passband. The crystals should be easily available, and it should 
be fast and relatively cheap to produce crystals and cut them into tiles of suitable 
size. In practice, the materials that seem most suitable for Laue lenses are single 
component crystals and a few two-component crystals. A few materials, such as
copper, germanium, silicon, and gallium arsenide, have already been tested in a
number of prototypes (see Sec. 9) and have shown good performance that fits the
requirements of a Laue lens. Figure 7 shows the peak reflectivity as a function of the
crystalline material, sorted by increasing Z, for three different energies.²⁴ It is clear
that low-Z elements are good candidates for the low energy range (~150 keV), with a
reflectivity almost reaching 40% of the incident beam. By contrast, high-Z elements
work better for γ-rays (500 keV–1 MeV), even though their peak reflectivity drops
significantly to values of ~20-30%.

Fig. 7. Peak reflectivity for a selection of crystal materials at three different energies: 150 keV,
511 keV and 847 keV. A mosaic structure is assumed for the crystals, with a mosaicity of 30 arcsecs
(FWHM of the angular distribution of the crystallites orientation) and a crystallite size of 30 µm,
60 µm and 90 µm at 150 keV, 511 keV and 847 keV, respectively. For each point, the thickness of
the crystal is optimized to maximize the reflectivity, within the range 2-15 mm. The color of the
labels gives a qualitative indication on the quality and homogeneity of the samples that were tested
(looking for crystals with mosaicity of the order of 0.5 arcmin to 1 arcmin): Gray: no sample tested;
Green: most samples of good quality; Yellow: mixed results; Red: most samples of poor quality.
Updated version of the plot from Ref. 24 and kindly provided by the Authors. (See electronic
edition for a color version of this figure.)

Another parameter that determines the efficiency of a Laue lens is the crys- 
tal thickness. In the case of a mosaic crystal, the optimum crystal thickness that 
maximizes the reflectivity can be determined by differentiating Eq. (9). For various 
materials, the estimate of the best thickness as a function of the energy is shown in 
Fig. 8. 

Fig. 8. The best crystal thickness that maximizes the crystal reflectivity, for various materials. 
The mosaicity assumed is 1 arcminute, the crystallite thickness is 1 µm, and the crystal plane 
chosen for all of them is (111). 
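
As a rough illustration of the thickness optimization described above, the sketch below assumes that Eq. (9) has the commonly used mosaic-crystal form R(T) = 0.5 (1 - exp(-2σT)) exp(-μT), with σ the diffraction coefficient per unit length, μ the linear absorption coefficient, and the Bragg angle small enough that cos θ ≈ 1. Under that assumption, setting dR/dT = 0 gives T_best = ln(1 + 2σ/μ) / (2σ). The numerical values of σ and μ are placeholders, not data for any real material, and the function names are hypothetical.

```python
import math

def reflectivity(thickness, sigma, mu):
    """Assumed mosaic-crystal reflectivity: 0.5*(1 - exp(-2*sigma*T))*exp(-mu*T)."""
    return 0.5 * (1.0 - math.exp(-2.0 * sigma * thickness)) * math.exp(-mu * thickness)

def best_thickness(sigma, mu):
    """Thickness at which dR/dT = 0 for the assumed form above."""
    return math.log(1.0 + 2.0 * sigma / mu) / (2.0 * sigma)

# Placeholder coefficients in 1/cm (hypothetical, for illustration only).
sigma, mu = 1.0, 0.5
t_best = best_thickness(sigma, mu)
peak = reflectivity(t_best, sigma, mu)
print(f"best thickness = {t_best:.3f} cm, peak reflectivity = {peak:.3f}")
```

With this form the peak reflectivity always stays below 0.5, consistent with the 50% limit quoted for flat crystals, and it approaches that limit only when absorption is negligible compared with diffraction.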


8. Narrow- and Broadband Laue Lenses 


Laue lenses for space observation can be classified into two main categories, broad 
passband and narrow passband lenses. These two classes of lenses require different 
criteria in the choice and arrangement of the crystals for optimal performance. 
Broad passband Laue lenses cover a wide 
energy band (e.g. 100-600 keV) for the study of continuum source spectra. Low-energy 
broadband Laue lenses were already realized and tested in the 
sixties, when an array of rock salt crystals (1 inch square cross-section each) was 
mounted on a 6-ft diameter frame and set to diffract photons in the 20-140 keV 
energy range.²⁵ For broadband Laue lenses it is crucial to select different kinds 
of crystals to cover more efficiently the entire passband of the lens. Assuming a 
lens made with a single type of crystal, the crystals are placed in concentric rings. For 
azimuthal symmetry and for an on-axis source of radiation, the crystals belonging 
to the same ring, r_i, will diffract the same energy E_i. Different concentric rings 
focus slightly different energies because of the varying Bragg angle. A crystal 
belonging to a ring r_j < r_i, with the same d_hkl, will diffract an energy 
E_j > E_i according to Eq. (6): 


E_j \simeq E_i \, \frac{r_i}{r_j}. \qquad (11)


Examples of simulations of Laue lenses for focusing photons in the 100-600 keV 
range, made of a large number of crystals with different lattice planes at 
different ring radii and a focal length of 20 m, can be found in the literature.²⁶ 
Unfortunately, such complex structures have been designed but not yet realized. 
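
The dependence of the diffracted energy on the ring radius can be sketched numerically. The snippet assumes that Eq. (6) combines the first-order Bragg condition with the lens geometry r = F tan(2θ_B), so that E = hc / (2 d_hkl sin(0.5 arctan(r/F))). The Cu(111) spacing, the ring radii, and the function name are illustrative choices made here, not the design of any particular lens.

```python
import math

HC_KEV_ANGSTROM = 12.398  # h*c in keV * Angstrom

def diffracted_energy_kev(radius_m, focal_m, d_hkl_angstrom):
    """First-order Bragg energy for a crystal at a given ring radius."""
    theta_b = 0.5 * math.atan(radius_m / focal_m)  # Bragg angle in radians
    return HC_KEV_ANGSTROM / (2.0 * d_hkl_angstrom * math.sin(theta_b))

D_CU_111 = 2.087                  # Angstrom, approximate Cu(111) spacing
FOCAL = 20.0                      # m, the focal length quoted in the text
radii = [0.20, 0.40, 0.60, 0.80]  # m, illustrative ring radii
energies = [diffracted_energy_kev(r, FOCAL, D_CU_111) for r in radii]
for r, e in zip(radii, energies):
    print(f"r = {r:.2f} m -> E = {e:.0f} keV")
```

For small Bragg angles the product E·r is nearly constant, which is why inner rings diffract higher energies than outer rings when the same lattice planes are used.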

In contrast to broad passband Laue lenses, narrow passband Laue lenses 
achieve an optimal sensitivity in a relatively tight energy band (e.g. 800-900 keV) 
for γ-ray line spectroscopy. If a lens is made of concentric rings, crystals placed at 
different ring radii must diffract the same energy to the focal point, even though the 
diffraction angle must change along the lens radius. For a fixed value of the focal 
length F and a desired diffraction energy E_0, and keeping in mind the definition of 
the d-spacing (Eq. (4)), we use Eq. (6) to obtain 


r(E_0) \propto \frac{1}{d_{hkl}} \propto \sqrt{h^2 + k^2 + l^2}. \qquad (12)


Thus, a crystal with lattice spacing d_hkl^(i) belonging to the ring r_i will diffract the 
energy E_0. A crystal set at a ring radius r_j > r_i, according to Eq. (12), will focus 
the energy E_0 at the correct focal position only if its d-spacing d_hkl^(j) is smaller than 
d_hkl^(i), or if larger Miller indices are used. It is worth noting that the outer rings will 
have larger contributions to the total effective area, since the number of crystals 
increases with the ring radius. 
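
To make Eq. (12) concrete, the following sketch computes the ring radius at which different lattice planes of a cubic crystal focus the same energy. It assumes the cubic d-spacing d_hkl = a / sqrt(h² + k² + l²) for Eq. (4) and the Bragg geometry r = F tan(2θ_B). The copper lattice constant (about 3.615 Å) and the 511 keV and 86 m inputs (borrowed from the MAX concept described in Sec. 9) are illustrative values, and the function name is hypothetical.

```python
import math

HC_KEV_ANGSTROM = 12.398  # h*c in keV * Angstrom

def ring_radius_m(energy_kev, focal_m, lattice_a, hkl):
    """Radius at which the (hkl) planes of a cubic crystal focus energy_kev."""
    h, k, l = hkl
    d_hkl = lattice_a / math.sqrt(h * h + k * k + l * l)
    theta_b = math.asin(HC_KEV_ANGSTROM / (2.0 * d_hkl * energy_kev))
    return focal_m * math.tan(2.0 * theta_b)

A_CU = 3.615  # Angstrom, approximate copper lattice constant
planes = [(1, 1, 1), (2, 0, 0), (2, 2, 0)]
radii = {p: ring_radius_m(511.0, 86.0, A_CU, p) for p in planes}
for p, r in radii.items():
    print(f"Cu{p}: r = {r:.2f} m")
```

The ratio of the radii for two sets of planes is very close to the ratio of the corresponding sqrt(h² + k² + l²) values, as Eq. (12) states.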


9. Status of the Development of Laue Lenses 


Laue lenses are being developed by several groups in the high energy astrophysics 
community. To achieve the goal of building a Laue lens, all the groups have faced 
similar challenges during the development phase. The first aspect to be tackled is 
the development of a technology for the production of a large quantity of crystals. 
The crystals must have the required properties after the different phases of their 
production: the growing phase must produce the needed mosaicity; the cutting 
phase sets the proper dimensions of each crystal, and the error between the external 
surfaces and the diffraction planes direction must be measured; and the bending 
process impresses the correct curvature radius in the case of bent crystals. Another 
very important challenge is the development of a technology to assemble a large 
number (a few thousand) of crystals in a reasonably short time. Once the crystals 
are selected and the method has been implemented, the critical problem is the 
positioning accuracy of the crystal tiles and its stability over time. Long focal length 
Laue lenses require a more accurate tile-setting technology than Laue lenses with 
short focal length. 

Laue lenses with narrow energy passband have already been built and demonstrated 
in a balloon flight experiment that has flown twice (2000, 2001) as part 
of the CLAIRE project²⁷,²⁸ (see Fig. 9). The CLAIRE Laue lens was composed 
of Ge-Si mosaic crystals to focus into an area of ~1.5 cm² of a small solid-state 
detector with only 18 cm³ equivalent volume for background noise. The 511 cm² 
lens collecting area was able to successfully focus photons with energy ~170 keV, 
with a passband of ~3 keV, with a field of view (FOV) of 90 arcseconds and an 
angular resolution of 25-30 arcseconds, which was dictated by the mosaicity of the 
selected crystals. The fine tuning of the lens consisted of tilting each crystal tile to 
the appropriate Bragg angle until the diffracted energy was detected for each tile. 

Fig. 9. The CLAIRE telescope during the 2001 balloon campaign. The instrument features a 
Laue diffraction lens, a 3 × 3 array of cryogenic germanium detectors, and a balloon gondola 
stabilizing the lens to a few arcseconds. The telescope frame was essentially made of carbon fiber 
bars with aluminum connections such that the total weight of the payload was less than 500 kg. 
Published with permission from Ref. 29. (See electronic edition for a color version of this figure.) 

The CLAIRE project was a development phase for the satellite mission MAX,²⁹ 
which was submitted to the French Space Agency in 2006. The MAX mission was 
conceived to investigate two energy bands (460-522 keV and 825-910 keV), mainly 
looking for electron-positron annihilation lines and nuclear emission lines from 
cosmological sources. In the preliminary idea the MAX lens was composed of about 
13,700 copper and germanium crystals with mosaicity of ~30 arcseconds, distributed 
on 36 concentric rings with radii from 57 to 128 cm. Unfortunately, the mission was 
not accepted and was never realized. The main critical point for the MAX mission 
was the large focal length (86 m), which required a formation flying mission. Forma- 
tion flying consists of two separated spacecraft that fly at a fixed distance (the focal 
distance); this is a technology still very hard to realize with the required accuracy. 
The Laue lens is placed on the first spacecraft, while the focal plane detector is 
placed on the second. The relative position of the two satellites is kept stable and 
precise by a feedback control based on a laser tracking system. The same pointing 
method was also suggested for the 35 m focal length of the XEUS observatory³⁰ 
developed by the European Space Agency (eventually not realized and now merged 
into the Athena mission with a 12-m focal length), and for the Simbol-X mission 
(energy passband of 0.5-80 keV with a 20-25 m focal length) jointly supported by 
the French and Italian space agencies, which was planned but not realized due to 
budget restrictions. 

A narrowband Laue lens with excellent sensitivity to a single energy might be 
considered too limited for a space mission, even if it produced excellent scientific 
results for a particular phenomenon (e.g. the 511 keV e⁺/e⁻ annihilation line or 
nuclear lines from SN explosions). For this reason, the French space agency devel- 
oped an R&D project for an adjustable Laue lens capable of switching between 
different diffracted energies. This requires a system that can modify both the Bragg 
angle of the crystals and the focal length of the lens, depending on the energy 
to be diffracted. An automatic mechanism was developed to achieve this goal, by 
means of piezo-electric actuators with a reproducibility in terms of crystal tilting 
better than 1 arcsecond. The prototype was developed and successfully tested using 
different energies from ¹³⁷Cs and ¹³³Ba radioactive sources by switching between 
focal lengths of 24.75 m and 24.45 m, respectively.³¹ 

Starting from the acquired experience and the successes of CLAIRE, the Space 
Sciences Laboratory of UC-Berkeley invested in research to find a better solution 
for Laue lens technology. A dedicated beamline and assembly station were realized 
and successfully employed to fabricate small prototypes composed of flat mosaic 
crystals made of different materials. The prototypes reached an extremely good 
accuracy in the crystal alignment (better than 10 arcseconds) in a short mounting 
time with very dense packing. The last prototype consisted of 48 crystal tiles with 
cross-section 5 × 5 mm² made of iron (36 tiles) and aluminum (12 tiles).³² The 
crystals were distributed in 8 arcs of rings and set to diffract energies in the range 
95-130 keV at 1.5 m from the lens, with the source of radiation placed at 12.5 m on 
the other side of the lens. 

A big effort was also made in Italy by the high energy astrophysics group of 
Ferrara. In the framework of the HAXTEL project, two prototypes of Laue lenses 
were built to demonstrate the feasibility of building short (< 10 m) focal length 
lenses. The HAXTEL project,³³⁻³⁵ supported by the Italian Space Agency (ASI), 
was devoted to focusing high energy photons by exploiting diffraction from 20 flat 
crystal tiles made of copper (111) provided by the Institut Laue-Langevin (ILL) 
of Grenoble (France). The tiles, with cross-section of 15 × 15 mm² and 3 mm thick, 
were positioned on a ring of 36cm diameter (Fig. 10). The crystals were set at the 
Bragg angle for 100 keV X-rays and were glued onto a carbon fiber frame using a 
two-component low-shrink epoxy adhesive. An X-ray tube (with output energy up 
to 150 kV) was positioned at a distance of 6m from the lens and the radiation was 
refocused onto a detector 6 m from the lens. The assembling method, based on the 
determination of the diffraction planes of each crystal tile and on a subsequent posi- 
tioning over a steel countermask, requires several steps and has a final error budget 
of a few arcminutes. The method is now patented and the accuracy is sufficient for 
short focal length (≲10 m) applications like balloon experiments, but also for Laue 
lenses in ground applications such as medical imaging and treatment. 

Fig. 10. Left: One of the Laue lens prototypes made within the HAXTEL project, supported by 
the Italian Space Agency. The prototype was composed of 20 flat crystals made of copper glued 
over a carbon fiber frame 2mm thick. To ensure the stiffness of the frame, ribs and backing were 
applied to its back side. The lens was made to focus 100 keV radiation from an X-ray source placed 
6 m from the Laue lens. Right: The PSF of the second prototype measured at the focal distance of 
the Laue lens of 6 m. The white circle represents the area in which all the photons would have been 
directed in case of a perfect mounting of the crystals, without uncertainties and misalignment. 


Figure 10 shows the second prototype and its focal spot (PSF), with the white 
circle indicating the distribution of the photons in case of a perfect crystal mounting. 
The halo surrounding the white circle is due to misalignment of the crystals during 
the assembly process. The disagreement between the measured and the expected 
PSF is also apparent by comparing the cumulative distribution of the photons with 
the distance from the focus (Fig. 11), where the collected photons of the two proto- 
types have been compared with the theoretical expectations. At the radius where the 
theoretical curve saturates (16mm), 50% of the incoming photons were collected 
by the first prototype, while for the second demonstration model the fraction of 
photons collected at the same radius was ~70%. 

After the success of HAXTEL a more difficult challenge was undertaken by the 
group of the University of Ferrara, which developed a facility to build Laue lenses 
for long focal lengths (> 20 m). The LARIX facility³⁶ has been extensively described 
in several papers devoted to the design of Laue lenses, detector calibration, and 
a transparency test for different high energy instruments.³⁷⁻³⁹ A dedicated facility 
was required for several reasons. First, the accuracy obtained with the HAXTEL 
technology was not sufficient for a long focal length Laue lens designed for a hard 
X-ray space mission. A different method was required to set the crystals in a 
reasonably short time. Secondly, a long beam line was needed to minimize the beam 
divergence and better simulate the condition of an X-ray beam without divergence, 
as in the case of an astronomical source of radiation. In the framework of the LAUE 
project a 100-m tunnel was used with a 26-m vacuum beam line. The goal of the 
LAUE project was to build a petal of a large Laue lens made of germanium (Ge, 
planes 111) and gallium arsenide (GaAs, planes 220) under illumination by an X-ray 
source providing photons from a few keV to ~320 keV. The energy passband of the petal 
is in the range 90-300 keV, where the upper limit is defined by the maximum 
voltage of the X-ray tube. Nevertheless, with an appropriate X-ray source the modular 
approach allows one to build portions of a Laue lens working up to 600 keV, which 
are subsequently assembled off-line (see Sec. 10). 

Fig. 11. Cumulative distribution of the focused photons along the radial distance from the focal 
point for the HAXTEL project. The black line corresponds to the expected distribution in the 
case of a perfect alignment of the crystals. The red line shows the photon distribution obtained in 
the first prototype³⁴ while the blue line shows the photon distribution for the second prototype.³⁵ 
(See electronic edition for a color version of this figure.) 

The details of the realization method can be found in the literature.⁴⁰ The 
main idea is to simulate a source at an infinite distance (i.e. without 
divergence) with a movable pencil beam which is always parallel to itself. The pencil 
beam is then used to illuminate each crystal tile, which is tilted to the correct 
angle such that X-rays of the nominal energy are diffracted to the correct position. 
The pencil beam was realized by mounting the X-ray source and a collimator onto 
movable stages that shift them in the x-y plane, perpendicular to the lens axis. The 
X-ray pencil beam is kept parallel to itself and to the lens axis during the whole 
alignment of the lens elements. Each crystal is set onto the transparent substrate by 
using a high precision mechanical positioner. Thus, the method consists of a single 
phase of positioning and fixing each crystal at the proper diffraction angle, unlike 
in the HAXTEL project, where the crystal setting was obtained with a multiple-phase 
process. Such a difference drastically reduces the assembling uncertainties in 
favor of the LAUE method. Figure 12 shows the design of the Laue lens that is being 
built and the equipment that is being used in the facility. 

Fig. 12. Center: The Laue lens consisting of 274 crystals made of germanium and gallium arsenide 
designed within the LAUE project. a: The tungsten collimator set in front of each crystal to limit the 
X-ray beam to the crystal dimensions and avoid undesired diffuse photons. b: The X-ray imager 
placed at the focal distance of 20 m from the lens, used to collect the diffracted photons from 
the Laue lens. c: The petal substrate (in the picture a preliminary version made of carbon fiber) 
and the hexapod used to align and set each crystal to the correct Bragg angle before being glued. 
d: The 26 m beam-line under vacuum to avoid photon absorption along the first part of the beam 
path from the X-ray source to the detector. 


A number of adhesives and substrates have been tested in order to minimize 
the assembly uncertainties, which are mainly due to adhesive shrinkage during the 
polymerization phase. The use of a single component adhesive with UV radiation or 
thermal activation makes the assembly process simpler than using a two-component 
adhesive. The most suitable type of adhesive was found to be a group of single 
component pastes cured with optical and UV light. Their extremely low shrink- 
age coefficient (<0.08%), low outgassing, and low CTE provide an ideal interface 
between crystals and substrate. The substrate material also has a fundamental role 
in the assembly process for the correct mechanical and thermal coupling with both 
the crystals and the adhesive. A polymethyl methacrylate (PMMA) substrate has 
been utilized due to its high transparency to the UV light required for the adhesive 
polymerization. Its low density also helps to keep the total weight of the optics low. 
Figure 13 shows a mock-up with 11 crystals mounted over the PMMA support, while 
the plot shows the errors during the positioning of the tiles. With these materials 
the crystal stability was better than 30 arcseconds over time scales of 6 months. To 
further increase the accuracy, glass and quartz substrates have been tested. Their 
lower CTE makes the mounting accuracy better (10-15 arcseconds) with a substrate 
3 mm thick. A drawback in the latter case is the higher radiation absorption of the 
quartz glass with respect to the PMMA, especially at the low-energy end of the 
Laue lens passband (80-100 keV). 

Fig. 13. Left: Picture of a LAUE project prototype made of 11 GaAs(220) bent crystals fixed on 
a UV-transparent PMMA flat frame. Right: Misalignment with respect to the ideal position for 
the 11 GaAs crystals composing the mock-up model after the procedure of gluing over the PMMA 
support. 


10. Open Issues 


The most important issue to be faced while building a Laue lens is to find a method 
to assemble a large number of crystal tiles. It is crucial to keep the position of each 
tile fixed, such that the relative position between the X-ray beam and the diffractive 
planes does not change within the required precision. Thus, the problem becomes 
how to fix all the diffractive elements to a common support. In some prototypes 
the method used to set the crystals was through micrometers. This method has 
good stability, but screws and holders for each crystal increase the total weight 
of the lens enormously, with a consequent possible deformation of the lens itself. 
Other methods consider crystal positioning by micro-actuators. This is definitely the 
most accurate method, given that piezo-electric actuators can reach submicrometric 
accuracy in terms of positioning lightweight elements. Nevertheless, also in this case 
the mounting is not practical as it requires two or three piezo-actuators to set the 
needed angles for each crystal. Furthermore, the power required to keep thousands 
of crystals stable is too high and may be impracticable. 

The method that has been employed recently by all the research groups is to 
use adhesive to bond the crystals to a frame. This method is surely the easiest in 
terms of mounting time but is also the riskiest, given that the positioning 
is permanent and cannot be corrected at a later time. Indeed, the adhesives suffer 
from an unavoidable shrinkage during the polymerization phase that is of the order 
of 0.01-1% in volume. Such a shrinkage results in a non-systematic shift of microns 
that, particularly for long focal lengths, degrades the PSF. 

Fig. 14. Concept of modularity suited for a mission based on Laue lenses. The petals are 
built separately, then assembled and kept aligned with each other by means of 
piezo-actuators. 
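
To give a feeling for the numbers, the sketch below converts a micron-scale differential shift across a tile into a tilt, and then into a displacement of the diffracted beam in the focal plane. It assumes that the shift acts across the full tile width and that a tilt δ of the diffracting planes deflects the diffracted beam by about 2δ. The 15 mm tile size is that of the HAXTEL tiles, while the 1 µm shift, the list of focal lengths, and the function names are illustrative assumptions.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def tilt_from_shift(shift_m, tile_size_m):
    """Tilt (rad) produced by a differential shift across one tile."""
    return math.atan2(shift_m, tile_size_m)

def focal_plane_offset(tilt_rad, focal_m):
    """Offset (m) of the diffracted beam, assuming a deflection of 2*tilt."""
    return focal_m * math.tan(2.0 * tilt_rad)

tilt = tilt_from_shift(1e-6, 15e-3)  # 1 micron over a 15 mm tile
offsets_mm = {f: 1e3 * focal_plane_offset(tilt, f) for f in (6.0, 20.0, 86.0)}
print(f"tilt = {tilt * ARCSEC_PER_RAD:.1f} arcsec")
for f, off in offsets_mm.items():
    print(f"F = {f:5.1f} m -> offset = {off:.2f} mm")
```

Because the offset grows linearly with the focal length, the same micron-level shift that is tolerable for a 6 m prototype becomes comparable to the tile size itself at focal lengths of several tens of meters.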

A very important aspect of the Laue lens is its natural modular approach. The 
basic elements (crystals) can be aligned to form a cluster of crystals targeted to focus 
a particular passband. Then different clusters of crystals can be joined together, 
allowing minimization of the alignment phase error budget. Eventually, the clusters 
can be fine-tuned with an adaptive optics technology, to form an entire Laue lens. 
A similar method is also being used in the approved Athena mission, which uses 
so-called Silicon Pore Optics (SPO) modules dedicated to the 0.1-12keV energy 
passband. Figure 14 shows an entire Laue lens made of 8 petals. The petals are 
positioned on a frame and are tilted to the common direction of diffraction. 


11. Conclusions 


In the last 15 years the X-ray/γ-ray astrophysics community has strongly invested in 
the development of focusing Laue lenses for γ-ray astronomy (>70-100 keV), which 
is currently the only suitable technique to enable focusing optics in this energy 
range. The experimental results shown in this chapter have demonstrated that short 
focal length Laue lenses have already achieved a satisfactory scientific/technological 
readiness. Thanks to the most recent developments, Laue lenses with short focal 
length (10-15 m) have already been tested with balloon experiments or in 
on-ground tests with prototypes. One of the most important tasks is the production 
of a large number of suitable crystals with high diffraction efficiency. Such crystals 
will optimize the effective area of the Laue lens and, eventually, the instrument 
sensitivity. The major limit to the launch of a Laue lens γ-ray telescope has been 
the need for long focal lengths (20-100 m), which implies the use of two different 
satellites flying in formation, one for the lens and the other for the focal plane 
detector. However, the development of extendable booms up to 20 m long and the 
optimization of the lens effective area make possible the realization of broadband 
satellite missions, where Laue lenses could join the current low-energy focusing 
instruments. Such a broadband fully-focusing mission will be a breakthrough for 
high-energy astrophysics. 

Scintillation detectors have been used for detecting X-rays and γ-rays for more 
than 60 years. Their ease of fabrication in a variety of shapes and sizes, their 
relatively low cost, their ease of use, and their reliability have made them an 
invaluable component for a wide variety of high energy astrophysics experiments. 


1. Introduction 


For more than 100 years the detection of ionizing radiation has been facilitated 
through the use of various types of scintillators. Scintillators convert the energy of 
ionizing radiation into light through a process known as photoluminescence. The 
total light output provides a measure of the total energy deposited in the scintillator 
and can easily be measured using some type of photosensor (such as a photomulti- 
plier tube). 

Scintillation detectors have proven immensely valuable for applications in a wide 
variety of fields, including medical imaging, homeland security, nuclear physics, 
and astrophysics. Within the realm of astrophysics, they have been employed in 
several different ways, as important components of experiments studying cosmic 
rays, celestial X-rays and γ-rays, and dark matter. They serve both as primary 
detectors and as anti-coincidence detectors that are needed to reduce undesirable 
background components. 

Nearly every orbiting γ-ray mission has employed scintillation detectors. This 
includes all four instruments on the Compton Gamma-Ray Observatory (CGRO): 
the Burst and Transient Source Experiment (BATSE),¹ the Oriented Scintillation 
Spectrometer Experiment (OSSE),² the Imaging Compton Telescope (COMPTEL),³ 
and the Energetic Gamma Ray Experiment Telescope (EGRET).⁴ It also 
includes the two γ-ray instruments on the International Gamma-Ray Astrophysics 
Laboratory (INTEGRAL): the Imager on Board INTEGRAL (IBIS)⁵ and the 
Spectrometer aboard INTEGRAL (SPI).⁶ They are also vital components of both 
instruments on the Fermi Gamma-ray Space Telescope: the Large Area Telescope (LAT),⁷ 
and the Gamma-ray Burst Monitor (GBM).⁸ Scintillators have also been used on 
a variety of solar science missions, most notably the Gamma Ray Spectrometer 
(GRS) on the Solar Maximum Mission.⁹ 

Although scintillators can be used for measuring any type of ionizing radiation, 
including charged particles and neutrons, we will focus our discussion on the 
detection of X-ray and γ-ray photons. A detailed discussion of scintillation detector 
design and application is well beyond the scope of this chapter. It is hoped that this 
overview will provide a sufficient starting point for more detailed study. The interested 
reader is referred to several excellent texts that exist, most notably the ones by Birks,¹⁰ by 
Knoll¹¹ and by Tsoulfanidis and Landsberger.¹² Useful literature can also be obtained 
from many vendor web sites. 


2. Principles of Scintillation Detection 


The operation of a scintillation detector involves four essential steps. First, some 
ionizing radiation deposits energy in the scintillator material. Second, some fraction 
of this energy goes toward populating excited states of the electron structure of the 
scintillation material, the precise nature of the excitation process depending on the 
type of material. Third, the excited states decay, with associated optical emissions. 
Finally, the optical emission is converted into an electrical impulse using some type 
of photosensor. 

The properties of an ideal scintillator include (but are not limited to) the 
following: 


(1) The scintillator must be very efficient at converting charged particle energy into 
detectable light. 

(2) The light yield (the amount of optical light generated by the scintillation pro- 
cess) must be linearly proportional to the energy deposited in the scintillator. 

(3) The scintillator should be transparent to its own emissions so that the light can 
be efficiently collected. 

(4) The decay time of the light output should be short so that fast signal pulses 
can be generated. 


3. Energy Deposition 


Energy deposition in a scintillator ultimately takes place by ionization losses of 
charged particles. The incident quantum can either be a charged particle, in which 
case the energy is deposited directly by ionization losses, or it can be some form 
of neutral radiation, either a photon or neutron. In the case of neutral radiations, 
energy must be conveyed from the incident quantum to secondary charged particles. 
These charged particles then serve as an intermediary, depositing the energy in 
the scintillator structure via ionization. The rate of ionization energy loss per unit 
pathlength goes as Z²/v², where Z is the charge number of the particle and v is the particle 
velocity. For electrons and protons of the same kinetic energy, the proton is much 
slower, so the proton energy loss rate will be significantly greater than that of the 
electron. This fact can 
have important implications for scintillator applications, as we will see later on. 
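
The 1/v² scaling can be checked with a short calculation. The sketch below compares the Z²/v² factor for a proton and an electron of the same kinetic energy, using relativistic kinematics. It is only the leading scaling, not a full stopping-power formula, and the function names are hypothetical.

```python
M_E_C2 = 0.510999  # electron rest energy, MeV
M_P_C2 = 938.272   # proton rest energy, MeV

def beta_squared(kinetic_mev, rest_mev):
    """(v/c)^2 for a particle of given kinetic and rest energy."""
    gamma = 1.0 + kinetic_mev / rest_mev
    return 1.0 - 1.0 / (gamma * gamma)

def loss_ratio_proton_to_electron(kinetic_mev):
    """Ratio of the Z^2/v^2 factors (|Z| = 1 for both particles)."""
    return beta_squared(kinetic_mev, M_E_C2) / beta_squared(kinetic_mev, M_P_C2)

for t in (0.1, 1.0, 10.0):
    ratio = loss_ratio_proton_to_electron(t)
    print(f"T = {t:5.1f} MeV: proton/electron ~ {ratio:.0f}")
```

At a kinetic energy of 1 MeV the electron is already relativistic while the proton moves at only a few percent of the speed of light, so the proton's ionization loss rate per unit pathlength is larger by a factor of several hundred.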


4. Photon Interactions 


For X-ray and γ-ray spectroscopy, the incident photon energy can be deposited by 
one of three interaction processes. The dominant interaction process depends both 
on photon energy and on the atomic number (Z) of the material, as shown in Fig. 1. 

Photoelectric Absorption. At low energies and/or high atomic number, incident 
photons will most likely interact by the photoelectric process. In this case, the 
incident photon transfers all of its energy to a bound atomic electron, some of 
which is taken up by the binding energy of the electron. The kinetic energy (E_e) of 
the liberated electron (or photoelectron) is then given by 


E_e = h\nu - E_b, \qquad (1)


where hν represents the energy of the incident photon and E_b is the binding energy 
of the atomic electron. For a given photon energy, the probability of photoelectric 
absorption goes as Z⁴. 

Fig. 1. Dominant photon interaction types in a parameter space defined by photon energy and 
atomic number. 

Compton Scattering. At intermediate energies (up to ~10 MeV), Compton 
scattering is the dominant interaction mechanism. The scattering of a photon off 
an atomic electron results in the transfer of some fraction of the photon's energy to 
the electron. The energy of the scattered photon (hν′) is given by 


h\nu' = \frac{h\nu}{1 + (h\nu / m_e c^2)(1 - \cos\theta)}, \qquad (2)


where hν is the energy of the incident photon, m_e is the electron mass, and θ is 
the Compton scatter angle. The maximum energy is transferred to the electron at 
a scattering angle of θ = 180°. 

Pair Production. At energies above 1.022 MeV the incident photon can undergo 
pair production in the Coulomb field of a nucleus. This produces two charged par- 
ticles, an electron and a positron. Energy is conserved by 


E_{e^-} + E_{e^+} = h\nu - 2 m_e c^2. \qquad (3)


The positron, with a relatively small mean free path, is then likely to undergo 
annihilation, producing two 511 keV photons. One or both of the resultant 511 keV 
photons may escape the detector or they may, in turn, each be absorbed in the 
scintillator. 
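
The three relations of this section, Eqs. (1)-(3), can be collected in a short numerical sketch. The function names are hypothetical, and the example inputs (a 662 keV photon, as emitted by a ¹³⁷Cs source, and a K-shell binding energy of about 33.2 keV, as for iodine) are illustrative values chosen here rather than taken from the text.

```python
import math

M_E_C2_KEV = 510.999  # electron rest energy in keV

def photoelectron_energy(photon_kev, binding_kev):
    """Eq. (1): kinetic energy of the photoelectron."""
    return photon_kev - binding_kev

def compton_scattered_energy(photon_kev, theta_rad):
    """Eq. (2): energy of the photon scattered through the angle theta."""
    alpha = photon_kev / M_E_C2_KEV
    return photon_kev / (1.0 + alpha * (1.0 - math.cos(theta_rad)))

def pair_kinetic_energy(photon_kev):
    """Eq. (3): total kinetic energy shared by the electron and positron."""
    if photon_kev < 2.0 * M_E_C2_KEV:
        raise ValueError("photon energy is below the pair-production threshold")
    return photon_kev - 2.0 * M_E_C2_KEV

backscattered = compton_scattered_energy(662.0, math.pi)
compton_edge = 662.0 - backscattered  # maximum energy given to the electron
print(f"photoelectron: {photoelectron_energy(662.0, 33.2):.1f} keV")
print(f"backscattered photon: {backscattered:.1f} keV")
print(f"Compton edge: {compton_edge:.1f} keV")
print(f"pair kinetic energy at 2 MeV: {pair_kinetic_energy(2000.0):.1f} keV")
```

The difference between the incident energy and the backscattered photon energy is the largest energy a single Compton scatter can deposit, which is why a measured spectrum shows a Compton edge below the full-energy peak.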


5. Optical Emission 


The optical emission (fluorescence) that results from the scintillation process can 
be described in terms of its intensity, spectral distribution, and pulse shape. 


5.1. Intensity 


Two terms are often used to describe the intensity of the optical emission — light 
yield and light output. The light yield, as used here, describes the intrinsic light 
output of the scintillator. It can be measured in terms of the number of optical 
photons generated per MeV of deposited energy. The light output, as used here, 
refers to the signal strength generated by the scintillator, in conjunction with the 
readout sensor. The light output incorporates the spectral response of the readout 
sensor and is therefore very dependent on the precise experimental arrangement. 
A scintillator material that exhibits the highest light yield may not necessarily 
result in the highest light output, especially if there is a poor spectral match with 
the readout sensor. 

In general, the scintillator light yield is proportional to the deposited energy. 
Only a fraction of the deposited energy is converted into scintillation light. This 
fraction is referred to as the scintillator efficiency and can depend on particle type 
and energy. Ideally, the relationship between light yield and deposited energy is 
linear, but this is only the case when the scintillator efficiency is independent of 
energy. The total light yield of a scintillator can be characterized by the number of 
optical photons generated per unit energy loss. Sometimes it is expressed in terms 
of the light yield of a comparison scintillator. 
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5.2. Spectral Distribution 


The fluorescence spectrum generated by the scintillation process results in emission 
which generally lies in the UV/optical range. In order to maximize the transmission 
of the generated light within the scintillator itself, the emission spectrum must be 
distinct from the absorption spectrum. The spectral distribution of the emitted 
light must also be well-matched to the response of the light sensor (typically a 
photomultiplier tube) used to convert the light into an electrical signal (Fig. 2). In 
some cases, the scintillator can be formulated to produce an emission spectrum in a 
desired energy range. Wavelength shifting materials can also be used to collect the 
light from a scintillator and convert it to a wavelength range that is better matched 
to the chosen light sensor technology. 


5.3. Pulse Shape 


The physical characteristics of the scintillator determine the exact process by which 
the deposited energy is converted into a light signal. The process can be generalized 
as a two step sequence, in which the radiative decay states are first populated 
as a result of the energy lost by the incident radiation. Once the radiative states 
are populated, they will decay to produce the optical light that is associated with 
the fluorescence process. Mathematically, the pulse shape (Fig. 3) can be repre- 
sented by 


$$I = I_0 \left( e^{-t/\tau_d} - e^{-t/\tau_r} \right), \tag{4}$$


where τ_r is the risetime and τ_d is the decay time. 
Most scintillators have rise times on the order of a few ns or less. The timescale 
for the decay varies dramatically depending on the type of scintillator. Organic 
scintillators have decay times on the order of nanoseconds, whereas inorganic 
scintillators have a wide range of decay times ranging from 10s of nanoseconds up 
to 100s of nanoseconds. In some cases, there are two components to the decay, 
a fast (or prompt) component and a slow (or delayed) component (Fig. 4). The 
relative importance of these two components can depend on the type of ionizing 
particle. In some cases, especially in organic scintillators, different particle types 
can be distinguished by pulse shape. This pulse shape discrimination technique can 
be especially useful for distinguishing photon interactions (which lose energy via 
electron ionization) and neutron interactions (which typically lose energy via np 
scattering).14 


Fig. 2. Spectral distributions for the optical emission from various scintillators and the spectral 

Fig. 3. Typical time profile for the scintillation light. 

Fig. 4. In many cases, there are two components to the scintillation light that exhibit different 
decay constants. 
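
The two-exponential pulse shape of Eq. (4) and the idea behind pulse shape discrimination can be sketched in a few lines of Python. The time constants and the fast/slow light fractions used below are purely illustrative assumptions, not values taken from this chapter; the tail fraction is computed for an idealized pulse made of two pure exponential decays.

```python
import math


def pulse_intensity(t_ns, tau_rise_ns, tau_decay_ns, i0=1.0):
    """Scintillation pulse of Eq. (4): I = I0 (exp(-t/tau_d) - exp(-t/tau_r))."""
    return i0 * (math.exp(-t_ns / tau_decay_ns) - math.exp(-t_ns / tau_rise_ns))


def peak_time(tau_rise_ns, tau_decay_ns):
    """Time at which Eq. (4) is maximal, from setting dI/dt = 0."""
    product = tau_rise_ns * tau_decay_ns
    return product / (tau_decay_ns - tau_rise_ns) * math.log(tau_decay_ns / tau_rise_ns)


def tail_fraction(t_gate_ns, slow_fraction, tau_fast_ns, tau_slow_ns):
    """Fraction of the total light emitted after t_gate for a pulse made of a
    fast and a slow exponential component (slow_fraction = slow light / total)."""
    fast_fraction = 1.0 - slow_fraction
    return (fast_fraction * math.exp(-t_gate_ns / tau_fast_ns)
            + slow_fraction * math.exp(-t_gate_ns / tau_slow_ns))


if __name__ == "__main__":
    print("peak at", round(peak_time(1.0, 10.0), 2), "ns")
    # Hypothetical slow-light fractions for two particle types:
    for label, slow in (("electron-like", 0.1), ("proton-like", 0.3)):
        print(label, round(tail_fraction(30.0, slow, 3.0, 300.0), 3))
```

A larger slow component gives a larger tail fraction for the same gate time, which is the quantity a tail-to-total discrimination scheme compares between particle types.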


6. Considerations for Spaceflight 


For spaceflight applications, consideration must be given to several important 
constraints, including mass, volume, and power. Ruggedness is also an important 
consideration. Detectors must be mechanically strong enough to withstand the rigors 
of launch. Once on orbit, the radiation environment also presents a challenge. Some 
materials, when exposed to radiation over a long period of time, can be susceptible 
to radiation damage that may adversely affect detector performance. 


7. Scintillator Types 


Scintillators are generally classified as either organic or inorganic. The two types are 
distinguished not only by their composition, but also by the physical processes that 
lead to the scintillation light. They can also be distinguished by their performance 
characteristics, some of which serve to determine the optimum choice of scintillator 
for a given application. For example, organic scintillators tend to exhibit very fast 
light pulses; hence they are often preferred for high count-rate situations, where 
dead-time and pileup may be important. On the other hand, inorganic scintillators 
tend to be made of higher Z material, thus providing much better stopping power 
for the incident radiation. 


7.1. Organic Scintillators 


Organic scintillators15 are composed of aromatic hydrocarbons. The low Z of hydrogen 
and carbon, coupled with a low density (typically about 1 g cm⁻³), means that 
organic scintillators tend to be relatively inefficient absorbers of γ-ray photons. 
When photons do interact they tend to do so by Compton scattering. On the other 
hand, charged particles will lose energy continuously via ionization. Organic scin- 
tillators are very efficient at detecting minimum ionizing particles. 

The fluorescence process in organic scintillators is intrinsic to individual 
molecules. It involves transitions of electrons within the energy structure of each 
molecule. Since each molecule acts as a scintillation center, the scintillation does 
not depend on the state of matter. Organic scintillators can therefore be found as 
crystals, solids, liquids, or gases. Most astrophysical applications use crystals, solids, 
or liquids. Gases are rarely, if ever, used. 

Organic scintillators are very fast, with risetimes of less than 1 ns and decay 
times typically a few ns or less. This makes them ideal for high counting rate 
situations or applications where fast timing is required. The scintillation emission 
of organic scintillators peaks in the range 400-450 nm. 

Fig. 5. Light yield versus energy for various particles in BC-400 scintillator. (Adapted from 
Organic Scintillation Materials, Saint-Gobain Crystals, 2015.) 


Table 1. Properties of representative organic scintillators. 

Scintillator             Type      Light Yield   Decay Time   Peak Emission   Density
                                   (relative)    (ns)         (nm)            (g cm⁻³)
Anthracene               Crystal   100           30           445             1.25
Stilbene                 Crystal   50            4.5          410             1.16
Pilot F/BC-408/EJ-200    Plastic   64            2.1          425             1.032
NE-110/BC-412/EJ-208     Plastic   60            3.3          434             1.032
NE-111A/BC-422/EJ-232    Plastic   55            1.4          370             1.032
NE-213/BC-501A/EJ-301    Liquid    78            3.2          425             0.874

NE refers to Nuclear Enterprises, an early producer of organic scintillator; the 
nomenclature is often still used for historical purposes. BC refers to Bicron, a brand that is 
now marketed by Saint-Gobain Crystals. EJ represents products that are marketed by 
Eljen Technology. 


For electrons and protons of the same energy, the particle velocity will always 
be significantly less for the proton. Since the ionization energy loss rate goes as 
z²/v², the rate of proton energy loss (dE/dx) will be larger, which translates into a 
higher ionization density along the particle track. The higher ionization density leads 
to greater quenching (lower scintillation efficiency). As a result the light output for 
protons will be significantly less than for electrons of the same energy. Because of this 
dependence on particle type, the light output is often expressed as “MeV electron 
equivalent” (MeVee). A light output of 1 MeVee corresponds to a 1 MeV electron. 
As can be seen in Fig. 5, the same light output will require >1 MeV energy deposit 
from a proton and even higher energy deposits from heavier particles. Figure 5 also 
shows that the light output does not always vary linearly with energy and that 
the linearity is different for electrons than it is for protons (or other heavy charged 
particles). 
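
The velocity argument above can be made concrete with a short relativistic calculation. The sketch below compares the velocity (as β = v/c) of an electron and a proton with the same kinetic energy and evaluates only the z²/v² scaling quoted in the text; it is not a full stopping-power calculation. The rest energies are standard values, and the function names are chosen here for illustration.

```python
import math

ELECTRON_REST_MEV = 0.511
PROTON_REST_MEV = 938.272


def beta(kinetic_mev, rest_mev):
    """Particle velocity in units of c for a given kinetic energy."""
    gamma = 1.0 + kinetic_mev / rest_mev
    return math.sqrt(1.0 - 1.0 / gamma ** 2)


def relative_loss_rate(kinetic_mev, rest_mev, charge=1):
    """z^2 / beta^2, the energy-loss scaling in arbitrary units."""
    return charge ** 2 / beta(kinetic_mev, rest_mev) ** 2


if __name__ == "__main__":
    energy = 1.0  # MeV, same kinetic energy for both particles
    print("electron beta:", round(beta(energy, ELECTRON_REST_MEV), 3))
    print("proton beta:  ", round(beta(energy, PROTON_REST_MEV), 4))
    ratio = (relative_loss_rate(energy, PROTON_REST_MEV)
             / relative_loss_rate(energy, ELECTRON_REST_MEV))
    print("proton/electron z^2/v^2 ratio:", round(ratio))
```

At 1 MeV the proton is far slower than the electron, so the z²/v² factor is several hundred times larger, which is the origin of the higher ionization density and stronger quenching described above.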

Light output of organic scintillators is independent of temperature over a wide 
temperature range from −60 °C to +20 °C. It increases slightly (about 5%) from +20 °C 
to +60 °C. 

There are three types of organic scintillator: crystals, liquids, and plastics. A 
variety of formulations for liquid and plastic scintillators have been produced. They 
vary in terms of emission spectrum, decay constant, attenuation of the optical emis- 
sion, and pulse shape discrimination capability. Some properties of representative 
organic scintillator types are shown in Table 1. 


7.1.1. Organic Crystal Scintillators 


Only two organic crystals are used as scintillators: stilbene and anthracene. 
Anthracene has the highest light yield of all organic scintillators. Both are rela- 
tively fragile, making them less than ideal for astrophysical applications. They are 
also difficult to produce in large sizes. Perhaps the most significant disadvantage, 
however, is that the light output depends on the orientation of the ionizing particle 
track with respect to the crystal axis. Directional variations in the light yield of up 
to 20-30% have been observed. Nonetheless, there may be situations in which these 
materials may be useful, especially in those cases where pulse shape discrimination 
is desired. 


7.1.2. Organic Liquid Scintillators 


Liquid scintillators are produced by dissolving an organic scintillator in some appro- 
priate solvent. They may be especially useful in applications where large volumes are 
required. Dissolved oxygen can reduce the scintillation efficiency (via quenching), 
so care must be taken in handling. The lack of a solid structure means that liquid 
scintillators are relatively immune from radiation damage. They generally have a 
lower light output than organic crystals, but a higher light output than plastics, and 
are useful for applications that require pulse shape discrimination.14 


7.1.3. Organic Plastic Scintillators 


A plastic scintillator consists of a solid solution of organic scintillating molecules in a 
polymerized solvent. The ease with which they can be shaped and fabricated makes 
plastic scintillators an extremely useful form of organic scintillator. A large num- 
ber of different plastic scintillators are available, having a wide variety of emission 
spectra and decay times. These properties are determined by the selection of both 
the activator and host material. Plastic scintillators are characterized by a light 
output that is about 25-30% of NaI(Tl) and attenuation lengths that can be as 
long as several meters. Plastic scintillators are known to be susceptible to radiation 
damage,16 but some formulations have been shown to be less so.17 

Plastic extrusion techniques have been used to produce scintillating fibers with 
diameters as small as 250 μm. These have been used, for example, in designs that 
require precise positioning of charged particle tracks, such as in a pair production 
telescope where precise tracking of the electron and positron is required.18,19 


7.2. Inorganic Scintillators 


Whereas most organic scintillators have a mass density similar to water 
(~1 g cm⁻³), inorganic scintillators typically have much higher mass densities, some 
as high as 7 or 8 g cm⁻³. This, coupled with the higher atomic number of inorganic 
materials, results in a significantly greater probability for photon absorption. 

Unlike the fluorescence process in organic scintillators, in which individual 
molecules are responsible for the optical emission, the fluorescence process in inor- 
ganic scintillators relies on the bulk structure of the crystalline material. Impurities 
(referred to as activators) are often added to the crystal during the growth process 
to create energy states that lie within the band gap of the bulk crystal. Commonly 
used activators include thallium and cerium. Excited electrons de-excite through 
the energy levels of the activator. The decays within the activator site lead to the 
generation of optical emission, so the characteristics of the optical emission are 
dictated by the activator. The optical emission is generally not absorbed by the 
scintillator because the corresponding energy is not sufficient to excite an electron 
across the band gap. 

The light yield depends on the number of electron-hole pairs produced within 
the crystal structure. The energy required to create one electron-hole pair is typ- 
ically about three times the band-gap energy (which amounts to about 20 eV in 
sodium iodide). Quenching effects (radiationless energy transitions from the excited 
state) act to reduce the light output. Since these quenching effects can be temper- 
ature dependent, the light output often varies with temperature (Fig. 6). 
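
The numbers quoted above allow a rough estimate of how many electron-hole pairs are created and how many of them end up as optical photons. The sketch below uses the approximately 20 eV per pair quoted for sodium iodide together with the NaI(Tl) light yield of 38,000 photons per MeV listed in Table 2; the function names are illustrative, and the result is an order-of-magnitude estimate rather than a measured efficiency.

```python
def electron_hole_pairs(deposited_ev, pair_energy_ev):
    """Approximate number of electron-hole pairs for a given energy deposit."""
    return deposited_ev / pair_energy_ev


def photons_per_pair(light_yield_ph_per_mev, pair_energy_ev):
    """Average number of optical photons produced per electron-hole pair."""
    pairs_per_mev = electron_hole_pairs(1.0e6, pair_energy_ev)
    return light_yield_ph_per_mev / pairs_per_mev


if __name__ == "__main__":
    pairs = electron_hole_pairs(1.0e6, 20.0)
    print("pairs per MeV in NaI:", round(pairs))
    print("photons per pair for NaI(Tl):", round(photons_per_pair(38000.0, 20.0), 2))
```

With these inputs roughly three quarters of the pairs lead to an optical photon, the remainder being lost to quenching and other non-radiative processes.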

In some inorganic scintillators, intrinsic radioactivity can be an issue. For some 
high-Z elements, naturally occurring radioactive isotopes exhibit emissions that 
can fall within the energy range of interest. For astrophysical applications, where 
the signal-to-noise is already quite low, this can exacerbate an already challenging 
background problem. Examples of intrinsic background sources include ¹⁷⁶Lu 
in Lu2(SiO4)O (also known as LSO) and ¹³⁸La in lanthanum halide scintillators 
(LaBr3 and LaCl3). Purification techniques can sometimes be employed to reduce 
the level of these troublesome isotopes. 

Many inorganic scintillators are hygroscopic. They have a tendency to absorb 
moisture from air. This property makes fabrication and handling a delicate task. In 
most cases, the scintillator is hermetically sealed by the manufacturer. The scintil- 
lator housing must be designed to minimize photon absorption, especially for low 
energy applications. It must be thin, but mechanically rigid. Any special detector 
geometries must often be provided by the manufacturer, with a consequent cost 
increase. 

Inorganic scintillators are susceptible to radiation damage.20 The damage 
appears to be dependent not only on total dosage, but also on both rate and type of 
radiation. Radiation damage can affect the transparency of the material to scintilla- 
tion light and can also affect the scintillation process itself. Thallium-activated alkali 
halides appear to be especially vulnerable. Activation of the component materials 
can also be an issue, particularly on orbit where the proton cosmic-ray flux can 
be high.21,22 

A large number of inorganic scintillators have been developed.23 The continued 
development of new and improved scintillator materials is a very active research 
area.24 Table 2 shows the properties of a wide range of scintillator materials. 


Table 2. Properties of representative inorganic scintillators. 

Scintillator   Light Yield   Decay Time   Peak Emission   Refractive   Density    Hygroscopic?
               (ph/MeV)      (ns)         (nm)            Index        (g cm⁻³)
LaBr3(Ce)      63,000        16           380             1.9          5.08       yes
NaI(Tl)        38,000        250          415             1.85         3.67       yes
LaCl3(Ce)      49,000        28           350             1.9          3.85       yes
CsI(Na)        41,000        630          420             1.84         4.51       yes
CsI(Tl)        54,000        1000         550             1.79         4.51       slightly
BGO            9,000         300          480             2.15         7.13       no
CdWO4          15,000        14,500       470             2.3          7.90       no
LYSO           32,000        41           420             1.81         7.1        no
YAG            8,000         70           550             1.82         4.55       no
CaF2(Eu)       19,000        940          435             1.47         3.18       no
GAGG(Ce)       57,000        258          520             -            6.63       no
SrI2(Eu)       90,000        1000         435             -            4.55       yes

In what follows we review the basic properties of the materials most commonly 
used for astrophysics applications. 


7.2.1. Sodium Iodide (NaI) 


NaI(Tl) was first demonstrated as a scintillator in 1948.25 Amazingly, it remains 
one of the most widely used scintillation materials. It has one of the highest light 
outputs of any inorganic scintillator, producing about 38,000 optical photons per 
MeV of deposited energy. The typical energy resolution at 662 keV is in the range 
of 6.5-7.0%. It can be grown in very large sizes and can be machined to a variety of 
shapes. However, like many inorganic scintillators, it is very hygroscopic. It must be 
hermetically sealed to maintain its integrity. The dominant decay time is 230 ns, but 
longer decay components have also been measured. It is therefore not well suited 
to high counting rate situations, such as might be expected for solar flares 
or γ-ray bursts. The light output and decay time are both temperature dependent. 
The light yield peaks near 30°C, as can be seen in Fig. 6. At lower temperatures the 
decay constant increases significantly (Fig. 7). Operating temperatures near room 
temperature are therefore preferred for optimum performance. 


7.2.2. Cesium Iodide (CsI) 


CsI has a higher density than NaI, so it is characterized by a somewhat larger 
absorption coefficient per unit size. It is also more rugged (less brittle) than NaI. The 
typical energy resolution at 662 keV is in the range of 6.0-7.0%. It is produced in two 
varieties: thallium-activated CsI(Tl) and sodium-activated CsI(Na). The two 
scintillators have distinctly different properties. CsI(Tl) is better for discriminating 
between different particles using pulse shape.15 CsI(Tl) is also less hygroscopic than 
CsI(Na) and NaI(Tl). With care, it can be used for applications where customized 
(in-house) detector fabrication is preferred. CsI(Tl) has a longer wavelength emission 
that is a poor match to most PMT photocathodes, but is well matched to the sensitivity 
of photodiode detectors. CsI(Tl) has been grown in a microcolumnar structure,27,28 
which permits thin layers of scintillator that are especially useful for measuring the 
location (in two dimensions) of the photon interaction site(s). 


Fig. 6. Temperature dependence of the output of various inorganic scintillators. (Adapted from 
Photomultiplier Tubes and Assemblies for Scintillation Counting and High Energy Physics, 
Hamamatsu Photonics, 2016.) 

Fig. 7. Primary NaI(Tl) decay constant vs. temperature. Data are taken from Ref. 26. 


7.2.3. Bismuth Germanate 


Bismuth Germanate (Bi4Ge3O12 or BGO)29 is a high density, large Z scintillator. 
It has the largest γ-ray efficiency per unit volume of any commonly available 
scintillator. The typical energy resolution at 662 keV is ~10%. It is a rugged, 
nonhygroscopic material, which means that it can easily be machined to various 
shapes and sizes. Unfortunately, it has a relatively low light yield, only 10-20% 
that of NaI(Tl). With a peak emission near 480 nm, it is also poorly matched to the 
spectral response of traditional bialkali PMTs, further limiting its performance. The 
light yield is very temperature dependent, falling off very rapidly with increasing 
temperature (Fig. 6). 


7.2.4. Lanthanum Halides 


One of the most exciting developments in recent years has been the commercialization 
of Ce-activated Lanthanum Chloride (LaCl3) and Lanthanum Bromide 
(LaBr3).30-32 LaBr3 is marketed (by Saint-Gobain Crystals) as BrilLanCe 380. 
LaCl3 is marketed as BrilLanCe 350. (The numbers in each case refer to the peak 
wavelength in their emission spectrum.) In addition to their high density and their 
high Z, these materials offer many characteristics that are superior to other inorganic 
scintillators. Paramount among these is the light yield. LaBr3 has a light 
yield that is almost 70% greater than that of NaI(Tl). This, coupled with a good 
spectral match for typical readout devices (including PMTs), results in energy 
resolution measurements (<3% at 662 keV) that are not only far better than those of any 
other inorganic scintillator,33 but also comparable to results obtained with some 
room-temperature semiconductor materials (such as CdZnTe).34 Another significant 
advantage is the very fast decay times. With decay times under 30 ns, they are 
roughly ten times faster than NaI(Tl), almost comparable to organic scintillators. 
This makes them ideal for high counting rate situations. For space applications, 
these materials also offer an energy resolution that is relatively constant over a 
wide range of temperatures and good resistance to radiation damage.35,36 

Although these materials exhibit many superior performance parameters, their 
use has been limited. These materials suffer from intrinsic radioactivities, primarily 
from ¹³⁸La and ²²⁷Ac, which limits their usefulness in low count rate situations.37 
Purification techniques have been successfully employed to reduce the impact of 
some of these isotopes. The development of growth techniques has also been hin- 
dered by an anisotropic thermal expansion that creates a potential for cracking 
during the after-growth cooling process. The final product is also very hygroscopic. 
Consequently, the cost of these materials remains relatively high. In recent years, 
there has been some effort to fabricate detectors using nanoparticle composites, as 
a means of reducing the fabrication costs.38,39 


8. Light Collection 


Once the optical light is generated, the technical challenge is to collect as much of 
that light as possible in the photosensor. The efficiency of light collection impacts 
the energy resolution (more photons, smaller error) and the energy threshold (the 
smallest signal that can be measured). The uniformity of the light collection efficiency, 
which can be influenced by the scintillator size and geometry, must also be 
considered. Non-uniform light collection can impact the energy resolution and will 
lead to a dependence of the threshold on interaction location. 
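
The link between collected light and energy resolution ("more photons, smaller error") can be illustrated with a simple counting-statistics estimate. The sketch below assumes pure Poisson statistics on the number of photoelectrons, so that the fractional FWHM is 2.355/√N; the light collection efficiency (0.8) is a hypothetical value, and real detectors have additional contributions to the resolution that this estimate ignores.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355


def photoelectrons(energy_mev, light_yield_ph_per_mev,
                   collection_efficiency, quantum_efficiency):
    """Mean number of photoelectrons produced at the photosensor."""
    return (energy_mev * light_yield_ph_per_mev
            * collection_efficiency * quantum_efficiency)


def statistical_resolution_fwhm(n_photoelectrons):
    """Fractional FWHM energy resolution from Poisson statistics alone."""
    return FWHM_PER_SIGMA / math.sqrt(n_photoelectrons)


if __name__ == "__main__":
    # NaI(Tl) light yield from Table 2; 0.8 collection efficiency is assumed.
    n_pe = photoelectrons(0.662, 38000.0, 0.8, 0.25)
    print("photoelectrons:", round(n_pe))
    print("statistical FWHM: {:.1%}".format(statistical_resolution_fwhm(n_pe)))
```

Under these assumptions the statistical limit at 662 keV is a few percent, below the typical measured values, and it degrades as the collection efficiency falls.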

To maximize light collection, two important effects must be considered. First, 
self-absorption of light within the scintillator reduces the amount of light that can 
be collected. Although this is generally not an issue for many scintillators, it can 
be important for large volume scintillators. The second, and more important, effect 
is light loss at the scintillator surfaces. Light should be allowed to escape the scin- 
tillator only at the surface that leads to the light sensor. At all other surfaces, the 
light should be contained. 

The optical light is emitted isotropically from the interaction site. When light 
reaches the scintillator surface, total internal reflection takes place for incidence 
angles greater than the critical angle (θ > θc, as measured with respect to the 
surface normal). At smaller angles, an increasing fraction of the light escapes. At 
0°, most of the light escapes. 

The critical angle depends on the index of refraction of the two adjoining media. 
For two media with the same index of refraction, the critical angle is 90° and there 
is no angle for which there is total internal reflection. For light that travels from 
a medium with a smaller index of refraction to a medium with a larger index of 
refraction, none of the light is totally internally reflected. All light is transmitted. For 
light that travels from a medium with a larger index of refraction to a medium with 
a smaller index of refraction, a significant fraction of the light is totally internally 
reflected. 
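
The three cases described above follow from Snell's law, which gives a critical angle θc = arcsin(n₂/n₁) for light travelling from index n₁ into a smaller index n₂. The sketch below evaluates θc and, for isotropic emission, the fraction of directions that fall inside the escape cone of a single flat surface. This is a purely geometric estimate that ignores partial (Fresnel) reflection inside the cone and any subsequent reflections; the function names are illustrative.

```python
import math


def critical_angle_deg(n_inside, n_outside):
    """Critical angle in degrees from the surface normal, or None when
    n_inside < n_outside and no total internal reflection occurs."""
    if n_inside < n_outside:
        return None
    return math.degrees(math.asin(n_outside / n_inside))


def escape_cone_fraction(n_inside, n_outside):
    """Fraction of isotropically emitted light directed inside the escape
    cone of one flat surface (geometry only)."""
    theta_c = critical_angle_deg(n_inside, n_outside)
    if theta_c is None:
        return 0.5  # the whole hemisphere facing the surface
    return 0.5 * (1.0 - math.cos(math.radians(theta_c)))


if __name__ == "__main__":
    # Refractive indices from Table 2, coupled to glass with n = 1.5
    for name, n in (("NaI(Tl)", 1.85), ("BGO", 2.15), ("CaF2(Eu)", 1.47)):
        print(name, critical_angle_deg(n, 1.5), round(escape_cone_fraction(n, 1.5), 3))
```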

Scintillators are most often coupled to either a glass window or a glass PMT. 
For efficient transmission of the optical light out of the scintillator, this means 
matching the index of refraction of the scintillator to the glass of the PMT (about 
1.5). Organic scintillators tend to be a good match to glass (with indices of refraction 
of 1.58-1.62). Inorganic crystals tend to have much higher indices (from 1.6 to 2.4), 
so they tend to be a poor match for PMTs. Inorganic scintillator assemblies must 
deal with the effects of total internal reflection. 
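
The benefit of index matching can also be seen at normal incidence, where the reflected fraction at an interface is R = ((n₁ − n₂)/(n₁ + n₂))² (the standard Fresnel result for unpolarized light at 0°). The sketch below evaluates R for the ranges of refractive index quoted above against glass with n = 1.5; it does not treat oblique incidence or any coupling compound, and the function name is chosen here for illustration.

```python
GLASS_INDEX = 1.5


def normal_incidence_reflectance(n1, n2):
    """Fresnel reflectance at normal incidence between indices n1 and n2."""
    return ((n1 - n2) / (n1 + n2)) ** 2


if __name__ == "__main__":
    cases = (
        ("organic, n = 1.58", 1.58),
        ("organic, n = 1.62", 1.62),
        ("inorganic, n = 1.6", 1.6),
        ("inorganic, n = 2.4", 2.4),
    )
    for label, n in cases:
        r = normal_incidence_reflectance(n, GLASS_INDEX)
        print(f"{label}: reflected {r:.2%}, transmitted {1.0 - r:.2%}")
```

Even for the largest index mismatch the loss at normal incidence is only a few percent, which is why total internal reflection at larger angles, rather than reflection near 0°, dominates the light loss for high-index crystals.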

For light that does escape the non-optical surfaces, much of it can be reflected 
back into the scintillator using a reflective wrapping material.40 The wrapping material 
must be carefully chosen. For example, it should effectively reflect light over a 
broad range of wavelengths. In addition, it has long been assumed that diffuse reflectors 
(which reflect light over a broad range of directions) are preferred, but in recent 
years, the best results have been demonstrated using specular reflectors (in which 
the angle of reflection is equal to the angle of incidence). Several different wrapping 
materials are now in common use: 


(1) Magnesium Oxide (MgO) powder has long been used to pack the scintillator.41 

(2) One of the best and most easily available materials for crystal wrapping is 
the white polytetrafluoroethylene (PTFE) tape used for household plumbing. 
(Although commonly referred to as teflon tape, it is not actually made of 
teflon.) 

(3) Another household product that serves as an effective wrapping material is 
Tyvek® by DuPont. Made of high density polyethylene fiber, it is commonly 
used as a protective house wrap because it provides an effective water barrier 
between the exterior and the inside framing. It turns out to have very good 
optical reflective properties as well. 

(4) In recent years, the use of Vikuiti Enhanced Specular Reflector (ESR) film, a 3M 
product, has become widespread. Originally developed to enhance the visibility 
of LCD displays, its reflectance properties are ideally suited for scintillator 
wrapping. 


Light pipes can be used to conduct light to a readout sensor in those cases where 
the sensor cannot be directly coupled to the scintillator, either because of space 
limitations or because of geometric limitations (e.g. coupling a circular sensor to the 
edge of a flat scintillator). They are typically made of lucite, which can be easily 
formed into a variety of shapes. So-called adiabatic light pipes can theoretically 
transmit all of the light. Optical coupling to the PMT can be achieved by using 
optical grease or a special optically transparent glue. 


9. Readout Sensor 


The final conversion of optical light to an electrical signal is accomplished through 
the use of a photosensor. Numerous photosensor technologies have been developed. 
Here we discuss only some of the most important examples. 


9.1. Photomultiplier Tube (PMT) 


The traditional photosensor for scintillation detectors has been the photomultiplier 
tube (PMT). A PMT works by first converting incident optical photons into elec- 
trons and then amplifying the resultant charge pulse (Fig. 8). The conversion process 
takes place at a photosensitive layer known as the photocathode. The photocathode 
material is on the inside of a glass vacuum tube. Electrons are liberated through 
photoelectric absorption and guided to an electron multiplier structure using applied 
electric fields. The electron multiplier structure consists of a series of metallic dyn- 
odes. As electrons strike each dynode, additional electrons are liberated through 
a process known as secondary electron emission. Electron amplification factors of 
10⁶ or more can be achieved. The quantum efficiency (as a function of wavelength) 
can be adjusted by changing the photocathode material. Several materials are avail- 
able. Most commonly used is the bialkali photocathode, which has a peak quantum 
efficiency of ~25%, although variants of this coating (the so-called “ultra bialkali 
photocathode”) now give peak quantum efficiencies as high as ~45%. 
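
The multiplication in the dynode chain can be sketched with a simple model in which each dynode releases, on average, the same number of secondary electrons per incident electron, so that the overall gain is that yield raised to the power of the number of dynodes. The secondary yield (4) and the number of dynodes (10) used below are hypothetical values chosen for illustration, and effects such as the collection efficiency at the first dynode are ignored.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19


def pmt_gain(secondary_yield, n_dynodes):
    """Overall electron gain for identical dynodes: yield ** n_dynodes."""
    return secondary_yield ** n_dynodes


def anode_charge_coulombs(n_photons, quantum_efficiency, gain):
    """Mean anode charge for a light pulse of n_photons at the photocathode."""
    n_photoelectrons = n_photons * quantum_efficiency
    return n_photoelectrons * gain * ELEMENTARY_CHARGE_C


if __name__ == "__main__":
    gain = pmt_gain(4, 10)
    print("gain:", gain)
    charge = anode_charge_coulombs(1000, 0.25, gain)
    print("anode charge: {:.1f} pC".format(charge * 1e12))
```

Because the gain is a power of the per-dynode yield, small changes in that yield (and hence in the applied voltage) produce large changes in the output signal.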

Several useful variants of a traditional PMT have also been developed. These 
include a hybrid photodetector (HPD), which combines a photocathode with an 
avalanche photodiode (see below) for electron collection; and a multi-anode PMT 
(MAPMT), which uses a dynode structure that leads to several distinct anodes. 


Fig. 8. Components of a PMT, shown coupled to a scintillator. (“PhotoMultiplierTubeAndScintillator” 
by Qwerty123uiop. Licensed under CC BY-SA 3.0 via Wikimedia Commons. 
https://commons.wikimedia.org/wiki/File:PhotoMultiplierTubeAndScintillator.jpg See electronic edition 
for a color version of this figure.) 


9.2. Solid-state Photosensors 


Solid-state photosensors (photodiodes) offer several potential advantages over the 
traditional PMT. They are much more compact, very rugged (no vacuum tube), 
and have a much lower power requirement. Operating voltages are significantly lower 
than that of a PMT (25-100 V versus 1000-1500 V for a PMT). Additionally, unlike 
PMTs, they are insensitive to magnetic fields. These are all important advantages 
for any astrophysical application. It is therefore not surprising that a lot of attention 
has been paid to these technologies in recent years. 


9.2.1. Conventional Photodiode 


The quantum efficiency (QE) of silicon photodiodes42,43 is typically in the range 
of 60-80%, much higher than the QE of a typical PMT (~20%). In addition, the 
sensitive wavelength range is much broader than a PMT (Fig. 9). The net result is 
that there is a much higher primary charge created by the photodiode than a PMT, 
but a conventional photodiode provides no amplification of the primary charge 
signal. These devices, with their small signals, are therefore more susceptible to 
noise issues. Cooling is often employed to reduce the noise. 


9.2.2. Avalanche Photodiode (APD) 


Fig. 9. The quantum efficiency of solid-state photosensors (conventional photodiodes and 
avalanche photodiodes) is significantly higher and covers a much broader wavelength range than 
that of a typical PMT. (Figure adapted from www.hamamatsu.com, with permission from 
Hamamatsu Photonics.) 


Avalanche photodiodes (APDs)44,45 are operated at higher applied voltages 
than traditional photodiodes. This accelerates the charge carriers to sufficiently 
high energies that additional electron-hole pairs are created, resulting in an 
amplification of the signal. The applied voltage is just below the breakdown voltage, 
in a range where the gain is very sensitive to small changes in the applied voltage. 
Consequently, these devices are very sensitive to small changes in both voltage and 
temperature. For situations where temperature may vary, active gain stabilization 
techniques (which adjust the voltage based on measured gain) are required. 


9.2.3. Silicon Photomultiplier 


At even higher applied voltages, a “runaway” avalanche takes place. This is known 
as the regime for Geiger mode operation. The collected charge is no longer propor- 
tional to incident energy. Even a single photon will generate an avalanche. A silicon 
photomultiplier (SiPM, also known as a multi-pixel photon counter, or MPPC) 
consists of a large array of very small APDs operated in Geiger mode.46,47 Each 
APD element (as small as 10 μm across) is small enough that only a single photon is 
likely to be incident. Since the output of each cell is identical, summing the signals 
from the array provides an output that is proportional to the number of triggered 
cells, which, in turn, is proportional to the incident light signal. Arrays can consist 
of as many as 10⁴ cells. 
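
The proportionality between the number of triggered cells and the incident light holds only while the number of detected photons is small compared with the number of cells. A commonly used first-order model for this behavior, which is not taken from this chapter and is given here only as an illustration, is N_fired = N_cells [1 − exp(−N_photons × PDE / N_cells)]. It assumes uniform illumination and neglects crosstalk, afterpulsing, and cell recovery; the photon detection efficiency (PDE) of 0.3 is a hypothetical value.

```python
import math


def fired_cells(n_photons, detection_efficiency, n_cells):
    """Expected number of triggered SiPM cells for a short light pulse,
    using a simple exponential saturation model."""
    mean_per_cell = n_photons * detection_efficiency / n_cells
    return n_cells * (1.0 - math.exp(-mean_per_cell))


if __name__ == "__main__":
    n_cells = 10_000
    for n_photons in (100, 1_000, 10_000, 100_000):
        fired = fired_cells(n_photons, 0.3, n_cells)
        print(f"{n_photons:>7d} photons -> {fired:8.1f} cells")
```

For weak pulses the response is very nearly linear (about 0.3 cells per photon in this example), while for very bright pulses the output saturates at the total number of cells.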


10. Detector Response Function 


Interpreting the data from a scintillation detector requires an understanding of 
the difference between the incident photon spectrum and the measured energy loss 
spectrum. The output of the detector provides a pulse-height spectrum, which is a 
measure of the energy loss spectrum. It is related to the energy that is absorbed in 
the scintillator and converted to optical emission. The real goal of any measurement, 
however, is to determine the incident photon spectrum, which can only be accom- 
plished by properly unfolding (deconvolving) the measured energy loss spectrum. 
Mathematically, this process can be represented by 


$$M(E) = \int R(E, E')\, S(E')\, dE', \tag{5}$$


where S represents the incident source photon spectrum, M represents the measured 
energy loss spectrum, and R represents the response function, which describes how 
the detector responds to incident photons. The details of the response function 
depend on several parameters, including the incident photon energy, the scintillator 
composition, and the scintillator geometry (since larger volumes are more likely to 
contain the full photon energy, even after several interactions). 
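
The folding in Eq. (5) can be sketched numerically by replacing the integral with a sum over a uniform energy grid. The snippet below is only an illustration under stated assumptions: the response is a toy Gaussian full-energy peak with no Compton continuum or escape peaks, the 7% FWHM is a placeholder, and the names `gaussian_response` and `fold` are hypothetical.

```python
import math

def gaussian_response(e_meas, e_true, fwhm_fraction):
    """Toy R(E, E'): a Gaussian full-energy peak centered on e_true (assumption)."""
    sigma = fwhm_fraction * e_true / 2.355  # convert FWHM to a 1-sigma width
    norm = sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-0.5 * ((e_meas - e_true) / sigma) ** 2) / norm

def fold(source, energies, fwhm_fraction=0.07):
    """Discretized Eq. (5): M(E_i) = sum_j R(E_i, E_j) S(E_j) dE."""
    de = energies[1] - energies[0]  # uniform grid spacing
    return [
        sum(gaussian_response(e, e_true, fwhm_fraction) * s * de
            for e_true, s in zip(energies, source))
        for e in energies
    ]

energies = [10.0 * (i + 1) for i in range(100)]  # 10 to 1000 keV in 10 keV steps
source = [0.0] * len(energies)
source[65] = 0.1                                 # narrow line at 660 keV, unit integral
measured = fold(source, energies)
```

Recovering S from a measured M is the harder inverse problem. In this discretized picture it amounts to inverting, or fitting through, the matrix of response values.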

We can also distinguish between the intrinsic energy loss spectrum, which repre- 
sents the true energy absorbed by the scintillator and converted into optical photon 
emission, and the measured energy loss spectrum, which is the energy loss spectrum 
measured by the detector. One of the most important differences between these 
two is that the measured spectrum incorporates the effects of the detector energy 
resolution. 


10.1. Intrinsic Energy Loss Distribution 


In the regime where photoelectric absorption dominates, the full energy of the
photon is absorbed in a single photoelectric interaction. As the photon energy
increases, Compton scattering plays an increasingly important role. The photon is
then likely to scatter at least once (and perhaps several times) before it loses
enough energy to be fully absorbed in a photoelectric interaction. In either case,
the full energy of the photon is absorbed. The intrinsic energy loss distribution
then looks like Fig. 10. This peak is often referred to as the “photopeak” (as in
“photoelectric peak”) or “full-energy peak”. The photofraction is the fraction of
all counts that fall within the full-energy peak. It is generally desirable to have
a photofraction as close as possible to 1.

Whenever multiple interactions are required to absorb the full photon energy, 
it is likely that some of that energy will escape from the detector. If a photon 
undergoes Compton scattering, the scattered photon may exit the detector without 
being fully absorbed. Incomplete absorption can also result from interactions that 
take place close enough to the edge of the detector for the Compton scattered 
electron to escape. Often, only a single scatter takes place, followed by the escape 
of the photon. The resulting energy loss distribution then looks something like 
Fig. 11. 

Fig. 10. Intrinsic energy loss distribution for photoelectric absorptions.

Fig. 11. Intrinsic energy loss distribution for Compton interactions.

Likewise, if the incident photon undergoes pair production, then some fraction
of the total energy may also escape the detector, in the form of one or two 511 keV
photons that result from the annihilation of the positron created in the pair
production process. The escaping annihilation photons can lead to either a first
escape peak (where only one annihilation photon escapes) or a second escape peak
(where both annihilation photons escape). These peaks will have energies that are
511 keV and 1022 keV, respectively, below that of the full-energy peak.
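
As a small worked example of the escape-peak arithmetic, the sketch below computes where the first and second escape peaks would fall for a given full-energy peak. The function name `escape_peaks` and the 2614.5 keV line energy are illustrative choices, not values taken from the text.

```python
ANNIHILATION_PHOTON_KEV = 511.0

def escape_peaks(full_energy_kev):
    """Return (first_escape, second_escape) peak energies in keV.

    Pair production needs at least 2 x 511 keV, so lower energies are rejected.
    """
    if full_energy_kev < 2.0 * ANNIHILATION_PHOTON_KEV:
        raise ValueError("below the pair-production threshold of 1022 keV")
    first = full_energy_kev - ANNIHILATION_PHOTON_KEV         # one photon escapes
    second = full_energy_kev - 2.0 * ANNIHILATION_PHOTON_KEV  # both photons escape
    return first, second

first_escape, second_escape = escape_peaks(2614.5)  # illustrative line energy
```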


10.2. Measured Energy Loss Spectrum 


The measured energy loss spectrum can be thought of as a sum of all of the intrinsic 
energy loss components (photoelectric, Compton, pair production) that is broadened 
by the finite energy resolution of the detector. The relative contribution of the three 
energy loss components will vary with both energy and the Z of the scintillator 
material. Organic scintillators, for example, are almost always dominated by the 
Compton component. Inorganic scintillators typically exhibit all three components. 
An energy loss spectrum typical of an inorganic scintillator is shown in Fig. 12. 
The energy loss spectrum will also depend on the size (volume) of the scintillator. 
Smaller volumes are more susceptible to partial energy loss events. 
Fig. 12. Various components of the measured energy loss spectrum in the energy regime where
pair production does not play an important role. This spectrum is more typical of that for an 
inorganic scintillator. 


The energy loss spectrum will evolve as energy increases. At low energies 
the spectrum is dominated by the full-energy peak (photofraction ~ 1). As the 
energy increases, the relative contribution of the Compton component will increase 
(photofraction decreases). At the highest energies partial energy loss events will 
dominate and there may be no well-defined spectral features. 


10.3. Energy Resolution 


The energy resolution is based on a measure of the full-energy peak in the spectrum. 
It is defined to be the width of the full-energy peak (the full width at half maximum, 
or FWHM, measured in energy units) divided by the energy of the peak. It is 
normally expressed as a percentage. 

The energy resolution depends on a number of factors, including the scintil- 
lator light yield, the light collection efficiency (which depends on such factors as 
transparency of the scintillator and the properties of the scintillator housing), the 
variation of light collection efficiency within the scintillator volume, and the quan- 
tum efficiency of the readout sensor. It also depends on the scintillator light output 
as a function of the energy deposit. If the response is not linear with energy, the 
total energy resolution will depend on the number of interactions and the energy 
liberated in each case. 

Fig. 13. The energy resolution is dependent primarily on the light yield of the scintillator. In
practice, the theoretical limit is never reached, but some materials get closer to that limit than
others. (Reprinted from Ref. 48 with permission from Elsevier.)

As a concrete example, consider the case of a 662 keV γ-ray interacting in
a NaI(Tl) crystal that is being read out by a PMT. The light yield in this case
will be about 25,000 photons. We can estimate that about 15,000 of those will
reach the photocathode. The typical quantum efficiency of a photocathode is about
20%, so about 3000 photoelectrons will be liberated. It is the statistical variance
in the number of photoelectrons that determines the theoretical limit of the energy
resolution,

\[ \frac{\Delta E}{E} = \frac{1}{\sqrt{N}}, \tag{6} \]

where N is the total number of photoelectrons liberated at the photocathode. In
our example, we get

\[ \frac{\Delta E}{E} = \frac{1}{\sqrt{3000}} = 0.018. \tag{7} \]

This corresponds to a 1σ width of 1.8% or a FWHM of 4.3%. In practice, a typical
NaI(Tl) detector will have a resolution of about 7% at 662 keV. Several issues
will limit the achievable energy resolution, including the non-uniformity of the 
scintillation process (spatial variations within the crystal), the non-uniformity of 
photoelectron collection from the PMT photocathode, and the non-proportionality 
of the light yield as a function of energy deposit.‘' Figure 13 shows data from 
various scintillators as compared to the theoretical limit. 
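
The photoelectron-statistics estimate above can be reproduced in a few lines. This is a minimal sketch that assumes pure Poisson statistics and a Gaussian peak shape; the function name `statistical_resolution` and its parameter names are hypothetical.

```python
import math

# FWHM of a Gaussian in units of its standard deviation (about 2.355)
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))

def statistical_resolution(n_scint_photons, collection_fraction, quantum_efficiency):
    """Return (photoelectrons, 1-sigma resolution, FWHM resolution)."""
    n_pe = n_scint_photons * collection_fraction * quantum_efficiency
    sigma = 1.0 / math.sqrt(n_pe)  # Poisson limit: Delta E / E = 1 / sqrt(N)
    return n_pe, sigma, FWHM_PER_SIGMA * sigma

# Numbers from the 662 keV NaI(Tl) example: 25,000 photons, 15,000 reach
# the photocathode, 20% quantum efficiency.
n_pe, sigma, fwhm = statistical_resolution(25_000, 15_000 / 25_000, 0.20)
print(f"N = {n_pe:.0f}, sigma = {sigma:.1%}, FWHM = {fwhm:.1%}")
```

The gap between this statistical limit and the roughly 7% measured in practice is what the non-uniformity and non-proportionality effects listed above account for.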


11. Special Configurations 


Scintillation detectors can be used for a wide variety of different applications, not 
just for measuring the incident photon spectrum. It is worth mentioning here some 
ways in which scintillation detectors are often employed in astrophysical instrument 
designs. 


11.1. Spatial Imaging 


The location of the photon interaction site can be important for many imaging 
applications, such as Compton telescopes* or coded aperture imaging instruments. 


*See Chapter 3 of this volume. 
By employing appropriate detector and readout geometries it is possible to isolate
the interaction site with very fine (mm or even sub-mm) spatial resolution. One 
approach is to use arrays of small scintillator elements read out by individual light 
sensors or light sensor arrays (such as a multi-anode PMT). In this case, the (2D) 
spatial resolution is determined by the size of the scintillator elements. Finer spatial 
resolution can be achieved by using multiple readout sensors on a single contiguous 
scintillator. The relative signal in each sensor can be used to isolate the interaction 
location, in a technique known as Anger camera imaging.*® The spatial resolution 
is determined by the size and spacing of the readout sensors and the thickness 
of the scintillator. Location determination in a third dimension, determining the 
so-called depth of interaction (DoI), can also be made by placing readout sensors
on both the top and bottom of the scintillator and using the relative signal strength 
to determine depth. 


11.2. Shielding 


Scintillation detectors are also often employed as an anti-coincidence detection sys- 
tem, enclosing the primary photon detector, but used to reject incident charged 
particle events. Thin sheets of plastic scintillator make for good anti-coincidence 
detectors, in that they are relatively insensitive to photons, but are very efficient 
at detecting charged particles. Large volume inorganic scintillators have also been 
used as shielding.®:© They can be used both as charged particle shields and Compton 
shields, which are designed to absorb Compton scattered photons that escape from 
the primary detector. 


11.3. Calorimeters 


Large volume inorganic scintillation detectors have been used on several pair
production γ-ray telescopes as calorimeters to ensure the full-energy absorption of the
electron-positron pair.**’ The large volume and high-Z characteristics make them 
effective at absorbing the energy of the incident radiation. 


12. Scintillation Detectors vs. Solid-State Detectors 


Scintillation detectors are often compared to semiconductor detectors, such as Ge or
CdZnTe. Semiconductor (or solid state) detectors are characterized by their finer 
energy resolution, which is often considered to be a primary consideration. Although 
some scintillator detectors (such as LaBr3) can now achieve energy resolutions that 
are comparable to room-temperature solid-state detectors (CdZnTe), Ge detectors 
still provide the finest energy resolution (typically < 1% at 662 keV). However, 
energy resolution is not always the deciding factor in the choice of detector material. 
Other issues, including cost, complexity, ease of fabrication, and reliability can also 
be important. Unlike some solid-state detectors (such as Ge), scintillation detectors 
do not have any special cooling requirements. They can also provide much faster 
signals, which can be especially important in any high count rate situations, such
as solar flares or γ-ray bursts. Scintillators are routinely fabricated in large areas
and large volumes, providing improved efficiency for low photon flux levels. When 
one considers continuum or broad-line emissions, scintillators tend to provide better
sensitivities than high energy resolution solid-state detectors.°° 


13. Summary 


The development of scintillators and new scintillator readout technologies is a very 
robust area of development. Despite the fact that the basic technology is over 60 
years old, scintillation detectors remain an effective and very useful technology for 
exploring the high energy universe. Their ease of fabrication in a variety of shapes 
and sizes, their relatively low cost, their ease of use, and their reliability have made 
them the de facto standard for many experiments. Continued developments will 
likely ensure that they remain so for many years to come. 
Dedicated to the late Prof. R.S. White of the University of California, Riverside,
who, along with colleagues at the Max-Planck Institut für Extraterrestrische
Physik, pioneered the development and use of Compton telescopes.


We describe the physics of Compton Telescopes and their applications for imaging 
and spectroscopy in the MeV γ-ray range. We also discuss the unusual strengths
and limitations of the concept. Compton telescopes take a variety of forms. One 
(COMPTEL) operated in the Low Earth Orbit environment, some have operated 
on balloon platforms, while others have yet to leave the laboratory. As with the 
basic concept, the different employed technologies have their own strengths and 
limitations that we discuss. 


1. Gamma-Rays in the MeV Range and Compton 
Scattering Physics 


The MeV part of the γ-ray spectrum is arguably the richest, but also the most
problematic, energy range of experimental γ-ray astronomy. We define the MeV range
as that between 400 keV and 20 MeV, a factor of 50 in energy. This range embodies
several emission processes: electron bremsstrahlung, inverse Compton scattering,
annihilation radiation and nuclear transitions. Similarly, the detection physics is
not singular: the photoelectric effect, pair production and Compton scattering all
play a role. The range of potential physics in play suggests that this is a difficult
energy range to study. Clever instrument design can simplify the concept but, as we
see below, it is the interplay between the hadronic cosmic radiation field, practical
material choices for instrument construction and nuclear physics that produces a
much poorer signal-to-noise ratio than that enjoyed at other energies.

This review or primer for Compton telescopes relies heavily on the COMPTEL 
experience. It was the first Compton telescope to fly in space and, in doing so, 
to perform full-sky imaging and spectroscopy in the MeV range. It allowed the 
measurement of many cosmic sources of MeV γ-ray emission, including gamma-ray
bursts.? The reader should refer to reviews of Compton telescopes and γ-ray
imaging technology by Schönfelder** that are still relevant today.

The Compton process, named for Arthur Holly Compton (1892-1962), is a less- 
than-ideal physical process for detecting and measuring photons. The Klein—Nishina 
cross-section® is small, and the energy transfer to the recoil electron is incomplete. 
To use Compton scattering, an instrument must necessarily be sizeable, with sub- 
stantial amounts of material to (1) initiate an interaction and (2) allow for an 
additional interaction to gather essential information for imaging and spectroscopy. 
However, in the MeV range, the Compton process dominates other processes in low- 
Z materials. Figure 1 illustrates the dependence of the interaction probability for a 
given atomic number as a function of photon energy. The center region in the figure 
represents the photon energy in which the Compton process dominates. For low 
and high photon energies, we see that the Compton scattering regime is smaller for 
heavier nuclei as the photoelectric effect and pair production, respectively, become 
more prominent. Another important feature of Compton scattering is that it only 
weakly depends on photon energy, falling by only two orders of magnitude over 
five decades of photon energy (1 keV-100 MeV). At its most fundamental level it is 
the elastic interaction between a photon and a charged particle, in particular, an 
electron, where energy and momentum are conserved. It is closely related to inverse 
Compton scattering, synchrotron radiation and bremsstrahlung, where in the latter 
two cases the photon is a virtual one.® 
Fig. 1. Regions in photon energy space where the Compton process dominates.! Reproduced by
permission from McGraw-Hill, New York.

Fig. 2. COMPTEL schematic with Compton scattered γ-ray. Left-hand panel is from Ref. 9.


The kinematics are given by the Compton formula: 


\[ \cos\theta = 1 - 0.511\left(\frac{1}{E_s} - \frac{1}{E_\gamma}\right), \tag{1} \]


where E_s and E_γ are the energies of the scattered and incident photons (expressed
in MeV), respectively,” as illustrated in Fig. 2 for the COMPTEL instrument. The
angle θ in Eq. (1) and Fig. 2 is the deflection angle of the incident photon. The
cross-section for such a scatter is of order the Thomson cross-section, 8πr_e²/3,
where r_e is the classical electron radius. The photon scatters increasingly in the
forward direction as the photon energy exceeds the electron rest-mass energy. For
low-energy photons, the behavior approaches that of Thomson scattering, whereas for
a photon of infinite energy the backscatter energy (θ = π) asymptotically approaches
256 keV (half the electron rest-mass energy). Reference 1 provides a good description
of the process including polarization-dependent scattering. Lastly, Eq. (1) assumes
the target electron to be at rest. This is not the case for bound electrons, especially
tightly bound electrons in the high-Z material, e.g. Ar or Xe, used in some telescope
designs. The effect of scattering off the moving electron is that the scattered photon
(and electron) energy is Doppler shifted in a random way. If the electron is not at
rest, the sharing of energy between the photon and the electron changes, yielding
a different scatter angle θ (Eq. (1)) compared to that computed for an electron at
rest, while the change in the total measured energy is of the order of the atomic
electron energy, which is very small. The effect of the Doppler broadening is a
degradation of the angular resolution figure below 1 MeV, with a negligible effect on
the final photon energy resolution.® (Sec. 5).
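
The Compton formula of Eq. (1) is easy to evaluate directly. The following sketch assumes a free electron at rest (so it ignores the Doppler broadening just described) and uses hypothetical function names. It computes the deflection angle from the incident and scattered photon energies, and the inverse relation.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.511

def scatter_angle(e_incident, e_scattered):
    """Deflection angle in radians from Eq. (1); energies in MeV."""
    cos_theta = 1.0 - ELECTRON_REST_ENERGY_MEV * (1.0 / e_scattered - 1.0 / e_incident)
    if not -1.0 <= cos_theta <= 1.0:
        raise ValueError("energies not allowed for scattering off an electron at rest")
    return math.acos(cos_theta)

def scattered_energy(e_incident, theta):
    """Scattered photon energy in MeV for a given deflection angle (Eq. (1) inverted)."""
    return e_incident / (
        1.0 + (e_incident / ELECTRON_REST_ENERGY_MEV) * (1.0 - math.cos(theta))
    )

# Backscatter (theta = pi) of a very energetic photon tends to half the rest energy.
backscatter_limit = scattered_energy(1.0e9, math.pi)
```

Evaluating `scattered_energy` at θ = π for ever larger incident energies shows the scattered energy saturating near 0.2555 MeV, the backscatter limit.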

The basic idea behind a Compton telescope is that if one can locate the position
of the Compton scatter and determine the direction of the scattered photon by
locating the point of a second interaction, the Compton scatter angle, θ, together
with the direction of the scattered photon, defines a set of directions in space, one
of which is the direction of the incident photon. That set of allowable incident
directions lies on the mantle of a cone whose axis is the scattered photon velocity
vector and whose half angle is θ, as illustrated in Fig. 2. For an instrument
like COMPTEL, the first and second scatters are required to take
place in separated detecting planes, often referred to as D1 and D2.° An accurate
measure of the Compton scatter angle is obtained by measuring the energy of the
recoil Compton electron and a complete energy measure of the scattered photon
(which also provides a measure of the incident γ-ray energy). How that translates
into a working instrument is shown in Fig. 2 for a γ-ray incident on the COMPTEL
instrument.
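
The event-cone construction can be sketched as follows. This is an illustration only: it assumes the scattered photon is fully absorbed at the second interaction and that the electron is at rest. The names `event_cone` and `cone_offset` are hypothetical, and the positions and energies in the example are made up.

```python
import math

ELECTRON_REST_ENERGY_MEV = 0.511

def event_cone(d1_pos, d2_pos, e1, e2):
    """Return (axis, half_angle) of the event cone for one double-scatter event.

    d1_pos, d2_pos: (x, y, z) positions of the first and second interactions.
    e1: recoil electron energy deposited at the first interaction (MeV).
    e2: scattered photon energy absorbed at the second interaction (MeV).
    """
    e_incident = e1 + e2  # assumes total absorption of the scattered photon
    cos_theta = 1.0 - ELECTRON_REST_ENERGY_MEV * (1.0 / e2 - 1.0 / e_incident)
    if not -1.0 <= cos_theta <= 1.0:
        raise ValueError("energy deposits are not consistent with a Compton scatter")
    # The axis points from the second interaction back through the first, to the sky.
    diff = [a - b for a, b in zip(d1_pos, d2_pos)]
    norm = math.sqrt(sum(c * c for c in diff))
    axis = tuple(c / norm for c in diff)
    return axis, math.acos(cos_theta)

def cone_offset(axis, half_angle, source_dir):
    """Angular distance (radians) of a unit source direction from the cone mantle."""
    dot = sum(a * b for a, b in zip(axis, source_dir))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding just outside [-1, 1]
    return math.acos(dot) - half_angle

# Made-up event: D1 hit 1.5 m above the D2 hit, with 0.3 MeV and 0.7 MeV deposits.
axis, half_angle = event_cone((0.0, 0.0, 1.5), (0.0, 0.0, 0.0), 0.3, 0.7)
on_cone = (math.sin(half_angle), 0.0, math.cos(half_angle))
offset = cone_offset(axis, half_angle, on_cone)
```

A source direction that lies on the cone gives an offset near zero. With real data, measurement errors in the energies and positions spread this offset into a distribution whose width reflects the angular resolution.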

The detecting material is typically either a scintillator or solid-state detectors. 
In scintillators, the ionizing electron produces a scintillation that is measured by 
photomultiplier tubes!° or, more recently, with silicon photomultipliers.!! In solid- 
state detectors, the free charge from the ionization (and its image charge) is collected 
on the electrodes. The time from ionization to full collection of the signal differs
greatly between the two technologies: scintillators register an event on a timescale
of ns, while charge collection in solid-state devices takes of order μs. In gaseous
detectors, the signal registration is slower still (Sec. 4.2).


2. The Sensitivity of Compton Telescopes in Practice 


Compton telescopes rely on two successive scatters that are physically separated 
by centimeters to meters. The information from the second scatter (energy and 
location) constrains the kinematics of the first scatter. As we will see below, the 
constrained kinematics of the first scatter allows images and spectra to be con- 
structed, the images formed by the superposition of event circles (2D image space) 
or event cones (3D data space); see Fig. 2. 

Because the cross-section for Compton scattering is small and because two
scatters are required for detection, the process is inherently inefficient. However,
if the background can be reduced by an even larger factor than the signal, then the
concept is viable. Relative to other energy bands, the background in the MeV region
is intense, driven by numerous nuclear reaction channels and the abundance of
energetic neutrons and protons in typical observing environments. One strives for a
high signal-to-noise ratio (SNR), where we define noise as the absolute background
count rate, not background fluctuations, as is sometimes assumed. One of the limits
of sensitivity in γ-ray astronomy is uncontrolled, or seemingly random, variations in
the background rate. These can result from several things that we will not address,
but we simply say that even in a well-designed instrument (e.g. a double-null system
like the Oriented Scintillator Spectrometer Experiment on the Compton Observatory!”),
these variations are seldom much smaller than 1%. This means signals must, even
with ideal statistics, be greater than this systematic variation to be claimed as a
positive measurement. Consequently, there is an effective maximum exposure
time for steady-state sources, after which no improvement in statistical significance
is realized. Thus, the objective of a Compton telescope design is to reduce the
background level, so that these uncontrolled systematic variations are minimized,
enabling sensitive detections despite a small collecting area. This is how
one achieves the requisite high SNR, i.e., by minimizing the background rate to make
the small effective area most useful. A detailed discussion of background follows in
Sec. 3.
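
The idea of an effective maximum exposure time can be illustrated with a toy model. The model is an assumption made for this sketch, not a formula from the text: background-dominated counting, with the Poisson variance and a fixed fractional systematic variation of the background added in quadrature. The rates used are made up.

```python
import math

def significance(source_rate, background_rate, exposure, systematic_fraction=0.01):
    """Toy detection significance with a systematic background uncertainty.

    Rates in counts/s, exposure in s. Background-dominated case assumed.
    """
    signal = source_rate * exposure
    stat_var = background_rate * exposure                           # Poisson term
    sys_var = (systematic_fraction * background_rate * exposure) ** 2
    return signal / math.sqrt(stat_var + sys_var)

def crossover_exposure(background_rate, systematic_fraction=0.01):
    """Exposure at which the systematic term equals the statistical term."""
    return 1.0 / (systematic_fraction ** 2 * background_rate)

# Made-up rates: source at 5% of a 1 count/s background, 1% systematics.
t_cross = crossover_exposure(1.0)
early = significance(0.05, 1.0, t_cross)
late = significance(0.05, 1.0, 1.0e8)
```

In this model the significance saturates at the source-to-background ratio divided by the systematic fraction (here 0.05/0.01 = 5). Beyond the crossover exposure, lowering the background rate helps where longer observing does not.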

We can estimate the effective area of a Compton telescope for a nominal energy
of ~1 MeV. If we require one, but only one, scatter for a γ-ray entering such an
instrument, we can choose the “optical” depth of the first active material to be of
order 0.1 mean free path (MFP), implying an interaction probability of
1 − e^(−0.1), or about 9.5%. Multiple scatters in the first medium destroy the
scattering information and should be avoided, thus the conservatively small depth of
the first medium. The second scatter can take place in a thicker medium, say, of
order one MFP, implying an efficiency of 1 − e^(−1), or 63%. Finally, there is a
solid angle contribution to the detection efficiency between the two detecting
media, e.g. D1 and D2. This is proportional to the solid angle subtended by D2 from
the perspective of D1. For the COMPTEL instrument, this factor was of order 5%,
while for compact, pseudo-monolithic instruments, it can be as high as 50%. (The
solid angle effect is modified at high energy because the Compton cross-section
becomes increasingly anisotropic, peaking in the direction of the incident photon.)
Thus, the range of detection efficiencies goes roughly from 3 × 10⁻³ to 3 × 10⁻².
For a 20-cm² effective area, the geometric (physical) areas of such instruments are
then ~7000 to 700 cm². Clearly, keeping the two detecting media physically close to
one another is advantageous, everything else being equal, but these instruments will
still necessarily be large.
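
The efficiency estimate above is a product of three factors and can be checked numerically. The sketch below uses the numbers quoted in the text; the function names are hypothetical, and the simple product ignores the energy-dependent anisotropy mentioned in parentheses.

```python
import math

def interaction_probability(depth_in_mfp):
    """Probability of at least one interaction in a layer of given depth (in MFP)."""
    return 1.0 - math.exp(-depth_in_mfp)

def double_scatter_efficiency(d1_depth_mfp, d2_depth_mfp, solid_angle_factor):
    """Rough efficiency: scatter in D1, interaction in D2, geometric acceptance."""
    return (interaction_probability(d1_depth_mfp)
            * interaction_probability(d2_depth_mfp)
            * solid_angle_factor)

def geometric_area(effective_area_cm2, efficiency):
    """Physical area needed to reach a given effective area."""
    return effective_area_cm2 / efficiency

separated = double_scatter_efficiency(0.1, 1.0, 0.05)  # COMPTEL-like geometry
compact = double_scatter_efficiency(0.1, 1.0, 0.50)    # compact geometry
print(f"{separated:.1e} -> {geometric_area(20.0, separated):.0f} cm^2")
print(f"{compact:.1e} -> {geometric_area(20.0, compact):.0f} cm^2")
```

The two cases give efficiencies close to 3 × 10⁻³ and 3 × 10⁻², and geometric areas of roughly 6600 and 660 cm², consistent with the rounded figures quoted above.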

There are various sensitivity quantities that can be confused with one another, 
and choosing the appropriate one is important. Basically, these sensitivity figures 
are connected to whether the object under study is point-like or extended and 
whether the radiation spectrum from the object is broadband or in the form of 
discrete nuclear lines. Finally, the sensitivity is different depending on whether the 
emission is steady state or transient. If the emission is transient, Poisson statistics
(proportional to the effective area) will determine the limiting measurable intensity. 
If it is steady state or persistent, uncontrolled background variations will drive 
the sensitivity figure. Often what is found in print is the steady-state, point-like, 
continuum or broadband sensitivity, such as that for an active galactic nucleus. In 
fact, there are a limited number of objects that emit in narrow lines. Most objects 
in this energy range are continuum sources. 

So, when speaking of the ultimate sensitivity (point source, continuum
spectrum) of a Compton telescope at a few MeV, we see that it is determined by three
things, as discussed below: the isotropic and ubiquitous cosmic diffuse γ (CDG)
flux, the radiation from material in the field of view and other proximate
material, and random variations of that background. The CDG flux is a legitimate
cosmic source, but it also constitutes a background component for point or extended
sources. Because it is isotropic and stationary, it poses a different problem than, for
example, resolving two sources, such as a pulsar within the galactic plane. So, for
the moment, we take the CDG and the overlying atmosphere in a balloon
experiment as background for observing isolated broadband point sources. The combined
overhead intensity at balloon altitudes from the cosmic diffuse flux! !° and the
radiation generated in the atmospheric overburden coming from a disk 4° in
diameter is of order 10⁻⁴ γ cm⁻² s⁻¹ MeV⁻¹ at a few MeV.!'®!” Thus, achieving a
point-source continuum sensitivity better than a few times 10⁻⁵ γ cm⁻² s⁻¹ MeV⁻¹ at a
few MeV is difficult from a balloon platform. With solid-state-like energy resolution,
sensitivity in narrow lines would be better, everything else being equal.

For space-based instruments, the atmospheric overburden background is 
replaced by the radiation from the passive material in the field of view, which is 
usually more massive than for balloon-borne instruments (see Sec. 3). Minimizing all
passive material in the field of view for space missions, and flying as high as possible
for balloons, are prudent guidelines. Thus, for the same exposure, the continuum 
sensitivity of a space-based Compton telescope in low-Earth orbit (LEO) will not 
be much better than a balloon-based instrument observing an overhead source. 


3. The MeV γ-Ray Background and its Suppression
in Low-Earth Orbit 


The background in the MeV range is intense. For non-Compton instruments, some
significant fraction comes from γ-rays outside the field of view. Massive γ-ray shields
can suppress this, but they often consume a large fraction of the mass budget.
These shields are typically thick scintillators (~1 MFP*) that surround the main
detecting elements. Photons from undesirable directions are collected in these thick
scintillators, producing a signal in an attached photomultiplier tube or other sensor.
This signal then can be used to lock out other interactions over a short time window,
whether or not any interaction is related to the incident background γ-ray.


*Mean Free Path.

For non-Compton instruments, background produced within the instrument by 
neutral particles is a problem, because there is no effective way to reduce the effect 
of single-photon activation decays or neutron-induced inelastic collisions. The coin- 
cidence requirement in Compton telescopes dramatically reduces these background 
components, but only to reveal other background constituents. However, an overall 
improvement in sensitivity is achieved via the Compton technique. 

Because the intensity of the background is great, the variability of this back- 
ground is the major limitation for the detection of steady-state sources when all 
other effects are minimized. These fluctuations take the form of spurious, statisti- 
cally significant, sources and sinks in images. This makes the true significance of a 
source detection difficult to quantify. One example was the reported detection of 
broad nuclear lines of CNO from the Orion Nebula.'® This report was subsequently 
revised!? and replaced with upper limits, when it became clear that the effect was 
a non-statistical fluctuation of the internal ²⁴Na background (Fig. 3). However,
scientifically speaking, the latest upper limits leave room for a positive detection 
with a more sensitive Compton telescope. We address the physics of this background 
below. 


Fig. 3. The erroneous identification of emission from the Orion region. The dotted contours
represent background fluctuations. (Axes: galactic longitude and latitude, in degrees.)

The Compton technique provides an improved SNR compared to basic
spectrometers or coded-mask imagers operating in the MeV range, so suppressing the
background is key to the success of these instruments. Thus, we devote space to
a discussion of under-appreciated background effects for Compton telescopes, in
particular, the background encountered in LEO. However, we defer a discussion of
accidental coincidences until after the various forms for a Compton telescope are
introduced.

We first note that the Earth’s atmosphere is of order 50 MFP thick for an MeV 
photon, so balloon platforms or spacecraft platforms are essential. We choose LEO, 
because Compton telescopes are typically large and heavy, and thus deploying them 
in deep space requires considerable resources. We return in Sec. 8 to address non- 
standard deployment scenarios. 

The orbital radiation field comprises γ-rays from celestial sources, primary
galactic cosmic rays (GCR) (all ion species and electrons) and numerous reaction
products from GCR collisions with nitrogen, oxygen and spacecraft material. These
reaction products include protons, electrons, muons, mesons and, importantly,
neutrons. Virtually all charged particles can be rejected by a charged particle shield
(CPS), but neutrons normally escape detection and can produce a ubiquitous and
amorphous internal background. Additionally, in LEO, for other than a purely
equatorial orbit, exposures to the South Atlantic Anomaly produce intense long-term
activation in the instrument and spacecraft. Multiple photon decays, such as from
²⁴Na (1.37 and 2.75 MeV), are particularly worrisome, because they can closely
mimic a double-scatter γ-ray.!°:!9 2! It is other similar background channels with
high multiplicity that create problems for Compton telescopes.

At balloon altitudes, one has the additional γ radiation from the overlying
atmosphere in the instrument field of view. Other than γ-rays from external sources,
the background problem arises from γ-rays produced within the instrument and the
surrounding payload or bus. In the limit, with perfect internal background rejection,
the instrument sensitivity will be limited by the neutron-induced radiation from
passive material in the field of view. Passive material for a space mission typically
includes mechanical structures, thermal blankets, micrometeorite shields and any
light-tight covers for the CPS, all of which adds up quickly. On COMPTEL this
amount was of order 1 g cm⁻², producing radiation of the same order as a
high-altitude balloon background.

The background radiation in the MeV region comprises electron bremsstrahlung,
annihilation radiation and a variety of nuclear transitions. With regard
to the nuclear component, two variants affect Compton telescopes: electromagnetic
nuclear transitions with lifetimes of order ps or shorter, and longer-lived induced
radioactivity, i.e. β decays. Transitions <100 ns within the instrument, produced by
charged particles, can be rejected by a CPS. Those produced by neutrons are not
rejected by the shields, nor are any γ-rays emitted from neighboring instruments
and the platform. Because the Compton scattering process relies on two sequential
scatters, a particularly problematic background is that initiated by neutrons when
the target nucleus, in the instrument, is excited into a continuum energy state.
The excited nucleus relaxes by emitting several photons, some of which can be
in the MeV range, all nearly simultaneous (dt < ps).2? For example, the energy
level scheme for ²⁴Mg is shown in Fig. 4. Tightly bunched levels occur at several
MeV above the ground state. Neutrons that result in single-photon decays require
two scatters and have a low rate of acceptance. Cascades are also present, often
in neighboring material, that produce many photons, any two of which are
virtually simultaneous and can be registered independently in the instrument. When no
secondary charged particle interacts with the CPS, they present a problem.

Fig. 4. Energy level diagram of a ²⁴Mg nucleus.

One must know the net effect of these different forms of the MeV background
to estimate sensitivity and background contamination and to guide instrument design.
Because the background is complex, this requires detailed and comprehensive
modeling: one must ensure that all significant radiation channels
are included, sometimes at the level of accurate angle-dependent differential
cross-sections. Originally written to model the response of the MEGA instrument, the
MEGAlib software package has become a widely employed and successful tool of the
experimental γ-ray astronomy community.?? It was built upon the GEANT4 particle
transport simulation system and includes numerous differential cross-sections for
electromagnetic and nuclear reactions. These cross-sections are updated periodically
as laboratory data become available.ᵇ

The ultimate success of any Compton telescope can be ascribed to its back- 
ground rejection capability and, to a lesser degree, its size. The lack of size can be 
overcome by longer exposure if the backgrounds are kept low. Several techniques 
have either been employed in flights or studied in the laboratory. 


4. Different Compton Telescope Designs or Techniques 
for Reducing Background 


There is a variety of Compton telescope concepts or designs, and they all revolve
around the tradeoffs among improving the SNR, improving angular and energy
resolution, and reducing spacecraft resource requirements, while maintaining or
increasing efficiency. The different designs reflect the tension between these
conflicting objectives and can be subdivided by their primary method of
background rejection. The tradeoffs in background reduction methods are discussed
in Sec. 4.4.


4.1. Time-of-Flight 


Traditionally, time-of-flight (ToF) has been used effectively on balloon and space
platforms with scintillator-based instruments. Distances ranging from 0.3 to 1.6 m
can be used for ToF if the timing resolution is adequate. The 1.6-m distance from
D1 to D2 of COMPTEL represents a 5.3-ns separation (with a timing resolution of
~1.5 ns). In principle, the 10.6-ns difference between up/down vs. down/up should
be more than sufficient, even if the detectors have 2-ns timing resolution. However,
the D2 to D1 ToF (negative values) feature from Earth’s atmosphere is intrinsically
much more intense than the peak from the cosmos, and the tail of the upward ToF
peak spills over far from its central value. Furthermore, as described above,
neutron-induced cascade γ-rays from neighboring material can produce a massive
feature many ns wide, centered on a ToF value of zero (simultaneity of signals from
D1 and D2). This feature in the COMPTEL data was intense and spilled over generously
into the 5.3-ns downward peak, rendering approximately half the downward-moving
γ-rays heavily contaminated with background (Fig. 5). This necessitated selecting
only about half the peak for serious study, i.e. from 5.3 ns to about 8 ns.
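
The ToF figures quoted above follow from the light travel time between the two
detector planes. The short Python sketch below reproduces that arithmetic. The
function names are illustrative only, and treating the quoted timing resolution as a
1σ width is an assumption made for this sketch, not a statement about how the
COMPTEL resolution was defined.

```python
C_M_PER_NS = 0.299792458  # speed of light in metres per nanosecond


def flight_time_ns(separation_m):
    """Light travel time between two detector planes, in ns."""
    return separation_m / C_M_PER_NS


def peak_separation_sigma(separation_m, timing_resolution_ns):
    """Distance between the downward (+t) and upward (-t) ToF peaks,
    in units of the timing resolution (assumed here to be a 1-sigma width)."""
    return 2.0 * flight_time_ns(separation_m) / timing_resolution_ns


if __name__ == "__main__":
    for distance_m in (0.3, 1.6):
        print(f"{distance_m} m -> {flight_time_ns(distance_m):.2f} ns")
    # Up/down vs. down/up separation for a 1.6-m instrument with 2-ns timing
    print(f"{peak_separation_sigma(1.6, 2.0):.1f} resolution widths apart")
```

Even several resolution widths of separation are of limited help when the unwanted
peak is far more intense than the wanted one and has broad tails, which is the
situation described in the text.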

The ToF technique can effectively only be used with scintillator-based instruments
that possess adequate timing resolution. However, if progress can be made
with solid-state detectors to significantly improve their speed and timing resolution,
electron tracking instruments (see below) would combine the best of both worlds
and offer both low background and elimination of the event circle ambiguity. Such
an advance would require sensitive, low-noise detection of the image-charge signal
at the moment of the first formation of a charge carrier in the solid-state device,
which is not a trivial problem. The goal would be 1-ns timing resolution.

Fig. 5. Schematic of COMPTEL ToF spectrum showing the “good” and total spectra and three
background components. Dotted vertical lines define the COMPTEL ToF acceptance window.

ᵇHowever, the package’s completeness, in terms of cross-sections, should be verified by individual
researchers, depending on the complexity of the detection or reaction channel of interest. Of
concern for Compton telescopes is that no explicit testing of the MEGAlib code for the
all-important multi-photon channel (i.e. neutron reactions with high atomic mass nuclei) is known
to have been performed (A. Zoglauer, private communication).

With current technology, the ASCOT instrument is a design built around fast
scintillators (Fig. 6).2°?4 It employs a ToF system superior to that of COMPTEL.
It uses small, fast scintillators to achieve a ToF resolution of order 500 ps, as
opposed to the 1.5 ns of COMPTEL (Fig. 5). The difference allows the separation
of good downward-moving γ-rays from locally generated multi-photon background
events. The improved resolution also allows the separated detector planes
to be closer together, thereby increasing the solid angle factor and the efficiency
of the instrument. An earlier version, called FNIT, used a deuterium-based liquid
scintillator to eliminate the internal 2.2-MeV background from thermalized neutrons
in the organic scintillators.1°


4.2. Tracking Recoil Electrons 


Because the cosmic γ intensity is so low compared to that of the local background
and terrestrial sources, it is critical to identify and reject γ-rays coming from the
wrong direction, e.g. from the Earth’s atmosphere when pointing to the zenith.
In solid-state detectors, the timing resolution is poorer than in scintillators,
which precludes the ToF option. In lieu of a ToF measurement, if one could measure
the momentum vector of the recoil Compton electrons, then that could be an effective
surrogate for ToF, at least for front vs. back discrimination. By measuring the
recoil-electron momentum vector, the event cone in Fig. 2 collapses to a single
direction, or an arc, representing a much smaller solid angle in which background
could be present.

Fig. 6. The ASCOT instrument showing the detecting planes D1 and D2 and the CPS system.
(Figure labels: Anti-Coincidence Panels, D1 Layers, D2 Layer.)

This technique requires tracking (multiple measurements) of the recoil electron.
This can be accomplished in gas tracking detectors?° and thin (<300 μm) silicon
strip detectors (SSDs).2° 28 However, in solid-state devices, electron scattering is
problematic, making measurements at 1 MeV difficult, but tracking is possible above
a few MeV, where the recoil electrons have sufficient momentum that scattering is
minimized. For example, the projected range of a 5-MeV electron in silicon is
comparable to the thickness of the 300-μm SSDs in ComPair.?° The projected
range is disproportionately short due to the electron scattering. However, even simply
measuring whether the recoil electron is moving away from the instrument aperture
provides a significant reduction in background by eliminating γ-rays coming from
the “wrong” direction, even if the event circle is not reduced to an arc segment.
Figure 7 shows a schematic of the MEGA instrument.?°

A parameter can be constructed to quantify the probability that the scattered 
electron is traveling away from the aperture. This is called the Direction of Motion 
(DOM) parameter, and is employed in the TIGRE experiment. For electrons that 
traverse two or more 300-μm silicon strips, Ref. 29 reported values of 57% at 1 MeV,
improving to 94% at 6 MeV, where 100% implies no forward Compton electron
backscatter from γ-rays outside (opposite) the instrument aperture. 1-MeV events
that do not trigger two strips yield no value for the DOM and are of limited value.

Thus, at low energies (e.g. 500 keV) in solid-state detectors, electron tracking
is not currently feasible, but as the energy increases and electron tracking becomes
easier, the performance of the instrument improves quickly. At higher energies still,
e.g. 26 MeV, pair production competes with Compton scattering and the instrument
performance improves further, with arguably better performance at tens of MeV
than dedicated pair-production telescopes, such as the Large Area Telescope (LAT)
on the Fermi Gamma-ray Space Telescope (Fermi). The advantage that instruments
like MEGA, TIGRE and ComPair have over instruments like Fermi-LAT?° and
EGRET*! is that the higher-energy instruments rely on pair converters of
high-Z material, e.g. Ta. These converters scatter or stop the electrons below several
tens of MeV.

Fig. 7. Schematic of the MEGA instrument. Taken from Ref. 43.

The electron scattering problem is much reduced in gaseous detectors, but 
the detector stopping-power limits the instrument effective area. However, an even 
greater problem is that Compton scatter events in these instruments are very slow 
in developing (ms), because of the long paths the ionization charge must drift. 
Thus, Compton scatter events are cluttered with many other unrelated ionizations, 
making the coincidence of the signals in the gas and calorimeters confusing and 
impractical. These instruments are best suited for electron-pair detection with a 
clear vertex signature. 

Liquid Xe (LXe) has been used frequently in dark matter searches.3? The advantage
of LXe is that it scintillates and the drifting charge locates the photon
interaction sites accurately. The scintillation provides a ToF signal (at some expense to
the drifting charge signal).°° It combines advantages of a ToF experiment and a
drift chamber. However, the high atomic number aggravates the Doppler broadening
effect and secondary electron bremsstrahlung must be considered as an energy
loss channel. The Xe must also be ultra pure, free from contaminants of other noble
liquids and radioactive isotopes of Xe (holdover from atmospheric weapons testing).


4.3. Event Conformity 


Finally, there is a generation of instruments employing solid-state technology, where
neither recoil particle tracking nor a ToF measurement is practical, e.g. COSI*4
(Fig. 8). These instruments are segmented liberally with individualized readouts,
where the detecting elements are not thin, as they are for the trackers. A Compton
scatter deposits energy in two or more active elements. The sequence of the
interactions is unknown and the thick detector segments do not reveal any particle
track; however, the energy resolution can be extremely fine. The precision and
accuracy of the energy measurements shrink the width of the event circle annulus
to the sub-degree range, which reduces background. These instruments are typically
“monolithic”, in that there is little or no significant gap between any of the detector
elements, either horizontally or vertically, eliminating the distinction between D1
and D2 and making the instrument efficient and compact. However, the distance
between the scatters is thus small (~cm), making the direction of the axis of the
event cone highly uncertain, effectively negating the advantage achieved with the
excellent energy resolution. Because less information is available (no particle tracking
or ToF), one must identify “bad” events by recognizing that they do not, on average,
conform in multi-dimensional data space to “good” single-photon Compton scatters
from the instrument aperture. This implicitly assumes that background simulations
are accurate and complete enough to make for clear identifications. Given that the
background is intense, background events bleeding into the “good” event data set
are a problem.

Fig. 8. Schematic of the COSI instrument and a three-scatter event.

Cascade reactions are dispersed widely in the data space of these instruments. 
The data space for such instruments includes energy and location for all detecting 
elements in a resolving time determined by the intrinsic timing resolution of the 
solid-state detectors and their associated electronics. Thus, the background rate for
these instruments depends greatly on how the point spread function of local γ-rays
or γ-rays from outside the field of view spills over into the data space occupied by
cosmic γ-rays.

Because these instruments are constructed from solid-state devices with energy
resolutions better than those of scintillators, they lend themselves to imaging and
spectroscopy for fine structure in the photon spectrum, e.g. the 511-keV radiation from
the Galactic Center or the mapping of the ²⁶Al component at 1.809 MeV. Detection
of lines from supernovae is a particularly good application, where several features
from 847 keV to several MeV are present, all broadened significantly shortly after
detonation, when the spectrum is rich in lines. The restriction in energy space
reduces the background, as compared to broadband spectroscopy and imaging,
especially for line features not present in the background data, e.g. 847 keV from ⁵⁶Co.
COSI has also successfully imaged strong sources and transients. Figure 9 shows
the COSI image of a cosmic γ-ray burst from its 2016 balloon flight.°°


4.4. Backgrounds Rejected 


As discussed above, the major backgrounds to reject are those produced by charged
particles, γ-rays from directions other than the aperture, activation (long-lived β
decays), prompt multi-photon de-excitations from neutron scatters and, not
discussed so far, accidental or chance coincidences.

The charged-particle background is, as mentioned above, effectively handled
with a CPS. Charged particle shields are constructed from panels or domes of thin
(<1 cm) plastic detectors. These panels fully enclose the instrument or, better yet,
individual detecting systems. No charged particle can enter or escape without
detection. Because the panels are thin, they only attenuate the incident γ-rays slightly.

Fig. 9. GRB 160530A detected with COSI.

Gamma-rays from “wrong” directions require ToF or electron tracking to be
rejected. This is a more difficult task for event-conformity techniques. For example,
a large scatter in a rear element followed by a small scatter in a forward element has
the same data signature as an incoming γ that scatters first in the forward elements
and then deposits a larger signal in the rearward elements. This is problematic when
large backgrounds are present behind or off to the side of the instrument, such as
in low Earth orbit, which has a bright γ background from the Earth’s atmosphere
and horizon.'® For such instruments, γ-ray shields are often employed, e.g. thick CsI
detectors that envelop the instrument on all sides except for the intended aperture.
This has the advantage of reducing the intensity from undesirable directions and
also serving as the CPS. The major penalty is the increased mass of the instrument,
where the γ shield mass can exceed that of the remainder of the instrument. Furthermore,
such shields may not be 100% effective. They are, depending on the thickness,
partly transparent and can even be a source of neutron-induced background emanating
from the inner portions of the shield (typically less than one mean free path (MFP)
inside the shield).

Event conformity instruments often rely on more than two scatters to help 
identify good Compton scatters. As described below, the combinatorial probability 
of labeling a bad gamma-ray as “good” decreases with three-fold coincidences, but 
the susceptibility to accidental coincidences can rise. 

Multi-photon decays, whether from activation or induced by prompt neutrons, 
are difficult to reject. Such events have signatures in ToF and recoil electron 
momenta, but have amorphous and widely distributed data signatures otherwise. 

Lastly, accidental coincidences plague all Compton telescopes. They occur when 
two (or more) physically unrelated photons interact within the instrument, produc- 
ing a signal that is also widely distributed in data space (uniformly in ToF space). 
The only way to suppress this is to have the shortest possible coincidence window, 
which is fixed by the resolving time of the individual detecting elements with appro- 
priately fast electronics. For scintillators, this can be as short as tens of ns, whereas 
solid-state detectors have minimum coincidence windows of order hundreds of ns, 
simply because of the mobility of the charge carriers. As instruments grow, the effect 
grows. The general expression is 


$$f_a = 2 f_1 f_2 \Delta t, \qquad (2)$$


where $f_a$ is the frequency of accidental counts, $f_1$ and $f_2$ are the rates in the two
detecting media, which could be the same, and $\Delta t$ is the coincidence window. The
factor of two arises from the two possible sequences of interactions. If the individual
rates represent the same detector rate $f$, such as for an event-conformity instrument,
the three-fold accidental rate is


$$f_a = 6 f^3 (\Delta t)^2. \qquad (3)$$

Thus, the ratio of accidentals to good events is $f_a/f \propto (f\Delta t)^n$, where $n$ is either
1 or 2, depending on whether we have a two- or three-fold coincidence. Thus, the
value of the count rate with respect to the resolving time is a critical parameter,
i.e. $f\Delta t$ should be less than unity, ideally much less.

In an effort to increase the instrument size to detect more photons, extra steps,
such as segmenting the instrument into multiple parallel instruments, must be taken
to keep the accidental rate from dominating. For example, a two-scatter instrument
with a neutral-particle count rate of 1 MHz in each detector and a coincidence window
of 1 μs will have a two-fold accidental rate of order 1 MHz, whereas an instrument
with a 1-kHz rate and the same coincidence window will have an accidental rate of
order 1 Hz.
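
The scaling in Eqs. (2) and (3) can be checked numerically with a minimal Python
sketch. The function names are illustrative, and the coincidence window of 1 μs is
an assumption, chosen because it reproduces the rates of order 1 MHz and 1 Hz quoted
in the example.

```python
def twofold_accidental_rate(f1_hz, f2_hz, window_s):
    """Eq. (2): f_a = 2 f1 f2 dt, the chance two-fold coincidence rate."""
    return 2.0 * f1_hz * f2_hz * window_s


def threefold_accidental_rate(f_hz, window_s):
    """Eq. (3): f_a = 6 f^3 dt^2, for a single detector rate f."""
    return 6.0 * f_hz ** 3 * window_s ** 2


if __name__ == "__main__":
    window_s = 1e-6  # assumed 1-microsecond coincidence window
    for rate_hz in (1e6, 1e3):
        two = twofold_accidental_rate(rate_hz, rate_hz, window_s)
        three = threefold_accidental_rate(rate_hz, window_s)
        print(f"f = {rate_hz:.0e} Hz, f*dt = {rate_hz * window_s:.0e}: "
              f"two-fold {two:.1e} Hz, three-fold {three:.1e} Hz")
```

At 1 MHz the product of rate and window is unity and the accidentals are comparable
to the singles rate itself. At 1 kHz the product is 10⁻³, and the three-fold
accidentals are suppressed by a further factor of that order relative to the two-fold
rate.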

Without a doubt, the most difficult measurement to make, and perhaps the ultimate
test of a broadband Compton telescope, is that of the cosmic diffuse emission.
Almost by definition, it has few identifying features. It is isotropic and stationary
with a featureless spectrum. Consequently, it is a measurement for which all
sources of amorphous background must be removed. For many years, published
spectra of the cosmic diffuse emission exhibited a “bump” from 1 MeV to several
MeV, sitting atop a power law spectrum.*© A measurement of the cosmic diffuse
γ-ray (CDG) emission using the COMPTEL data was possible because the local
background was quantified by the charged particle rate. Knowing this, and with
computations of how the double-photon ²⁴Na background populated data space,
severe data selections combined with extrapolation of the excess count rate to zero
background yielded the cosmic diffuse spectrum, with no bump present.!?-'° This
indicated, as stated by Ref. 37, that the bump comprised background events. The
COMPTEL data restrictions were so severe that in the final analysis, only of order
one γ-ray per orbit was admitted to the cosmic diffuse database from which the
researchers produced a spectrum. To achieve an improved spectrum, more photons are
necessary, requiring longer exposures or wider data selection windows, which are
allowed only through improved background suppression.


5. Imaging 


One of the great advantages of Compton telescopes is their ability to produce
images, but the inherent directionality of the γ-ray detection process is unusual.
The focusing of γ-rays at 511 keV and above is unusually difficult, if not currently
impractical, getting worse with increasing photon energy. However, even though we
are unable to direct the photons to a desired detection plane, either by refraction or
reflection, we can retrace the path of the registered photons after the fact, provided
that the photon detection satisfies certain selection criteria. Even then, unless we
can measure the recoil-electron momentum vector, the original photon direction
is restricted to no better than the mantle of a cone (Fig. 2). The axis of the
cone is the direction of the recoil γ-ray, whereas the half angle of the cone is the
Compton scattering angle, as defined in Eq. (1). COMPTEL was unable to measure
the direction of the recoil Compton electron, thus the azimuthal degeneracy. However,
the Compton scattering angle is measured accurately if the full energy of the
recoil electron and the full energy of the recoil γ-ray are both measured. The precision
of those measurements, defined by the instrument energy resolution, is a
major contributor to the thickness we ascribe to the cone mantle. Spatial resolution
(i.e. where the interactions take place in the detector) quantifies the wobble in
the cone axis, again broadening the cone mantle. To achieve the ultimate point
spread function, both the spatial and energy resolution figures must be small. For
COMPTEL, the energy resolution dominated the angular resolution up to about
2 MeV. Above that figure, the spatial resolution dominated. For higher Z material
for the first scatter, e.g. Ge, Doppler broadening adds an additional and independent
angular-resolution factor of order 1° at 500 keV (approximately half that amount
at 1 MeV).8
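
How the two energy measurements fix the cone half angle, and how an energy error
thickens the cone mantle, can be illustrated with a short Python sketch. It assumes
that Eq. (1) is the standard Compton kinematic relation,
cos φ = 1 − mₑc² (1/E₂ − 1/(E₁ + E₂)), with E₁ the recoil-electron energy and E₂ the
scattered-photon energy. The function names and the example energies are
hypothetical.

```python
import math

ELECTRON_REST_ENERGY_KEV = 511.0  # m_e c^2, rounded


def compton_angle_deg(e_electron_kev, e_photon_kev):
    """Cone half angle from the recoil-electron and scattered-photon energies,
    assuming both energies are fully absorbed."""
    e_total = e_electron_kev + e_photon_kev
    cos_phi = 1.0 - ELECTRON_REST_ENERGY_KEV * (1.0 / e_photon_kev - 1.0 / e_total)
    if not -1.0 <= cos_phi <= 1.0:
        raise ValueError("energies are not consistent with a single Compton scatter")
    return math.degrees(math.acos(cos_phi))


def angle_shift_deg(e_electron_kev, e_photon_kev, delta_e_kev):
    """Change in the computed angle if the electron energy is off by delta_e_kev."""
    shifted = compton_angle_deg(e_electron_kev + delta_e_kev, e_photon_kev)
    return shifted - compton_angle_deg(e_electron_kev, e_photon_kev)


if __name__ == "__main__":
    print(f"{compton_angle_deg(200.0, 800.0):.1f} deg for a 1-MeV photon")
    print(f"{angle_shift_deg(200.0, 800.0, 10.0):.2f} deg shift for a 10-keV error")
```

The second function is a simple finite difference rather than a full error
propagation, but it shows directly why the energy resolution sets part of the width
of the cone mantle.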

If one imagines an image plane at infinity that coincides with the object plane, 
the response of the instrument to a single photon detection is an annulus, whose 
width corresponds to the thickness of the cone mantle centered about the scattered 
photon vector projected backward to infinity. The width of the annulus is set by the 
energy resolution and the spatial resolution figures. Contrast this with a focusing 
instrument where the instrument response to a single photon is a point within a 
disk of confusion, where the disk is centered on the true photon direction, the radius 
of the disk being the angular resolution. Thus, the point spread function (PSF) of a
Compton telescope is a large thin annulus on the sky, large in diameter but small in
width.

So, despite the odd shape of the PSF, imaging is still possible, through a variety 
of means, many of which use forward folding; that is, using a model of the emission 
pattern, folding it through (or convolving it with) a generic instrument response 
and comparing those hypothetical data to the real data. Different components to 
the model can be added to better describe the acquired data, for example, a model 
for the Galactic Plane, the Cosmic Diffuse Flux and unknown point sources. 
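
A toy version of forward folding can be sketched in Python as follows. Everything
in it is an assumption made for illustration: the three-bin response matrix, the
source template, the flat background and the observed counts are invented, and a
simple scan of a Poisson likelihood stands in for whatever statistic a real analysis
would use.

```python
import math


def fold(response, sky):
    """Expected counts per data-space bin: sum over sky pixels of R[i][j] * sky[j]."""
    return [sum(r * s for r, s in zip(row, sky)) for row in response]


def poisson_log_likelihood(observed, expected):
    """Log-likelihood of the observed counts given the expected counts."""
    return sum(n * math.log(mu) - mu - math.lgamma(n + 1.0)
               for n, mu in zip(observed, expected))


def fit_amplitude(response, template, background, observed, trial_amplitudes):
    """Scan a source amplitude and return the value with the highest likelihood."""
    folded = fold(response, template)
    best_amplitude, best_ll = None, -math.inf
    for amplitude in trial_amplitudes:
        expected = [amplitude * f + b for f, b in zip(folded, background)]
        ll = poisson_log_likelihood(observed, expected)
        if ll > best_ll:
            best_amplitude, best_ll = amplitude, ll
    return best_amplitude


# Invented toy inputs: 3 data-space bins, 2 sky pixels.
RESPONSE = [[0.5, 0.0], [0.25, 0.25], [0.0, 0.5]]
TEMPLATE = [2.0, 2.0]
BACKGROUND = [1.0, 1.0, 1.0]
OBSERVED = [6, 6, 6]

if __name__ == "__main__":
    print(fit_amplitude(RESPONSE, TEMPLATE, BACKGROUND, OBSERVED, range(0, 11)))
```

Additional model components would enter as further templates, each with its own
amplitude, folded through the same response and summed before the comparison with
the data.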

The pedestrian or layman’s imaging method is to superpose all the circles (or
annuli) onto the image plane, either summing them or constructing a product. This
is effective when the SNR is high, such as a radioactive source in the laboratory,
or a solar flare or cosmic γ burst (Fig. 10). Otherwise, because of the solid angle
of the annuli, considerable background can obliterate any source imaged this way.
This is, in part, due to the fact that summing annuli discards information that can
be used to improve the SNR. Each annulus is described by three quantities: the
two angles that specify the scattered photon direction (axis of the cone) and the
cone half angle. Each detected γ-ray is then a point in 3-space and the summing
process projects that onto two dimensions.
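
The summing of event circles can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions: a flat-sky (small-angle) image plane
in degrees, a Gaussian profile for the annulus width, and an invented list of three
events whose circles all pass through the origin. None of the names correspond to an
actual analysis package.

```python
import math


def backproject(events, half_width_deg=20.0, step_deg=1.0, sigma_deg=1.0):
    """Sum the annuli of all events onto a square image.
    Each event is (axis_x_deg, axis_y_deg, cone_half_angle_deg)."""
    n = int(round(2.0 * half_width_deg / step_deg)) + 1
    coords = [-half_width_deg + i * step_deg for i in range(n)]
    image = [[0.0] * n for _ in range(n)]
    for axis_x, axis_y, half_angle in events:
        for iy, y in enumerate(coords):
            for ix, x in enumerate(coords):
                r = math.hypot(x - axis_x, y - axis_y)
                # Gaussian weight peaking where the pixel lies on the event circle
                image[iy][ix] += math.exp(-0.5 * ((r - half_angle) / sigma_deg) ** 2)
    return coords, image


def brightest_pixel(coords, image):
    """Return the (x, y) position of the largest summed value."""
    best = max((value, ix, iy)
               for iy, row in enumerate(image)
               for ix, value in enumerate(row))
    return coords[best[1]], coords[best[2]]


# Invented events: three circles of 10-degree radius that intersect at (0, 0).
EVENTS = [(10.0, 0.0, 10.0), (0.0, 10.0, 10.0), (-10.0, 0.0, 10.0)]

if __name__ == "__main__":
    grid, summed = backproject(EVENTS)
    print(brightest_pixel(grid, summed))
```

With only a few events the source location stands out, but every circle also
deposits weight along its full circumference, which is how background events
degrade an image built this way.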

Exploiting the full information content of the PSF is more complicated, but
produces consistently better results. The process employed for analyzing the COMPTEL
data was described in detail by Ref. 38. A major simplification of the PSF
was achieved by treating a single sub-element of the telescope in detail and then
adjusting that PSF for different sub-elements to assemble a full instrument. For
COMPTEL each sub-element comprised a single D1 detector module and a single
D2 detector module. With 7 D1 modules and 14 D2 modules, 98 “mini-telescopes”
made up the COMPTEL instrument. Thus, the instrument response for different
problems could be assembled from the PSF of a single “mini-telescope”.

Fig. 10. COMPTEL image of the 1991 May 5 cosmic γ-ray burst, assembled from individual event
circles.

An instrument PSF, constructed either by brute-force Monte Carlo calculation
on the full instrument or as a hybrid PSF, such as that by Ref. 38, can be used
in a variety of forward-folding procedures. For example, to analyze the data from
the Galactic anti-center, the model would include sources for the Crab pulsar and
nebula and for Geminga, as well as models for the Galactic Plane and the Cosmic
Diffuse Emission. The “data” produced by running the model through the PSF can
be compared to real data (with background subtracted or with background as part
of the model) with iterative statistics computed along the way if the method is a
Maximum Likelihood, Maximum Entropy or some other algorithm. Such was the
process in producing a broadband image of the full sky in different bands (Fig. 11).
Direct deconvolution, using something like a Pixon scheme,39 is also possible
and in certain circumstances may be more sensitive because it allows sources to
emerge from the data without suppressing them by way of statistical constraints.
However, computing the statistical significance of a detection can be more difficult.
This method, for example, may be used to identify candidate sources that can be
confirmed using more statistically demanding methods (e.g. Bayesian statistics).

We return now to the case of the technology that tracks the recoil electron, i.e.
the electron tracking type of instrument. The basic PSF of an event cone remains,
but the extent of the cone has been reduced in its azimuthal dimension from 2π
(full cone) to a conical segment, the extent of which depends on the ability of the
instrument to precisely measure the recoil electron direction. For higher energy
electrons, where the tracking is precise, the cone collapses to a thin slice, projecting
onto the image plane as a point-like source. Consequently, the accepted background
is proportionately reduced, as is the uncertainty in the incident photon direction.
Both are reduced by the ratio of the annulus solid angle to that of the conical slice
(see Fig. 7). At lower energies, where the recoil electron is fully contained in one
solid-state detecting element, the electron-tracking property of the instrument
vanishes, and it becomes an “event conformity” instrument, unfortunately at important
energies such as the ⁵⁶Co line at 847 keV.

Fig. 11. COMPTEL full-sky image from 3 to 10 MeV.


6. The Multi-dimensional Nature of Compton Telescope Data


Regardless of the type of Compton telescope, the data are recorded on an event- 
by-event basis, requiring a list-mode method of analysis. For the simplest version of 
the instrument, the classic design like COMPTEL, one records an energy deposit in 
the first detector (D1) and the same for the second detector (D2), the ToF between 
scatters and the two angles defining the displacement vector between the scatter 
sites in D1 and D2. Selections or cuts can be applied on any of the recorded data. 
The cuts change the performance of the instrument, meaning a different instrument 
threshold and energy range, a different field of view and a consequent different 
effective area. Such cuts are imposed to improve the SNR, while minimizing the 
degradation in the science performance. 

Hardware restrictions serve the same function. For example, if the hardware
threshold in D1 is set high to eliminate electronic noise in the circuitry, the
minimum scatter angle for double-scatter events increases (Eq. (1)), meaning larger
event circles or cones. Because larger event circles represent greater solid angle
through which background can intrude, keeping the threshold as low as possible for
D1 is imperative. For COMPTEL, with an energy range starting at 800 keV, the D1
threshold was held to below 50 keV, allowing for small event circles even at 1 MeV.
This is particularly important for all experiments that are exposed to a bright
horizon that can fall within a large event circle.
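
The link between the D1 threshold and the smallest usable event circle can be made
concrete with a minimal Python sketch. It assumes that Eq. (1) is the standard
Compton kinematic relation and that the photon energy is fully absorbed. The
function name and the alternative 200-keV threshold are hypothetical, chosen only to
show the trend.

```python
import math

ELECTRON_REST_ENERGY_KEV = 511.0  # m_e c^2, rounded


def min_scatter_angle_deg(photon_energy_kev, d1_threshold_kev):
    """Smallest scatter angle for which the recoil electron still deposits
    the threshold energy in D1."""
    e_scattered = photon_energy_kev - d1_threshold_kev
    if e_scattered <= 0.0:
        raise ValueError("threshold exceeds the photon energy")
    cos_phi = 1.0 - ELECTRON_REST_ENERGY_KEV * (
        1.0 / e_scattered - 1.0 / photon_energy_kev)
    if cos_phi < -1.0:
        raise ValueError("threshold exceeds the maximum recoil-electron energy")
    return math.degrees(math.acos(cos_phi))


if __name__ == "__main__":
    for threshold_kev in (50.0, 200.0):
        angle = min_scatter_angle_deg(1000.0, threshold_kev)
        print(f"D1 threshold {threshold_kev:.0f} keV -> "
              f"minimum event-circle radius {angle:.1f} deg at 1 MeV")
```

Under these assumptions a 50-keV threshold admits event circles down to roughly 13°
at 1 MeV, whereas a higher threshold pushes the minimum radius up and with it the
solid angle through which background can enter.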

Geometry cuts can also be made to restrict the scattered photon velocity vector 
to be close to the instrument axis, again to reduce the probability of background at 
the edges of the field of view from intruding. In the COMPTEL analysis, combined 
cuts on the scatter angle and the scattered photon velocity vector direction were 
successfully used to dynamically suppress photons from the Earth’s horizon. 

As discussed above, cuts on ToF can be used to suppress internal background. 

For more complex instruments, other cuts can be made. These could take the 
form of pulse-shape cuts for certain scintillators to suppress neutron effects or cuts 
on the recoil-electron energy or direction in electron tracking instruments. The 
method of selecting data for event conformity instruments can be complicated, 
because of the uncertainty of knowing the sequence of scatters and of the fact that 
three or more scatters are possible. 

Speaking in general terms, each recorded event can be expressed as a point in a 
multi-dimensional data space. In this data space, the background events will prefer- 
entially occupy some region of data space and may thus be avoided. In practice, one 
studies the variation of the SNR over this data space and chooses volumes where 
the SNR is the highest with the least sacrifice in effective area. Having chosen the 
most productive region in data space, one then recomputes the instrument response 
function for these data selections. One should further note that these selections may 
vary according to the type of observation being made, e.g. steady state or transient, 
meaning that a single set of data selections is not realistic for a full range of science 
objectives or observations. 


7. Spectroscopy 


Inextricably linked to imaging is spectroscopy. The linkage occurs because of the
reliance of the Compton scatter angle on the energy of the recoil electron and photon
(see Eq. (1)). Furthermore, the PSF, because of the Klein–Nishina cross-section, has
a reduced probability for large angle scatters with respect to that for small angle
scatters. An easy way to visualize this is to recognize that, for example, the field
of view decreases with increasing photon energy, because of the low probability of
large scatters as energy increases. Knowing this, the imaging process can be tuned
to image at a specific line energy (e.g. the 1.8-MeV line from ²⁶Al), or the spectrum
can be extracted from the imaging data. In either case, the PSF depends on the
photon energy.
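
The falling probability of large-angle scatters with increasing energy can be seen
by evaluating the unpolarized Klein–Nishina differential cross-section for a free
electron. The Python sketch below uses the standard textbook form of that formula.
The function name and the choice of 1 and 5 MeV as comparison energies are
illustrative assumptions.

```python
import math

CLASSICAL_ELECTRON_RADIUS_CM = 2.8179403e-13
ELECTRON_REST_ENERGY_KEV = 511.0  # m_e c^2, rounded


def klein_nishina(energy_kev, angle_deg):
    """Unpolarized Klein-Nishina differential cross-section for a free electron,
    in cm^2 per steradian."""
    eps = energy_kev / ELECTRON_REST_ENERGY_KEV
    cos_t = math.cos(math.radians(angle_deg))
    ratio = 1.0 / (1.0 + eps * (1.0 - cos_t))  # scattered / incident photon energy
    sin2_t = 1.0 - cos_t * cos_t
    return (0.5 * CLASSICAL_ELECTRON_RADIUS_CM ** 2 * ratio ** 2
            * (ratio + 1.0 / ratio - sin2_t))


if __name__ == "__main__":
    for energy_kev in (1000.0, 5000.0):
        forward = klein_nishina(energy_kev, 0.0)
        for angle_deg in (10.0, 30.0, 90.0):
            relative = klein_nishina(energy_kev, angle_deg) / forward
            print(f"{energy_kev:.0f} keV, {angle_deg:.0f} deg: {relative:.3f} of forward")
```

The ratio to the forward direction drops faster with angle at the higher energy,
which is the behavior behind the energy-dependent field of view described above.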

By selecting a window around 1.8 MeV and recomputing the instrument PSF,
a full map of the sky can be obtained to search for the stellar remnants responsible
for the ²⁶Al production over the last million years, as shown in Fig. 12 (adapted
from Fig. 1 of Ref. 40).

Fig. 12. COMPTEL full-sky map in the light of 1.809 MeV (²⁶Al). Adapted from Ref. 40.

Fig. 13. The COMPTEL count spectrum for part of the solar flare of 1991 June 11. Used with
permission from Ref. 41.


For a special case such as that of a solar flare with a complex and structured
spectrum, one can accept only events for which the event cone passes through the
position of the Sun, and with an appropriately adjusted PSF, the resulting count
spectrum can be converted into a photon spectrum. The luxury of accepting only
γ-ray events for which the Sun lies on the event cone is illustrated in Fig. 13 for
a solar flare.⁴¹ The statistics are low, but the resulting count spectrum is clean.
Here, one can see that the instrument response in energy space is diagonal, that is,
the measured photon energy is the true energy, except for the intrinsic detector
broadening. This makes the deconvolution of the energy count spectrum simple and
unambiguous.

For solar flares with poorer statistics, the event cone restriction
can be relaxed to yield more counts, but with consequently poorer energy resolution.
The data cut takes the form of rejecting event cones if they do not pass within a
small, programmable angle of the true source, in this case, the Sun. In practice,
one relaxes the event cone restriction incrementally, attempting to strike a balance
between the ease of deconvolving the count spectrum and the improved statistics.

Investigating the energy spectrum of extended objects, such as the Galactic
Plane, should take place through a forward-folding exercise. For such large extents,
almost all event cones will pass through the plane, providing no method for selecting
events that are highly likely to come from the plane and not elsewhere.
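
The event-cone cut described above can be sketched in a few lines. This is a minimal illustration, not the COMPTEL analysis code: the event layout (a dictionary with a cone axis `axis` and a Compton scatter angle `phi`) and the function names are assumptions made for the example.

```python
import math

def angle_between(u, v):
    """Angle in radians between two 3-vectors (they need not be normalized)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def cone_passes_source(cone_axis, scatter_angle, source_dir, tolerance):
    """True if the event cone passes within `tolerance` (radians) of the source.

    The cone has its axis along `cone_axis` and half-opening angle
    `scatter_angle`; the angular miss distance is the difference between the
    axis-to-source angle and the scatter angle.
    """
    miss = abs(angle_between(cone_axis, source_dir) - scatter_angle)
    return miss <= tolerance

def select_events(events, source_dir, tolerance):
    """Keep only events whose cones pass close to the source (hypothetical layout)."""
    return [ev for ev in events
            if cone_passes_source(ev["axis"], ev["phi"], source_dir, tolerance)]

sun = (0.0, 0.0, 1.0)
events = [
    {"axis": (1.0, 0.0, 0.0), "phi": math.pi / 2},  # cone passes through the Sun
    {"axis": (1.0, 0.0, 0.0), "phi": 0.5},          # cone misses by about 1.07 rad
]
kept = select_events(events, sun, tolerance=0.05)
print(len(kept), "of", len(events), "events kept")
```

Widening `tolerance` step by step mimics the incremental relaxation of the cut: more events survive, at the price of a less diagonal energy response.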


8. Polarimetry 


Compton telescopes are capable of measuring the net linear polarization of γ-rays
from a source. The Klein–Nishina formula describes the azimuthal dependence of the
scattered photon and electron with respect to the plane of linear polarization of
the incident γ-ray. The dependence is not deterministic, so the azimuthal scattering
direction distribution of either (or both) the Compton-scattered γ-ray or the
Compton electron must be assembled on a statistical basis. If the scattered photon
direction is described by the polar angle $\theta$ (the scatter angle) and an azimuth
angle $\eta$, where $\eta = 0$ refers to the azimuth direction of the incident electric
field, then for a given scatter angle and incident energy, the distribution of azimuth
directions is proportional to $(k_1 - k_2 \cos^2\eta)$, as shown in Fig. 14. Here $k_1$
and $k_2$ are constants that depend on the incident energy and scatter angle $\theta$.
The distribution therefore maximizes orthogonal to the incident electric field
orientation,⁴² similar to Thomson scattering. Stated another way, for a given $\theta$,
the cross-section is at a minimum when the scatter vector lies in the same plane as
the incident electric field vector. The modulation of this distribution is greatest for
large-angle scatters and decreases with increasing energy.

Fig. 14. The azimuth intensity distribution of scattered photons. Adapted from Fig. 2 in Ref. 42.

For $\pi/2$ scatters, the modulation between maximum and minimum is ~100% for
Thomson scattering, but asymptotically approaches zero (~$1/E_\gamma$) for large incident
energies. At high energies, an increasing fraction of scattered γ-rays exhibits
no preferential scattering direction, decreasing the ratio of the trigonometric term
to the constant term. Furthermore, the physics implies that the greatest azimuth
modulation is obtained from instruments with the largest fields of view, i.e. those
that accept many large-angle scatters.
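
The energy dependence of the modulation can be checked numerically. The sketch below assumes the standard polarized Klein–Nishina form, in which the constant and trigonometric terms are k1 = ε + 1/ε and k2 = 2 sin²θ, with ε the ratio of scattered to incident photon energy; the function names are illustrative only.

```python
import math

M_E_C2 = 0.511  # electron rest energy in MeV

def energy_ratio(e_mev, theta):
    """Ratio of scattered to incident photon energy (Compton formula)."""
    return 1.0 / (1.0 + (e_mev / M_E_C2) * (1.0 - math.cos(theta)))

def modulation(e_mev, theta):
    """(max - min) / (max + min) of the azimuthal distribution
    for a 100% linearly polarized beam."""
    eps = energy_ratio(e_mev, theta)
    s2 = math.sin(theta) ** 2
    # k1 = eps + 1/eps, k2 = 2*s2, so (max-min)/(max+min) = k2 / (2*k1 - k2).
    return s2 / (eps + 1.0 / eps - s2)

for e in (0.001, 0.5, 1.0, 10.0, 100.0):
    print(f"E = {e:7.3f} MeV  modulation at 90 deg = {modulation(e, math.pi / 2):.4f}")
```

In the low-energy (Thomson) limit the 90° modulation approaches unity, and at high energy it falls off approximately as m_e c²/E, consistent with the ~1/E behaviour quoted above.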


9. Deployments 


As mentioned above, Compton telescopes for astronomy must be deployed above
or at the top of Earth’s atmosphere. Balloon platforms are relatively inexpensive,
but the integrated exposure times are short, on the timescale of one day even for
long-duration flights. The sensitivity (per unit time) of balloon-borne instruments
is competitive with that of LEO space-based platforms. Balloon flights represent the
most expeditious path from instrument concept to measurements, often only three to
four years for a major payload. Because of their relatively short durations, there is
less activation of instrument and spacecraft material, e.g. ²²Na with a half-life of
2.6 years.

Space-based missions can obtain far longer exposures, providing the statistics
necessary for steady-state sources while also presenting the best opportunities for
detecting transient events, such as cosmic γ-ray bursts.

There are several disadvantages to a space-based mission. The opportunities
for these missions are infrequent, and the time between instrument concept and
observations can be many years. Also, during the course of a several-year mission,
internal radioactivity will build up. The final intensity of the buildup will
depend on the particular isotope, with its production rate eventually coming into
equilibrium with its decay rate. Furthermore, because of the presence of an
accompanying spacecraft, there will be an additional neutron component in the
background, above and beyond that from the Earth’s atmosphere. The quietest LEO is a
low-altitude, purely equatorial (±2–3° inclination) orbit. There the GCR rate has
the lowest average intensity and the orbit stays outside the South Atlantic Anomaly
(SAA), thus greatly reducing the radioactive buildup. If the orbit does cross the
SAA, then one must maintain the lowest possible altitude, requiring frequent
re-boosts for orbital maintenance.
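
The approach to equilibrium between production and decay can be illustrated with a short calculation. This is a sketch under a simplifying assumption that is not in the text: the isotope is produced at a constant rate throughout the exposure, so its activity relative to the saturation value is 1 − exp(−λt). The function name and the sample numbers are hypothetical.

```python
import math

def buildup_fraction(exposure, half_life):
    """Activity relative to its equilibrium value after `exposure`,
    assuming a constant production rate. Both arguments share one time unit."""
    decay_constant = math.log(2.0) / half_life
    return 1.0 - math.exp(-decay_constant * exposure)

# Illustrative comparison for an isotope with a multi-year half-life (days):
half_life_days = 1000.0                                   # hypothetical value
balloon = buildup_fraction(1.0, half_life_days)           # one-day flight
satellite = buildup_fraction(5 * 365.0, half_life_days)   # five-year mission
print(f"balloon: {balloon:.4%} of saturation, satellite: {satellite:.1%}")
```

For exposures much shorter than the half-life the activity grows almost linearly with time, which is why long-lived isotopes barely register on a balloon flight but approach saturation on a multi-year mission.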

Lastly, a deep space deployment is possible. In deep space, one is far from the
noisy radiation field of the Earth and, if the instrument can be separated from
the mother ship, the neutrons from the supporting spacecraft and the γ-rays it
produces can be minimized. Ideally, the background is then entirely produced by
charged GCRs that are rejected by the CPS.

With good directional selection, one might think that a deployment on the
lunar surface (or Mars) could be such a place. However, the lunar regolith is a
prolific emitter of neutrons (more than the Earth’s atmosphere) that will generate
background within the instrument.


10. Summary and Conclusions 


The COMPTEL experiment was whimsically called by its investigators the 
“Complicated Telescope”, but none of the designs or implementations discussed is 
much simpler, if at all. Any effort to improve SNR in the MeV range will be complex 
and difficult, regardless of the Compton-telescope design. However, without these 
heroic efforts, the sensitivity of measurements from 400 keV to 20 MeV will not
improve over what was experienced in COMPTEL or the two INTEGRAL instru- 
ments.? Even so, continued study into suppressing background will be rewarded 
with improvements in sensitivity, because all the measurements to date are lim- 
ited by systematics and not the statistics of the observation. Part of the learning 
process includes on-orbit experience or long-duration balloon flights to measure 
and characterize the backgrounds that can guide the hardware design or algorithm 
developments necessary to suppress those background components. 
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An introduction to pair-conversion gamma-ray imagers is given. A description 
of the physics of pair conversion is presented, followed by a brief history of 
space-based gamma-ray telescopes based on this effect. Design considerations 
are discussed including trade-offs between effective area, angular resolution and 
cosmic-ray background rejection. 


1. Introduction 


Astronomical observations at the high-energy end of the electromagnetic spectrum 
have seen dramatic advances in the past few decades. Space-based instruments 
provide near all-sky monitoring, while ground-based instruments have extended the 
energy reach to several TeV. In this chapter space-based instruments, which utilize 
gamma-ray pair conversion to detect photons, will be introduced. The goal here is 
to give an overall framework in which to understand the details of an instrument’s 
design. Specific instruments and in-depth expositions of analysis techniques are left 
to the references. 


1.1. Pair Conversion Physics 


The interaction between light and matter changes dramatically as the energy of the
individual photons increases. In the lowest portions of the electromagnetic spectrum,
simple wire antennas can pick up a signal. At optical energies, mirrors and lenses
concentrate the image onto photosensitive devices. Higher still are X-rays, which are
increasingly hard to focus and detect as the energy goes up. Beyond X-rays is the
gamma-ray portion of the spectrum, dominated by nuclear and quantum processes.

Fig. 1. Left: a depiction of a gamma ray passing close to a tungsten nucleus and converting to an
$e^+e^-$ pair in the strong electric field. Right: one of the lowest-order Feynman diagrams describing
the process.


Asymptotically, as energy increases, the dominant interaction of gamma rays with
matter is the pair-conversion process depicted in Fig. 1.

Pair conversion has a threshold in energy because an $e^+e^-$ pair
has a finite mass and the energy to “make them” has to be available. It is a simple
exercise to calculate the threshold energy, and the result is

$$E_{\rm th} = 2 m_e \left(1 + \frac{m_e}{M_n}\right),$$

where $m_e$ and $M_n$ are the masses of the electron and nucleus, respectively. The
term involving $M_n$ is negligible in most instances, thus $E_{\rm th} \approx 2m_e = 1.02$ MeV, as
one would have intuitively guessed.
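
For readers who want to see the exercise, one way to obtain the threshold is sketched below in natural units ($c = 1$). It assumes the nucleus is initially at rest and that, at threshold, the nucleus and the pair emerge at rest in the centre-of-momentum frame.

```latex
% Invariant mass squared of the photon (energy E) plus a nucleus at rest:
s = (E + M_n)^2 - E^2 = M_n^2 + 2 E M_n .
% At threshold all final-state particles are at rest in the CM frame:
s_{\rm th} = (M_n + 2 m_e)^2 = M_n^2 + 4 m_e M_n + 4 m_e^2 .
% Equating the two expressions and solving for E:
2 E_{\rm th} M_n = 4 m_e M_n + 4 m_e^2
\quad\Longrightarrow\quad
E_{\rm th} = 2 m_e \left(1 + \frac{m_e}{M_n}\right).
```

For any nucleus $m_e/M_n$ is below $10^{-3}$, which is why the correction term is negligible in practice.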

Within a decade in energy above pair threshold, the conversion process dom- 
inates photon—matter interactions, becoming ~100% of the total cross-section by 
1 GeV as shown in Fig. 2 (from Ref. 1). 

The amount of material a photon can traverse before undergoing pair conversion
is characterized by the radiation length, $X_0$, of the material. Formally, $X_0$ is the
distance in a material over which a high-energy electron’s energy is reduced to 1/e
of its initial value (due mainly to bremsstrahlung), but owing to the close quantum
electrodynamics (QED) relationship of bremsstrahlung to pair conversion, $X_0$
similarly applies to high-energy photons with an additional factor of 7/9: the mean
free path for pair conversion is (9/7)$X_0$. A formula for $X_0$ is given in Ref. 2 and
basically goes as $A/Z^2$.

The usual practice is to select a dense high-Z material to serve as the “con- 
verters”. Radiation lengths for a few materials often appearing in pair imagers are 
given below from Ref. 1. The first four rows in Table 1 are examples of materials 
used in particle detectors while the last five rows are materials that could serve as 
converter foils. 
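
As a rough illustration of what a given converter thickness buys, the sketch below estimates the conversion probability of a foil from its thickness in radiation lengths. It assumes the high-energy limit in which the pair-conversion mean free path is (9/7) radiation lengths, so it overestimates the probability near threshold; the function names are illustrative, and the tungsten radiation length is the value listed in Table 1.

```python
import math

def conversion_probability(thickness_x0):
    """Probability that a high-energy photon converts in a layer that is
    `thickness_x0` radiation lengths thick (asymptotic 7/9 factor assumed)."""
    return 1.0 - math.exp(-7.0 / 9.0 * thickness_x0)

def foil_thickness_cm(thickness_x0, x0_cm):
    """Physical thickness of a foil of `thickness_x0` radiation lengths."""
    return thickness_x0 * x0_cm

X0_TUNGSTEN_CM = 0.350  # from Table 1

t = 0.03  # a 3% radiation-length converter
print(f"thickness: {foil_thickness_cm(t, X0_TUNGSTEN_CM) * 1e4:.0f} micron of W")
print(f"conversion probability per foil: {conversion_probability(t):.4f}")
```

A single thin foil converts only a few percent of the incident photons, which is why imagers stack many converter layers.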

Fig. 2. The various cross-sections for photon energies from 10 eV to 100 GeV and the total (from
Ref. 1).


Table 1. Properties of materials often found in Imaging Pair Telescopes.

Material   Z    A       Density (g/cm³)     $X_0$ (g/cm²)   $X_0$ (cm)
Al         13   27.0    2.70                24.01           8.89
Si         14   28.1    2.33                21.82           9.36
Ar (gas)   18   39.9    1.40 g/l at STP     19.55           13960
Ar (liq)   18   39.9    1.66                19.55           11.78
Ta         73   180.1   16.7                6.82            0.409
W          74   183.8   19.3                6.76            0.350
Au         79   197.0   19.3                6.46            0.335
Pb         82   207.2   11.4                6.37            0.559
U          92   238.0   19.0                6.00            0.316


In pair-conversion imagers, the incident photon’s direction is inferred by measuring
the trajectories of the resulting $e^+$ and $e^-$. Neither of these is perfectly
co-aligned with the parent photon’s direction, owing to nuclear recoil and the
kinematics of the materialization of the pair. The latter determines the so-called
QED opening angle between the electron and positron trajectories. A distribution
of the QED opening angle ($\theta_{\rm QED}$) for 100 MeV incident gamma rays is shown
in Fig. 3.

A convenient approximation for the most probable opening angle of this
distribution is

$$\theta_{\rm QED} \approx \frac{4 m_e}{E_\gamma} \approx 1^\circ \ \text{at 100 MeV}.$$

$\theta_{\rm QED}$ is a convenient benchmark, which will later be used to evaluate the significance
of various contributions to the point spread function (PSF).
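
The benchmark is easy to evaluate at other energies. The following sketch simply codes the approximation above, with the electron rest energy expressed in MeV; the function name is illustrative.

```python
import math

M_E_MEV = 0.511  # electron rest energy in MeV

def theta_qed_deg(e_gamma_mev):
    """Most probable QED opening angle, 4 m_e / E_gamma, in degrees."""
    return math.degrees(4.0 * M_E_MEV / e_gamma_mev)

for energy in (30.0, 100.0, 1000.0, 10000.0):
    print(f"E = {energy:8.0f} MeV  theta_QED = {theta_qed_deg(energy):.4f} deg")
```

Because the angle scales as 1/E, it shrinks from about a degree at 100 MeV to about a hundredth of a degree at 10 GeV.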
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Fig. 3. The QED opening angle distribution for 100 MeV gamma rays. 


An often-overlooked aspect of pair conversion is its sensitivity to the linear
polarization of the incident gamma ray. The plane of the $e^+e^-$ pair tends to be in
the plane of the polarization of the photon. The QED asymmetry between parallel
and perpendicular to the polarization plane is ~24%. In order to be sensitive to this
effect, the single-track angular resolution must be commensurate with $\theta_{\rm QED}$. Ref. 3,
as well as the others mentioned in Sec. 4, contains more details.

The last item relating to the physics of pair conversion is nuclear recoil. Some of
the most widely used Monte Carlo programs do not include this effect since in most
instances it is considered small enough to be ignored. Since the incoming photon is
massless, energy and momentum must be exchanged with the nucleus. The momentum
exchanged will be ~1 MeV/c (essentially ~$2m_e c$). In Ref. 3 Hunter provides
an approximate, heuristic formula describing the smearing effect this exchange has
on the reconstructed photon’s direction:

$$\theta_{\rm NR}(E) = 5^\circ \left(\frac{10}{E}\right),$$

where $\theta_{\rm NR}$ is the 68% containment angle for this distortion of the incoming
gamma-ray direction and $E$ is its energy in MeV. Unless the recoil is measured, this becomes
the ultimate limit on the direction resolution, and with present technologies it is
difficult-to-impossible to achieve.


1.2. Brief History of Gamma-Ray Observatories 


The Earth’s atmosphere makes the observation of high-energy gamma rays on the 
surface hard-to-impossible below some tens of GeV. Access to outer space occurred 


Gamma-Ray Pair-Conversion Imaging Telescopes 81 


in the 1950s with the launch of Sputnik and thus it became possible to extend 
astronomical observations to the high-energy end of the electromagnetic spectrum. 
There was skepticism that not much, if anything, would be seen in the gamma-ray 
band since thermal processes were thought to be the source of most, if not all the 
light in the universe. And beyond the energy of nuclear lines, what could possibly 
produce 100 MeV, GeV, or TeV photons? 

In spite of the doubts, a modest effort to detect photons at high energy came 
in 1967 with the launch of OSO-3.⁴ It did not have a pair-conversion “imager” per
se and was essentially a single channel calorimeter surrounded by charged particle 
detectors to reject cosmic rays. In its ~1 year mission it recorded 621 gamma rays 
and revealed the Milky Way as a gamma-ray source. 

Next came SAS-2 in 1972.⁵ This instrument had a gamma-ray imager and
became the prototype for future instruments. The imager was a planar device based 
on triggered spark chamber technology adopted from experimental particle physics. 
SAS-2 recorded ~8000 gamma rays. It firmly established the Milky Way as a source 
of gamma rays and was the first to “see” gamma rays from sources on the Galactic 
plane such as the Crab and Vela pulsars. 

SAS-2 was followed by COS-B in 1975.⁶ The overall configuration of the COS-B
instrument was similar to that of SAS-2. SAS-2 had a relatively short
duration due to malfunctions. COS-B, however, lasted for ~7 years, resulting in
by far the largest sample of gamma-ray data yet recorded: ~200,000 photons. It 
clearly showed the Galactic Ridge and several bright pulsars along it as well as 
some sources well off the Galactic plane, such as the active galaxy 3C273. 

In 1991, after a long delay caused by the Challenger disaster, the Compton
Gamma-ray Observatory⁷ was launched by the Space Shuttle as part of the
Great Observatories program. The Energetic Gamma-Ray Experiment Telescope,
or EGRET, was one of four instruments onboard. EGRET⁸ revealed a gamma-ray
sky rich in sources, both within our galaxy and at cosmological distances. It found
that the gamma-ray sky was filled with variable sources: flaring AGNs, GRBs, and 
pulsars to name a few. Over its ~9-year lifetime EGRET recorded ~1.6 million 
gamma-ray events. 

The technologies used in EGRET were contemporary in particle physics in the
1960s. In the 1970s and 1980s particle detector technology evolved tremendously and,
given the successes of EGRET, the obvious question was how much EGRET could
be improved on. Fermi-LAT, originally named GLAST, was initiated in the spring
of 1992 at Stanford.⁹ The basic layout of GLAST followed the lead of its
predecessors in using an imaging pair-conversion telescope followed by a calorimeter,
encased in a shell of scintillators to veto incoming cosmic rays. The Fermi mission,
launched in June 2008, remains in orbit today and has so far recorded over
200 million gamma-ray events. Sketches of EGRET and Fermi-LAT¹⁰ are shown
in Fig. 4.
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Fig. 4. Left: a sketch of the EGRET instrument. Right: GLAST, now renamed Fermi-LAT. 


2. Design Considerations for Gamma-Ray Observatories 


This section is divided into three parts: Pair Imager, Calorimeter, and Background
Rejection. The Pair Imager converts the gamma rays into electron-positron pairs
and produces the best estimate of the photon directions. The Calorimeter measures
the photon energy. Overarching all of this is the need to reject the flood of
background particles, both from the Earth and from celestial cosmic rays.


2.1. Monte Carlo Simulations 


Modern particle detectors usually have complex geometries involving a variety of
materials, geometric shapes, gaps and cracks, etc. Making informed design decisions,
developing the analysis and reconstruction programs, and estimating the
performance characteristics (e.g. effective area, PSF, and so on) all require a
detailed simulation of the device.

The most commonly used program is the widely available GEANT4¹¹ software
package from CERN. It is capable of handling complex 3D geometries and has a
rather complete suite of particle physics interactions, including electromagnetic,
strong, and weak interactions. There are other programs (e.g. EGS5¹²) that
specialize in particular areas, but given that any design must contend with both
the detection of gamma rays and the rejection of cosmic rays, they are usually
not practical.


2.2. Pair Imager 


The essential layout of a gamma-ray imager is shown schematically in Fig. 5. Most
instruments to date have been stacks of planar devices with sheets of high-Z material
to convert the incident photons, interspersed with particle detectors, which record
the locations of the $e^+$ and $e^-$ that result from the conversion. The technology
used prior to Fermi-LAT was triggered spark chambers. These require few readout
channels and hence low power, are relatively easy to construct, and can provide
coverage of large areas at minimal cost. However, they require a pressure vessel
to contain the gas and an external trigger, and are pulsed by a high-voltage system
containing a spark gap. In EGRET, the external trigger relied on a tiled time-of-flight
system, which elongated the design and limited the field of view (this is left
out of Fig. 5). Both the gas and the spark gap wear out and limit the lifetime,
as well as requiring constant monitoring to track the degrading efficiency. By the
time GLAST was proposed, solid-state particle detectors had become somewhat
commonplace in particle physics, and with the economy of volume, production costs
were sufficiently low to allow the coverage of large areas.

Fig. 5. A schematic of the layout for pair imagers that have been flown to date.

Of paramount importance in astronomy across all wavelength bands is the ability
to tell where a signal is coming from. The task for the pair imager is to reconstruct
the $e^+$ and $e^-$ tracks and infer the direction of the incident gamma ray from
them. The accuracy of this direction information is often referred to as the point
spread function (PSF). The often-dominant contribution to the PSF, particularly
at low energies, is multiple scattering of the electron and positron as they
traverse the detector material. Multiple scattering causes a particle’s trajectory to
deviate stochastically, and its source is the material through which the particle
passes. One can see an immediate tension here between the need to convert as
many of the incident photons as possible (requiring more material) and minimizing
the PSF. A formula to estimate the approximate size of the Gaussian core of the
multiple scattering angular distribution is

$$\theta_0 = \frac{13.6\ \text{MeV}}{\beta c p}\, z \sqrt{\frac{x}{X_0}} \left[1 + 0.038 \ln\left(\frac{x}{X_0}\right)\right],$$

where $\theta_0$ is the multiple scattering angle projected in a plane (3D space angle =
$\sqrt{2}\,\theta_0$), $x$ is the path length traversed in the material with radiation length $X_0$,
and $z$ is the charge of the particle with momentum $p$ and velocity $\beta c$. Even for
modest amounts of material the multiple scattering angle can be large when compared
to $\theta_{\rm QED}$. For example, 50 MeV electrons, typical of 100 MeV incident gamma-ray
conversions, traversing 0.025 radiation lengths are scattered by ~52 mradians (~3°),
which is much larger than $\theta_{\rm QED}$ (~1°) at this energy.
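
The numbers in this example can be reproduced with a few lines of code. The sketch below evaluates the multiple-scattering formula above for a singly charged particle, assuming β ≈ 1 (a good approximation for 50 MeV electrons); the function name is illustrative.

```python
import math

def highland_theta0(p_mev, x_over_x0, z=1.0, beta=1.0):
    """Projected (plane) multiple-scattering angle in radians.

    p_mev      momentum in MeV/c
    x_over_x0  material traversed, in radiation lengths
    """
    return (13.6 / (beta * p_mev)) * z * math.sqrt(x_over_x0) * (
        1.0 + 0.038 * math.log(x_over_x0))

theta_plane = highland_theta0(50.0, 0.025)
theta_space = math.sqrt(2.0) * theta_plane  # 3D space angle
print(f"plane angle: {1e3 * theta_plane:.1f} mrad")
print(f"space angle: {1e3 * theta_space:.1f} mrad "
      f"({math.degrees(theta_space):.1f} deg)")
```

The space angle comes out at about 52 mrad, or roughly 3°, matching the value quoted in the text.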

The dominance of multiple scattering in determination of the PSF is replaced at 
high energy by the precision with which the track coordinates can be read out. This 
property of the particle detector is technology dependent. For example, in Fermi- 
LAT the detector technology is silicon strip detectors read out in a binary mode: 
“hit” or “not hit” (i.e. no pulse height information). The coordinate precision is 
determined by the strip pitch and in Fermi-LAT this is 228 μm, hence a Gaussian-
equivalent σ of ~66 μm. The transition from multiple-scattering-dominated PSF to
measurement-precision-dominated PSF for the LAT occurs around 10 GeV. Perhaps 
of equal importance to the coordinate precision is the minimum separation that 
two tracks must have in order to be read out as two separate tracks. The finer the 
granularity (strip pitch in the case of SSDs), the closer together two tracks can be 
and be properly identified. At high energy, electron tracks develop nearby tracks 
from “knocked on” atomic electrons, and these knock-ons can have several MeV 
of energy and penetrate several layers. The presence of these “extra” hits around 
the electron trajectory can serve as a useful identifier for background rejection in 
that the track being followed is indeed an electron. However, they also corrupt the 
coordinate readout of the track when the two-track resolution is large. 
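
The Gaussian-equivalent resolution quoted above follows from treating a binary strip hit as a uniform distribution across one strip pitch, whose standard deviation is the pitch divided by √12. A minimal check, with an illustrative function name:

```python
import math

def binary_readout_sigma(pitch_um):
    """Standard deviation of a uniform distribution one strip pitch wide."""
    return pitch_um / math.sqrt(12.0)

print(f"sigma for a 228 micron pitch: {binary_readout_sigma(228.0):.1f} micron")
```

This applies to single-strip hits only; clusters spanning several strips would need a different estimate.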

Gamma-ray conversions result in two charged tracks, which share the energy 
of the photon. At energies below ~1 GeV the energy split tends to be somewhat 
equal, but at higher energies, increasingly one of the final state particles takes a 
disproportionate share. In most current gamma-ray detector designs little or no 
provision is made to determine the individual particle energies; instead their sum is 
registered in a calorimeter. If the imager has a magnetic field, then the curvature of 
the tracks can be used to measure the energy split. However, the complications and 
limitations imposed by magnetic spectrometers make them unattractive to consider 
for space-based missions. 

So, what is given up by not measuring the $e^-$ and $e^+$ energies separately? To
explore this, the output from a Monte Carlo simulation is used. Specifically, the
4-vectors of the $e^+$ and $e^-$ are recorded and can be combined in various ways to
estimate the incoming gamma-ray direction. The two 4-vectors are tagged as “high”
and “low”, where “high” is associated with the higher energy track and “low” the
other one. Given that in the real measurement the individual energies of the tracks
are not well determined, and often only one track is reconstructed (particularly
at high incoming energies), we consider the following estimates for the incident
gamma-ray direction:


(1) The direction of only the high-energy track. 
(2) The direction of only the low-energy track. 


(3) Averaging the directions of the two tracks with equal weights. 

(4) Combining the two tracks, weighting the high-energy track 75% and the low- 
energy track 25%. 

(5) As in (4) except that the reconstruction mis-tagged “high” and “low”, resulting
in weights of 75% for the low-energy track and 25% for the high-energy track.
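
The weighted estimates in the list above can be sketched as a weighted sum of unit vectors. This is one plausible reading of "weighting" the two tracks, not necessarily the procedure used to produce the results that follow; the function names are illustrative.

```python
import math

def normalize(v):
    """Return the unit vector along v."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def combined_direction(high_dir, low_dir, w_high=0.75):
    """Weighted average of the two track directions, renormalized.

    w_high = 1.0 reproduces estimate (1), 0.0 estimate (2),
    0.5 estimate (3), 0.75 estimate (4) and 0.25 estimate (5).
    """
    h, l = normalize(high_dir), normalize(low_dir)
    w_low = 1.0 - w_high
    return normalize(tuple(w_high * a + w_low * b for a, b in zip(h, l)))

# Two tracks opening symmetrically by +-0.1 rad about the z axis.
a = 0.1
high = (math.sin(a), 0.0, math.cos(a))
low = (-math.sin(a), 0.0, math.cos(a))
print(combined_direction(high, low, w_high=0.5))   # along z by symmetry
print(combined_direction(high, low, w_high=0.75))  # tilted toward `high`
```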


Using the above estimates, the deviation of the reconstructed direction from 
the incident gamma-ray direction is calculated. The deviations are fit to a simple 
power law in energy: 


$$\text{PSF}_{68}(E) = \text{PSF}_{68}(100) \left(\frac{100\ \text{MeV}}{E}\right)^{\Gamma},$$

where PSF$_{68}$(100) is the 68% containment angle at 100 MeV and $\Gamma$ is the power-law
index. One would expect $\Gamma$ to be close to 1, matching the smearing incurred from
the kinematics of the underlying QED process. (For real data, which include all
effects, $\Gamma$ is considerably less than one: $\Gamma \approx 0.78$ for Fermi-LAT and
$\Gamma \approx 0.56$ for EGRET.) Plots of such fits are shown in Fig. 6 for the high- and
low-energy tracks individually (i.e. cases 1 and 2 from the list above).
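
A power law of this form is a straight line in log-log space, so the two parameters can be estimated by ordinary least squares. The sketch below is a generic illustration, not the fitting code behind Fig. 6; it uses synthetic, noise-free points generated from assumed parameter values and unweighted residuals.

```python
import math

def psf68(e_mev, psf100, gamma):
    """68% containment angle from the power-law model."""
    return psf100 * (100.0 / e_mev) ** gamma

def fit_power_law(energies, psfs):
    """Unweighted least squares in log-log space; returns (psf100, gamma).

    ln PSF = ln PSF(100) + gamma * ln(100 / E)
    """
    xs = [math.log(100.0 / e) for e in energies]
    ys = [math.log(p) for p in psfs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    return math.exp(mean_y - gamma * mean_x), gamma

# Synthetic test points (assumed values, for illustration only).
energies = [30.0, 100.0, 300.0, 1000.0, 3000.0]
points = [psf68(e, 0.68, 1.01) for e in energies]
psf100_fit, gamma_fit = fit_power_law(energies, points)
print(f"PSF68(100) = {psf100_fit:.3f} deg, index = {gamma_fit:.3f}")
```

With real containment data one would normally weight each point by its statistical uncertainty.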

In Fig. 6 both the 68% containment and 95% containment data are plotted.
The fit is done only to the 68% containment. The thin line is 3 × PSF$_{68}$(E) and is
shown so that it is easy to judge the size of the 95% containment data. The line
connecting the 95% data points is just that.

A similar analysis has been carried out for the other scenarios in the above list 
and the results are summarized in Table 2. 

[Fig. 6 plot panels. Fit results, left panel (high-energy track): PSF(100) = 0.68 ± 0.00713,
E-index = 1.01 ± 0.00326, ratio 95/68 = 3.014 ± 0.134, ratio 99/68 = 5.44 ± 0.552.
Right panel (low-energy track): PSF(100) = 2.48 ± 0.0409, E-index = 0.951 ± 0.00508,
ratio 95/68 = 6.39 ± 1.28, ratio 99/68 = 26.3 ± 11.2.]

Fig. 6. Fits to the PSF when using just the high-energy track (left) and the low-energy track
(right). The lower line is the fit for the 68% containment angle. The thin line is 3x the 68% fit
and the upper line shows the 95% containment derived from the data; the two agree well for the
high-energy track.


Table 2. Various direction estimates for the incident gamma ray.

Direction Est.               PSF$_{68}$(100)   95%/68%   99%/68%
(1) Hi E Track               0.68              3.01      5.44
(2) Low E Track              2.47              6.39      26.3
(3) Equal Weights            0.86              8.36      35.8
(4) 75:25 Hi:Low Weights     0.36              9.24      43.1
(5) 25:75 Hi:Low Weights     1.67              6.94      29
(6) X-Y Mixed Track          1.84              6.35      26.8


In Table 2, PSF$_{68}$(100) is in degrees and the last two columns give the ratios
of the 95%-to-68% and 99%-to-68% containment angles. These ratios are approximately
independent of energy, as can be seen in Fig. 6. The last row in Table 2 shows
the effect of mis-reconstructing the two tracks; specifically, reconstructing a “track”
that is one of the real trajectories in the X-projection, while the Y-projection comes
from the other trajectory. This can occur in devices that cannot resolve X-Y
ambiguities in the read-out coordinates, such as Fermi-LAT. The 95% and 99% measures
indicate that the PSF will have very long tails when one attempts to incorporate
the information from the low-energy track. The statistical uncertainties are large
for the 99%/68% ratio (>10%).

Some of the “take-aways” from this exercise are: 


(1) The safe bet is to simply use the high-energy track. However, this may not be 
so simple in that the pair imager will have to determine which is the higher 
energy track. 

(2) The seemingly large 95%/68% and 99%/68% ratios for several of the cases are
driven to a large extent by a small PSF$_{68}$(100). For example, case 4, which
“vertexes” the two tracks (correctly identifying the High and Low tracks), has
a very small PSF$_{68}$(100) and the large 99%/68% ratio reflects this.

(3) It is clear that when an error is made in the assignment of High vs. Low, or
tracking confusion occurs (as in item 6), serious degradation in direction
accuracy occurs.


These effects are usually small compared with those induced by multiple scat- 
tering. In Fig. 7, the curve for multiple scattering for 100 MeV incident gamma 
rays is shown (assuming 50 MeV daughter particles) as a function of the amount of 
material intersecting the track in radiation lengths. 

Also indicated in Fig. 7 are the values of PSF$_{68}$(100) in degrees for the as-built
Fermi-LAT (front section with 3% radiation-length converters; labeled LET-Front) and
what it would have been if the converters were absent and the photons converted
(mostly) in the silicon strip detectors (SSDs; labeled LET-Si only). These lie
somewhat above the QED opening angle benchmark point, $\theta_{\rm QED}$ (only gas-based
detectors can perform better than $\theta_{\rm QED}$). The arrow in Fig. 7 labeled “Simple
Vtx” corresponds to case 4 in Table 2. So to get close to the nuclear recoil limit,
the detector must be gas-based in order to reduce the multiple scattering, and it
must reconstruct both the High and Low energy tracks and properly identify them as
such.

Fig. 7. Multiple scattering, which smears the track’s direction, vs. material thickness in radiation
lengths for 100 MeV photons. The plot shows the 1σ angular width of the distribution in degrees.

The last item to discuss is the trade-off between efficiency in converting incoming
gamma rays and the PSF. The sensitivity to finding point sources is dependent
on the angular density of photons around the location of the source. The area on
the sky covered by these gamma rays is $\propto$ PSF$^2$ and hence is proportional to the
radiation lengths in the imager in the limit where multiple scattering dominates.
At high energy the resolution asymptotically approaches a value dependent on the
coordinate resolution of the charged particle detectors (the strip pitch in the case
of SSDs). The number of converted gamma rays also goes as the radiation lengths;
hence the photon surface density is independent of how much material is in the
imager. This overly simple estimate does not account for the fact that most sources
are not completely isolated in the sky. Those located near the galactic plane lie on
top of a very large background of diffuse gamma-ray radiation from the interaction
of cosmic rays with galactic dust and gas. This argues for trading effective area for
a better PSF. On the other hand, the gamma-ray sky has been revealed to be rich
in transient phenomena. The ability to observe the onset of an AGN flare or a GRB
depends on the density of converted gamma rays in time. This argues for thicker
converters at the expense of the PSF. In the end, the science drivers for the mission
will determine how this trade-off plays out.
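
The scaling argument in this paragraph can be written out explicitly. The lines below are a sketch valid only for thin converters in the multiple-scattering-dominated regime, neglecting the slowly varying logarithmic term in the scattering formula; $t = x/X_0$ denotes the total converter thickness in radiation lengths and $n_\gamma$ the photon surface density on the sky (both symbols are introduced here for illustration).

```latex
% Number of converted photons (thin-converter limit):
N_\gamma \propto t
% Multiple-scattering angle and the sky area it covers:
\theta_0 \propto \sqrt{t}
\quad\Longrightarrow\quad
\Omega \propto \mathrm{PSF}^2 \propto \theta_0^2 \propto t
% Photon surface density around a point source:
n_\gamma = \frac{N_\gamma}{\Omega} \propto \frac{t}{t} = \text{const.}
```

The background photons collected within $\Omega$, however, grow with both the conversion efficiency and the area, which is the reason a diffuse background favours thinner converters.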


2.3. The Calorimeter 


The pair imager discussed above needs to be supplemented with an energy
measurement for each photon. It is easy to appreciate this, given that the PSF is a
strong function of energy due to both QED and multiple scattering. Without a good
energy measurement for each event, it is not possible to weight events appropriately
when locating sources. Simply put, the energy determines the PSF for each event.

The role of the calorimeter is to measure the total energy of the incoming 
gamma ray. When the missions prior to Fermi were conceived, the wisdom was that 
one should totally absorb the electromagnetic shower following the conversion to 
obtain good energy resolution. A conflict arises here, since fully or approximately
containing EM showers at several to hundreds of GeV requires 20 or more radiation
lengths of material, and this makes for a massive device requiring a very large and
costly rocket.

The alternative is to make an imaging calorimeter, which records the details of
the shower development. The EM shower builds up exponentially until the showering
particles mainly lose energy from ionization rather than bremsstrahlung and pair
production. The depth at which this occurs is called shower maximum ($t_{\max}$) and
is where the energy deposited per unit length is maximum. A formula for $t_{\max}$ is

\[
t_{\max} = \ln\left(\frac{E_0}{E_c}\right) + C,
\]

where $t_{\max}$ is in radiation lengths, $E_0$ is the incident energy, and $E_c$ is the critical
energy (below which the energy losses are dominated by ionization rather than
bremsstrahlung and pair production). $C$ is a constant that is −0.5 for incident
electrons and +0.5 for incident photons.
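
As a quick numerical illustration of the shower-maximum formula, the sketch below evaluates it for a 10-GeV incident particle. The critical energy of about 11 MeV is an assumed, CsI-like value chosen for illustration; it is not taken from the text, and the function name is hypothetical.

```python
import math

def shower_max_depth(e0_mev, ec_mev, incident="photon"):
    """Depth of shower maximum in radiation lengths: ln(E0 / Ec) + C."""
    # C = -0.5 for incident electrons and +0.5 for incident photons.
    c = {"electron": -0.5, "photon": 0.5}[incident]
    return math.log(e0_mev / ec_mev) + c

# ASSUMPTION: critical energy of roughly 11 MeV (CsI-like), for illustration only.
EC_ASSUMED_MEV = 11.0

t_photon = shower_max_depth(10_000.0, EC_ASSUMED_MEV, "photon")
t_electron = shower_max_depth(10_000.0, EC_ASSUMED_MEV, "electron")
print(f"photon: {t_photon:.2f} X0, electron: {t_electron:.2f} X0")
```

With this assumed critical energy the photon value comes out a little above seven radiation lengths, and the electron value is exactly one radiation length shallower because only the constant C differs.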

A functional form describing the shower development is given by

\[
\frac{dE}{dt} = E_0\, b\, \frac{(bt)^{a-1}\, e^{-bt}}{\Gamma(a)},
\]

where $t$ is the depth in radiation lengths, $b \approx 0.5$ (depending very slowly on energy
and material type), and $a$ is related to $t_{\max}$ by $t_{\max} = (a - 1)/b$.

Figure 8 is a plot of the shower development for a 10-GeV incident photon. 
For calorimeters that do not contain the entire shower, energy is lost by photons 
escaping out the back. If the absorption depth of the calorimeter is only 10 radiation 
lengths, all energy deposited at greater than 10 radiation lengths is lost and is called 
“leakage”. This leakage can fluctuate, causing degradation in the resolution of the 
energy measurement. The simple explanation of this is that the location of the initial 
conversion, and hence the start of the shower, can vary by >2 radiation lengths from 
photon to photon. The effect is as if the curve in Fig. 8 is shifted back and forth, as 
shown by the dotted lines. The shifting back and forth of the shower profile causes 
the leakage to vary accordingly. 
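
The leakage argument can be made concrete with a toy calculation. The sketch below integrates the gamma-distribution shower profile numerically and reports the fraction of energy escaping a calorimeter 10 radiation lengths deep when the shower start is shifted by one radiation length either way. The choices of shower maximum at 7 radiation lengths, b = 0.5, and the size of the shift are illustrative assumptions, and the function names are hypothetical.

```python
import math

def shower_profile(t, e0, a, b=0.5):
    """dE/dt at depth t (radiation lengths); intended for shape parameter a > 1."""
    if t <= 0.0:
        return 0.0
    return e0 * b * (b * t) ** (a - 1.0) * math.exp(-b * t) / math.gamma(a)

def contained_fraction(depth, a, b=0.5, start=0.0, steps=4000):
    """Fraction of the energy deposited before `depth` for a shower starting at `start`."""
    length = depth - start
    if length <= 0.0:
        return 0.0
    h = length / steps
    # Trapezoidal integration of the unit-energy profile over the available length.
    total = 0.5 * (shower_profile(0.0, 1.0, a, b) + shower_profile(length, 1.0, a, b))
    for i in range(1, steps):
        total += shower_profile(i * h, 1.0, a, b)
    return total * h

# ASSUMPTIONS: shower maximum at 7 radiation lengths and b = 0.5, so a = 1 + b * t_max.
T_MAX = 7.0
B = 0.5
A = 1.0 + B * T_MAX

for start in (-1.0, 0.0, 1.0):
    leak = 1.0 - contained_fraction(10.0, A, B, start=start)
    print(f"shower start shifted by {start:+.0f} X0: leakage fraction {leak:.2f}")
```

In this toy model roughly a third of the energy escapes for the unshifted shower, and the escaping fraction changes noticeably with the shift, which is the fluctuation that degrades the energy resolution.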

If the calorimeter is segmented such that it produces an image of the shower
energy deposition, then by fitting this functional form to the measurement a correction
can be made for the leakage fluctuations. In short, the energy resolution
is driven by the accuracy with which one fits the measured shower data to the
shower profile. The results of this event-by-event compensation for shower leakage
are shown in Fig. 9 (from Ref. 10).

Fig. 8. The longitudinal shower profile of incident 10 GeV photons. Shower maximum occurs
around seven radiation lengths.

Fig. 9. Data from the CERN beam test for the LAT calorimeter. The hashed histograms show
the as-measured energy deposition without leakage compensation. The gray shaded histograms are
the same data after a shower shape fit is done, resulting in leakage compensation event-by-event.
The resolutions quoted in the figure are from a Gaussian fit to the peak as shown by the black
curves. From Ref. 10.

A detailed analysis of correcting for leakage in the case of Fermi-LAT can be 
found in Ref. 13. 


2.4. Background Rejection 


A major challenge for gamma-ray observatories is to distinguish celestial
gamma rays from the flood of cosmic rays entering the instrument. This incoming
cosmic-ray flux is limited by the Earth’s magnetic field, resulting in a “geomagnetic”
cutoff at low energies. This cutoff is dependent on the magnetic latitude and
decreases as one moves away from the magnetic equator. For orbits inclined with respect
to the equator (e.g. 28°, as is typical for US launches from the Kennedy Space Center
in Florida), the cutoff varies by a few GeV but averages around 3 GeV. Particles of
lower energy are secondaries from cosmic-ray interactions in the Earth’s atmosphere,
some of which are trapped by the Earth’s magnetic field. This low-energy flux is
dominated by e⁺ and e⁻. A plot of the rates is shown in Fig. 10.

Fig. 10. Orbit-averaged background fluxes of the various components used in the LAT background
model. The fluxes are shown as a function of total kinetic energy of the particles: protons (solid
black line), He (dashed black line), electrons (solid gray line), and positrons (dashed gray line).
The effect of geomagnetic cutoff is seen at ~3 GeV for protons and electrons, and at higher energy
for helium nuclei. At low energies the curves show the sum of re-entrant and splash albedo for
electrons and positrons. Left out of the plot are neutrons and photons, which come from cosmic-ray
interactions in the Earth’s atmosphere, as do the other particle types below geomagnetic cutoff.
These components are similar to or smaller than the electrons.

The fluxes are enormous when compared to the celestial gamma-ray flux, which 
when integrated above 100 MeV is 1-2 m⁻² s⁻¹. To do science in this environment
one is faced with rejecting these backgrounds to better than one part in 10⁶.

All of the gamma-ray observatories flown to date have employed a veto system to 
inhibit entering charged particles from triggering the data acquisition and to assist 
in rejecting non-gamma-ray events in the offline reconstruction analysis. The veto 
system has to be outside of as much material as possible; otherwise, the incoming 
flux will create secondary photons, which are indistinguishable from the signal. 
Being outside “everything” also means it will have to cover a large area. Plastic 
scintillators read out by photomultiplier tubes have been used in all observatories 
to date. 

There is, however, a subtlety realized after the fact from the EGRET mission:
high-energy gamma rays upon showering in the calorimeter create a quasi-isotropic 
flood of X-rays, which can “fire” the veto system. This so-called self-veto effect
limited EGRET’s high-energy reach to ~10 GeV. EGRET’s veto “dome” was a
monolithic scintillator and this design choice substantially contributed to the problem.
In Fermi-LAT the veto system (or Anti-Coincidence Detector (ACD)) was segmented
into 89 separate tiles. By doing this, only a limited number of tiles could
act as a veto and this pushes the energy at which self-veto becomes a problem to
higher energies.


3. Planar Instruments 


All recent flight instruments have been planar devices. The paradigm for these 
imagers is planes of high-Z converter material interspersed with particle tracking
detectors. The thickness of the converters is adjusted according to the aforemen- 
tioned trade-off between effective area and PSF. The detectors and support material 
are constructed so as to present as few “extra” radiation lengths as possible. And 
then there are significant geometric considerations as well: minimizing gaps and 
cracks and optimizing the layout to limit the effects of multiple scattering. 

A sketch of a layer in the Fermi-LAT imager (or “Tracker”) is shown in Fig. 11. 
The support structures are aluminum honeycomb panels with carbon fiber face- 
sheets. This provided adequate strength to endure the rocket launch while adding 
only 0.005 radiation lengths per plane to the material audit. The particle detectors 
are silicon strip detectors, commonly used in particle physics experiments since the 
1980s. They are essentially 100% efficient with very low noise (< 10⁻⁶ noise count
rate per channel), they operate at relatively low voltages (~100 V bias), and they 
have no consumables such as gas or cryogenics. They are quite radiation hard and 
hence have a very long life in low Earth orbit. 

For Fermi-LAT the trade-off between PSF and effective area in the choice of 
converter thickness resulted in a two-section imager. The first 12 layers have thin 
converter foils (0.03 radiation lengths) followed by four thick layers with 0.18 radi- 
ation length converter foils. Since the silicon detectors were to provide a critical 
trigger signal, two layers without converter foils followed the 16 layers with con- 
verters, providing a stack of 18 tracking layers. This resulted in a device in which 
approximately 1/3 of the incoming gamma rays convert in the thin section and 1/3 
in the thick section. The remaining gamma rays pass through and shower in the 
calorimeter. 
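
A rough cross-check of these fractions can be made from the converter foils alone. The sketch below assumes the high-energy limit in which the probability of pair conversion in x radiation lengths is 1 − exp(−7x/9); it ignores the silicon and support material and any energy dependence, so it is only an order-of-magnitude illustration and the function name is hypothetical.

```python
import math

def conversion_fractions(sections):
    """Fractions of photons converting in each section, plus the fraction passing through.

    `sections` is a list of (number_of_layers, foil_thickness_in_radiation_lengths).
    ASSUMPTION: high-energy limit, P(convert in x) = 1 - exp(-7 x / 9), foils only.
    """
    surviving = 1.0
    fractions = []
    for n_layers, thickness in sections:
        p_convert = 1.0 - math.exp(-7.0 * n_layers * thickness / 9.0)
        fractions.append(surviving * p_convert)
        surviving *= 1.0 - p_convert
    return fractions, surviving

# Foil thicknesses quoted in the text: 12 thin layers (0.03) and 4 thick layers (0.18).
(thin, thick), passed = conversion_fractions([(12, 0.03), (4, 0.18)])
print(f"thin: {thin:.2f}, thick: {thick:.2f}, through to calorimeter: {passed:.2f}")
```

Counting the foils alone gives roughly 0.24 in the thin section and 0.32 in the thick section. The material not included here (silicon and support structure) adds conversions, so this simple estimate is not expected to reproduce the quoted approximate thirds exactly.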

The overall layout for Fermi-LAT is a 4 x 4 array of Trackers as described 
above. Each tracker unit is followed by a calorimeter unit and associated electronics 
for readout. These tracker and calorimeter pairs form a “Tower”. This segmented 
the aperture into 16 units and helped ameliorate issues associated with monolithic 
designs such as EGRET. In addition, the Tower segmentation allowed for a highly 
parallel event readout, which was coordinated by a central data acquisition unit. 
The trigger rate for Fermi-LAT averages around 2 kHz, but can increase to >4 kHz
when far off the equator. The dead time associated with this readout rate in the
LAT is typically around 9%.

Fig. 11. Sketch of one layer of a Tracker, illustrating the design principles. The Tracker consists
of 16 layers (“trays”) like that shown here, plus two additional layers without W converters. The
first two measured track positions, from the top two Si strip detectors, dominate the measurement
of the photon direction, especially at low energy. θ₀ is the multiple scattering angle associated
with a single converter foil. (a) Ideal conversion half-way through the tungsten (W) converter. Si
detectors are located as close as possible to the converter, to minimize the lever arm for multiple 
scattering. Therefore, scattering in the second W layer has little impact on the measurement. 
(b) Fine detector segmentation can separately detect the two particles close together in many 
cases, enhancing both the PSF and the background rejection. (c) Converter foils cover only the 
active area of the Si, to minimize conversions for which a close-by measurement is not possible. 


One of the issues with planar devices such as Fermi-LAT is that the instrument
response functions (effective area, PSF, energy resolution, etc.) depend on the
angle of incidence of the gamma ray with respect to the instrument
axis. As the angle of incidence θ increases, the apparent thickness of the converter
foils increases as 1/cos(θ), the distance between measuring planes increases similarly,
the apparent granularity of the detectors decreases, etc. In general most things
get worse, the exception being the energy resolution, which benefits from the longer 
path length through the calorimeter. For the LAT the field of view extends to ~ 80° 
off axis (at 90° off axis planar devices “look” like open Venetian blinds). 
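
The 1/cos(θ) scaling is easy to tabulate. The sketch below applies it to a thin converter foil of 0.03 radiation lengths (the thin-foil value quoted earlier for Fermi-LAT); the chosen angles and the function name are illustrative only.

```python
import math

def apparent_thickness(thickness_on_axis, theta_deg):
    """Path length through a planar foil for incidence angle theta from the instrument axis."""
    return thickness_on_axis / math.cos(math.radians(theta_deg))

for theta in (0, 30, 60, 80):
    print(f"{theta:2d} deg: {apparent_thickness(0.03, theta):.3f} radiation lengths")
```

At 60° the foil already looks twice as thick as on axis, and near the edge of the field of view the factor is several times larger, which is why the multiple-scattering contribution to the PSF grows off axis.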

Some of the planar designs currently being developed are GAMMA-400¹⁴ and
Gamma-Light.¹⁵


4. Active Volume Detectors 


In planar instruments, the place where most of the incident gamma rays convert is in 
passive material. If it can be arranged that the conversion happens in active mate- 
rial, many performance parameters can be significantly improved. These include 
everything from background rejection to PSF. The most explored technology to
achieve this is time projection chambers (or TPCs).¹⁶ The detector is a large volume
of gas (usually Ar) with a carefully controlled parallel electric field. When
a charged particle traverses the gas, it leaves an ionization trail in its wake. The
electrons and gas ions subsequently drift in opposite directions along the electric
field lines and the electrons are collected in gas-amplification devices at the end of
the drift volume. The location of the collection pads on the end plate determines
two spatial coordinates and the drift time gives the third, hence providing a 3D
readout of the track. This technology was developed in the 1970s and 1980s and
several high-energy particle physics detectors were based on it, such as the ALEPH
detector at the LEP storage ring at CERN.¹⁷

In order to gain a large acceptance, most prototype TPC designs have long 
drift distances, which presents a problem. As the electrons drift to the endplate 
detectors, diffusion smears them out. This in turn compromises the accuracy of 
the readout coordinates. In practice this limits drifts to only some tens of cm. In 
the large particle physics detector applications, providing a very strong magnetic 
field to the active volume, parallel to the electric field, minimizes this effect. The 
electrons then spiral around the field lines and diffusion is curtailed. The problems 
associated with launching and maintaining a large magnet in space are daunting, 
however. 

Another solution, proposed by Hunter,² is to dope the gas with small amounts
of electronegative molecules. The electrons quickly become attached to these gas 
molecules and due to their mass, diffusion is minimized. The issue with this solution 
is that the drift times are increased by several orders of magnitude. This in turn will 
result in many “events” occurring during the drift time. Whether or not the TPC 
can provide sufficient pattern recognition to sort out the real gamma-ray signal in 
the presence of background particles is an open question. 

With all the TPC designs there is another issue that needs attention. In all 
missions to date, an effort was made to minimize the passive material outside the 
cosmic-ray veto system. This material presents a target for cosmic rays and the
resulting secondaries will have a substantial number of gamma rays, mostly from π⁰
decays, and these are indistinguishable from celestial gamma rays. Both gas-based
and liquid-based TPCs will have a lot of external material, either in the form of a 
cryostat or a pressure vessel. 

Placing the veto system outside of these structures is problematic as well. The 
TPC works by converting one coordinate into a drift time, and the drift times are 
long compared to the typical time between the passages of cosmic rays through 
the instrument. Pairing a veto strike with a reconstructed gamma-ray event will be 
very difficult. Furthermore, veto strikes over a large fraction of the veto system will
have to be considered. The overall ACD rate in Fermi-LAT is ~80 kHz (this rate
corresponds to ~2 Hz/cm²).
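
To see why long drift times make veto association hard, one can estimate how many unrelated veto hits fall inside a given time window. The sketch below uses the ~80 kHz overall ACD rate quoted for Fermi-LAT and assumes Poisson-distributed, uncorrelated hits over the whole veto system; the window lengths (a microsecond-scale readout and two drift times) are hypothetical values chosen for illustration.

```python
import math

ACD_RATE_HZ = 80e3  # overall ACD rate quoted in the text for Fermi-LAT

def accidental_veto(window_s, rate_hz=ACD_RATE_HZ):
    """Mean number of unrelated veto hits in a window, and probability of at least one.

    ASSUMPTION: hits arrive as a Poisson process at a constant rate.
    """
    mean_hits = rate_hz * window_s
    p_at_least_one = 1.0 - math.exp(-mean_hits)
    return mean_hits, p_at_least_one

# HYPOTHETICAL windows: 1 microsecond, 100 microseconds and 1 millisecond.
for window in (1e-6, 1e-4, 1e-3):
    mean_hits, p_any = accidental_veto(window)
    print(f"window {window:.0e} s: mean hits {mean_hits:.2f}, P(>=1) = {p_any:.3f}")
```

For a microsecond-scale window an accidental hit is uncommon, whereas for drift times of a millisecond or more nearly every event would contain many unrelated veto strikes, so timing alone cannot pair a strike with a reconstructed event.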

A partial list of TPC designs currently being pursued includes AdEPT,²
HARPO,¹⁸ and LArGO.¹⁹


5. Summary 


The end of the Fermi Mission will happen sooner or later, and various groups
are actively developing next-generation instruments. With the ground-based air
Cherenkov detectors pushing into the tens of GeV domain and the costs associated
with launching devices with large effective areas, much of the focus is on energies
between ~0.1 MeV and 100 MeV. Here the incoming flux is high enough to afford
devices with modest effective area access to interesting science such as polarization
measurements. The trade is obviously giving up effective area for PSF resolution.
Gas-based devices offer the hope of pushing the resolution below θ_QED but these
device concepts present problems, which have yet to be resolved.

The analysis of the pair production data is unique in astronomy and is critically
linked with the details of the instrumental hardware. This chapter describes
several aspects of event reconstruction and source analysis in pair-conversion
telescopes. This includes the methods used to reconstruct the details of individual
γ-ray interactions; the parametric representations of instrument performance;
and the likelihood fitting-based analysis techniques used to analyze astrophysical
sources from γ-ray event lists. Finally, we describe astrophysical data sets that
can be used for calibration and validation of the instrument performance.


1. Introduction 


As discussed in Chapter 4 of this volume, in pair-conversion telescopes individual
γ-rays convert to e⁺e⁻ pairs, which are recorded by the instrument.¹,² By reconstructing
the e⁺e⁻ pair we can deduce the energy and direction of the incident γ-ray.
Accordingly, data analysis is event-based: we record and analyze each incident particle
separately. Each event consists of a readout of all of the signals deposited in the
instrument during a narrow time window, typically O(μs).ᵃ It is worth noting that
cosmic-ray rates are high enough that some fraction of events will contain signals
from more than one cosmic ray, or from both a γ-ray and a cosmic ray.

ᵃDetector technologies are capable of much finer time resolution, down to O(ns) timescales. However,
the power budget available for space-based telescopes significantly limits the achievable time
granularity.

The field of view of pair-conversion telescopes is huge compared to most observatories
operating at other wavelengths, typically exceeding a steradian (e.g. the
Fermi-LAT field of view is ~2.4 sr at 10 GeV). Pair-conversion telescopes are normally
deployed in low-Earth orbit with a period of ~90 minutes; coupled with the
large field of view, this makes them excellent sky monitors. In fact, rather than
pointing at a series of fixed targets, the Fermi-LAT is usually operated in a sky-survey
mode where it continuously scans across the sky.

In this chapter we will first describe some of the methods used to identify
the incoming γ-rays and estimate their energy and direction (Sec. 2), discuss the
instrument performance using the Fermi-LAT as an example (Sec. 3), describe techniques
used to make astronomical measurements from lists of γ-ray events (Sec. 4),
and finally touch briefly on astronomical data samples used to calibrate and
validate the instrumental response (Sec. 5).


2. Reconstruction of Pair-Conversion Events 


Event reconstruction translates the raw event information from the instrument subsystems
into a high-level event description; see Fig. 1 for an illustrative event display
of a γ-ray in the Fermi-LAT.³,⁴

Reconstructing the signals in the individual detector channels into a coherent 
picture of a particle interaction for each of the several hundred events collected 
every second by a pair-conversion telescope is a formidable task. The basic steps of 
the process are: 


(1) Digitization: converting the information about signals in individual channels
from the schema used in the electronics readout to a more physically motivated
schema, including the location of the signal in an instrument-based coordinate
system.

(2) Event Reconstruction: applying pattern recognition and fitting algorithms commonly
used in high-energy particle physics experiments to reconstruct the event
in terms of individual tracks and energy clusters in the detector subsystems and
to associate those objects with each other. This step includes separating the signals
from the different particles in multi-particle events.

(3) Event Analysis: evaluating quantities that can be used as figures of merit for the
event from the collections of tracks, clusters and associated information. Once
this information is extracted, multivariate analysis techniques are applied to
extract measurements of the energy and direction of the event and to construct
estimators of the probability that the event is in fact a γ-ray interaction rather
than a background cosmic-ray interaction.

(4) Event Classification: applying selection criteria to make lists of γ-ray events.


While the digitization stage is fairly straightforward, the remaining steps are
involved and described further below.

Fig. 1. Event display of a simulated 27 GeV γ-ray (a) and zoom over the calorimeter (b) and
tracker (c) portions of the event. The small crosses represent the clusters in the tracker (see
Sec. 2.1), while the variable-size squares indicate the reconstructed location and magnitude of
the energy deposition for every hit crystal in the calorimeter. The dotted line represents the true
γ-ray direction, the solid line is the reconstructed calorimeter cluster axis and the dashed lines
are the reconstructed tracks. The backsplash from the calorimeter generates tens of hits in the
tracker, with two spurious tracks reconstructed in addition to the two associated with the γ-ray
(note that they extrapolate away from the calorimeter cluster centroid and do not match the
calorimeter cluster axis direction). It also generates a few hits in the anti-coincidence detector,
which, however, are away from the event direction extrapolation and therefore do not contradict
the classification of the event as a γ-ray.


2.1. Event Reconstruction 


A wealth of techniques have been developed in high-energy particle physics to recon- 
struct the momenta and energy of particles observed in a detector (see, e.g. Ref. 5 
for an overview of the subject). Furthermore, the specific details of the algorithms 
depend on the geometry of the detector in question. Therefore, rather than discuss 
these algorithms in detail we will summarize the overall process. 

In principle, near the pair-conversion point the event consists only of the two
tracks from the electron and positron. (At energies above a few GeV, depending on the
granularity of the tracker, the two tracks overlap and become indistinguishable.)
Within about one radiation length along the direction of travel of the incident γ-ray,
bremsstrahlung emission and secondary pair conversions produce the cascade of e⁺,
e⁻ and γ-rays making up the EM shower. As discussed in Chapter 4 of this volume,
identifying and correctly reconstructing the e⁺ and e⁻ tracks before the onset of
the EM shower is an essential goal of the event reconstruction.

The “clustering” stage of event reconstruction combines signals in adjacent
detector elements that are likely to have been caused by the same incident particle.
In the silicon strip detectors used in the Fermi-LAT and the AGILE-GRID, a
charged particle often leaves signals in two or more adjacent strips in a single layer.
The clustering algorithms combine these signals into a set of clusters that will be 
used in the later stages of the reconstruction. 
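
A minimal sketch of tracker clustering is shown below: fired strips in one layer are grouped whenever they are adjacent, and each cluster is assigned a position from its mean strip index. This is an illustration of the idea, not the algorithm used by any particular instrument; the function names, the optional `max_gap` tolerance and the 0.228 mm strip pitch are assumptions made for the example.

```python
def cluster_strips(hit_strips, max_gap=0):
    """Group fired strip indices from a single layer into clusters of adjacent strips.

    `max_gap` is the number of silent strips tolerated inside one cluster
    (0 means strictly adjacent strips only).
    """
    clusters = []
    for strip in sorted(set(hit_strips)):
        if clusters and strip - clusters[-1][-1] <= max_gap + 1:
            clusters[-1].append(strip)
        else:
            clusters.append([strip])
    return clusters

def cluster_position(cluster, pitch_mm):
    """Cluster centre in mm, taken as the mean strip index times the strip pitch."""
    return pitch_mm * sum(cluster) / len(cluster)

# ASSUMPTION: a strip pitch of 0.228 mm, used here only as an illustrative value.
PITCH_MM = 0.228
clusters = cluster_strips([12, 10, 11, 40, 42])
positions = [cluster_position(c, PITCH_MM) for c in clusters]
print(clusters, positions)
```

With strictly adjacent strips the five hits form three clusters; allowing a single silent strip (`max_gap=1`) would merge strips 40 and 42, which is one way to tolerate an inefficient or masked channel.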

Clustering algorithms for signals in calorimeters are different; since the EM 
showers are fully developed in the calorimeter the goal of the calorimeter clustering 
algorithm is to gather together all of the signals from a single shower, rather than 
from the individual tracks in the shower. Calorimeter clustering is important to 
disentangle the signals caused by incoming γ-rays from those caused by cosmic rays
that crossed the instrument during the event readout window. The clustering stages 
often include the application of channel-by-channel calibration algorithms, so as to 
measure the energy deposition in each cluster. 

With highly segmented calorimeters, such as the Fermi-LAT calorimeter and,
to a lesser extent, the AGILE Mini-Calorimeter, a dedicated “shower-fitting” algorithm
(see, e.g. Refs. 6 and 7) can extract information such as estimates of the
total energy deposited in the cluster and of the energy that leaked out of the
calorimeter, and the topology of the clusters, e.g. the axes of the calorimeter clusters and
the transverse and longitudinal extent of the showers with respect to those axes.
Topological information is useful for discriminating between γ-rays and cosmic rays,
in particular baryons such as protons and heavy nuclei. Baryons do not generate
electromagnetic (EM) showers, but rather deposit energy by a combination of ionization
and nuclear interactions, resulting in significantly different shower topologies
than γ-rays, electrons or positrons.

In the “track-finding” stage of the event analysis, clusters in the tracker are 
linked together into tracks representing the path of individual particles. Far too 
many track-finding algorithms exist to be discussed here; four different algorithms 
are used in the reconstruction of Fermi-LAT data alone. Probably the most important
points to make are that no single track-finding algorithm is a panacea, and
different algorithms often have complementary strengths. For example, one
algorithm may be very fast, and excellent for picking out the simple and straight
tracks often left by high-energy cosmic rays, while another may be significantly
slower, but better at disentangling the more complex events caused by γ-rays with
their EM showers.

Once individual tracks have been found, Kalman filter-based “track-fitting”
algorithms are used to extract estimates of the direction of the individual
particles.⁸⁻¹⁰ The Kalman filter technique successively updates the best-estimate
track parameters and uncertainty estimates to account for the information gained
by each hit along the track trajectory and the information lost as the particle experiences
multiple scattering in the detector material. The output of the Kalman-filter
track fit is a series of best-fit track parameters and associated covariance matrices, 
one set for each signal observed or volume of material traversed by the tracked 
particle. Each set of parameters gives the best-fit estimate of the track trajectory, 
using information up to that point in the track. In practical terms, however, the 
track parameters at the beginning of the track are the most important, as they can 
be used both to determine the direction of the incoming particle and to propagate 
the track to test for the existence of associated signals in other detector sub-systems 
(e.g. an associated hit in an anti-coincidence veto system that might indicate the 
track was most likely caused by a charged particle starting outside the tracking 
volume). 
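
The gain-and-loss bookkeeping of a Kalman track fit can be illustrated with a deliberately simplified one-dimensional example. The sketch below fits a straight line in a single projection with state (position, slope), adds a fixed slope variance at each plane to represent multiple scattering, and processes the hits from the far end of the track towards its start so that the final estimate refers to the beginning of the track. A real fit would treat both projections and an energy-dependent scattering angle; the function name and all numerical values are hypothetical.

```python
def kalman_track_fit(z, u_meas, sigma, theta0):
    """Kalman filter for a straight track in one projection.

    z, u_meas : plane positions and measured coordinates, ordered from the far
                end of the track towards its start.
    sigma     : position measurement uncertainty at each plane.
    theta0    : r.m.s. multiple-scattering angle per plane (small-angle limit).
    Returns ((position, slope), covariance) at the last plane processed.
    """
    # Initial state: first hit, zero slope, and an essentially unconstrained slope.
    u, s = u_meas[0], 0.0
    p00, p01, p11 = sigma ** 2, 0.0, 1.0e4
    for k in range(1, len(z)):
        dz = z[k] - z[k - 1]
        # Information lost: scattering in the plane just left inflates the slope variance.
        p11 += theta0 ** 2
        # Propagate state and covariance along a straight line (P -> F P F^T).
        u += s * dz
        p00, p01 = p00 + 2.0 * dz * p01 + dz * dz * p11, p01 + dz * p11
        # Information gained: update with the measured position (H = [1, 0]).
        denom = p00 + sigma ** 2
        k0, k1 = p00 / denom, p01 / denom
        resid = u_meas[k] - u
        u += k0 * resid
        s += k1 * resid
        p00, p01, p11 = (1.0 - k0) * p00, (1.0 - k0) * p01, p11 - k1 * p01
    return (u, s), ((p00, p01), (p01, p11))

# HYPOTHETICAL noise-free hits on the line u = 1 + 0.1 z, far end at z = 0.
z_planes = [0.0, 25.0, 50.0, 75.0, 100.0]
hits = [1.0 + 0.1 * zp for zp in z_planes]
state, cov = kalman_track_fit(z_planes, hits, sigma=0.05, theta0=1.0e-3)
print(state, cov)
```

For these noise-free hits the filter recovers the position and slope of the generating line at the last plane, and the returned covariance shows how the scattering term limits the slope precision no matter how many planes are added.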

The final stage of the event reconstruction is to associate information from 
the various instrument sub-systems. For example, tracks can be extrapolated 
to the calorimeter and matched up with calorimeter clusters, or extrapolated to a 
surrounding anti-coincidence veto detector and associated with signals there. 
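
The association step amounts to straight-line extrapolation followed by a distance test. The sketch below projects a track to the plane of a calorimeter cluster centroid and checks whether it passes within a chosen distance; the coordinate convention (z along the instrument axis), the matching radius and the function names are assumptions made for illustration.

```python
import math

def extrapolate(point, direction, z_target):
    """Straight-line extrapolation of a track to the plane z = z_target."""
    x, y, z = point
    dx, dy, dz = direction
    if dz == 0.0:
        raise ValueError("track is parallel to the target plane")
    t = (z_target - z) / dz
    return (x + t * dx, y + t * dy, z_target)

def match_track_to_cluster(point, direction, centroid, max_dist_mm):
    """Return (matched, distance) between the extrapolated track and a cluster centroid."""
    x, y, _ = extrapolate(point, direction, centroid[2])
    dist = math.hypot(x - centroid[0], y - centroid[1])
    return dist <= max_dist_mm, dist

# HYPOTHETICAL geometry in mm: track starting at z = 100 and heading downwards.
matched, dist = match_track_to_cluster(
    point=(0.0, 0.0, 100.0),
    direction=(0.1, 0.0, -1.0),
    centroid=(10.5, 0.0, 0.0),
    max_dist_mm=5.0,
)
print(matched, round(dist, 3))
```

The same extrapolation, pointed upwards instead of downwards, can be used to look for an associated signal in a surrounding veto detector.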

In fact, the description given here is something of an oversimplification. Infor- 
mation from various stages of event reconstruction can be useful in other stages. For 
example, for very complicated events, typically with thousands of individual signals 
in the tracker, the axis of the EM shower in the calorimeter can be used to seed 
the track-finding algorithms, to reduce the combinatorial challenge to a manageable 
level. Typically the later stages of the event reconstruction have some amount of 
overlap, and part or all of the process might be iterated more than once, often using 
information from previous iterations. Also, the tracks and calorimeter clusters from 
the different particles in multi-particle events must be separated during the event 
reconstruction. 


2.2. Event Analysis 


Once each event has been fully reconstructed, the reconstructed event must be 
analyzed to extract information that will be used to determine the energy, direction 
and species of the incoming particle. It is also useful to estimate how well the 
energy and direction were measured, e.g. to select sub-samples of events that are 
particularly well reconstructed, or to reject events that are poorly reconstructed. 

It is possible to design literally hundreds of different quantities to characterize 
each event. Here we will list the most important questions these quantities are 
designed to answer. 


(1) Which track or tracks came from the original e⁺e⁻ pair? In practice this is a
matter of finding the longest and straightest tracks, as those are the ones from
the highest energy particles. These are most likely to be from the original pair,
as opposed to back-scattered particles from the EM shower. Furthermore, if
the two longest and straightest tracks form a vertex, i.e. if they come from a
single point, that is a strong indicator that the event is in fact from a γ-ray pair
conversion.

(2) How accurate are the estimates of the energy and direction? Events for which a
large fraction of the event energy leaked out of the calorimeter naturally have
poorer energy resolution than events for which the energy was well-contained.
Likewise, high-energy events for which the pair conversion occurred near the top
of the tracker and signals were observed in every tracker plane naturally provide
a more accurate direction estimate than events for which the pair conversion
happened at the bottom of the tracker and only three or four tracker planes have
signals.

(3) Does the event start well inside the instrument? Incoming cosmic rays will
leave signals from the point at which they enter the instrument; for converted
e⁺e⁻ pairs the signals will only start after the conversion point. If a likely
conversion point can be identified, quantifying the number of detector signals
“missing” before that point along the track extrapolation is a powerful discriminator
between γ-rays and cosmic rays.

(4) Is the topology of the event consistent with an EM shower? “Shower-shape
variables” distinguish between EM showers caused by γ-rays and the hadronic
showers caused by cosmic-ray protons and heavy ions. However, these do not
discriminate against cosmic-ray electrons and positrons. Similarly, the specific
ionization, i.e. the energy deposition per unit length in the instrument, scales
as the square of the particle charge, making it a powerful discriminator against
heavy ions. On the other hand, because of bremsstrahlung processes, EM shower
events tend to have a larger number of extra signals in the tracker near the e⁺e⁻
pair.

(5) Do the instrument sub-systems provide a consistent picture of the event?
Because of the huge level of background rejection needed, low-probability events
can pose a significant analysis challenge. This is especially true as these low-probability
events might partially mimic a γ-ray event. A specific example would
be a cosmic-ray electron that enters the side of the calorimeter, creating an EM
shower, from which a single particle escapes into the tracker and leaves a short
track before running out of energy. The tracker reconstruction might misidentify
the event as a track that started in the middle of the tracker. The calorimeter
reconstruction might well identify the event as an EM shower. However, the
axis of the calorimeter shower would probably not match the track direction
particularly well. Only by putting together the entire event would it be identified
as a likely cosmic-ray interaction.


2.3. Event Classification


Since pair-conversion telescopes are subjected to a flux of cosmic rays orders of
magnitude larger than the γ-ray flux, it is not feasible to design a single set
of event selection criteria that remove all the cosmic-ray backgrounds while preserving
efficiency for selecting the γ-rays. In fact, optimization with sophisticated
machine learning algorithms is required to reach the signal-to-noise levels needed for
astronomy.

It is also important to note that the optimal point of the background rejection
versus signal efficiency curve (also known as the receiver operating characteristic
or ROC curve) depends on the scientific measurement being made. Accordingly,
instrument teams often provide a small number of different sets of γ-ray event lists,
made with more or less stringent background-rejection criteria. For example, in each
of their data releases the Fermi-LAT collaboration has provided separate selections
optimized for the study of: (i) short timescale transients such as gamma-ray bursts,
(ii) persistent point sources, (iii) large scale diffuse emission and (iv) all-sky studies
of the isotropic gamma-ray background.ᵇ Similarly, the AGILE collaboration
provides one data selection for most analyses and a looser selection for the analysis of
gamma-ray bursts and pulsar timing.” 
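The trade-off between signal efficiency and background rejection can be illustrated with a minimal sketch. The classifier scores below are drawn from toy distributions, and the function name `roc_points` and the cut values are assumptions made for illustration only; they do not correspond to any actual Fermi-LAT or AGILE selection.

```python
import random


def roc_points(signal_scores, background_scores, cuts):
    """Return (cut, signal efficiency, background rejection) for each cut.

    An event passes the selection if its classifier score is >= cut.
    """
    points = []
    for cut in cuts:
        sig_eff = sum(s >= cut for s in signal_scores) / len(signal_scores)
        bkg_pass = sum(b >= cut for b in background_scores) / len(background_scores)
        points.append((cut, sig_eff, 1.0 - bkg_pass))
    return points


rng = random.Random(0)
# Toy scores (assumption): gamma rays cluster near 1, cosmic rays near 0.
signal = [rng.betavariate(5, 1) for _ in range(10000)]
background = [rng.betavariate(1, 5) for _ in range(10000)]

# A loose, a medium and a tight selection sample three points on the ROC curve.
for cut, eff, rej in roc_points(signal, background, [0.2, 0.5, 0.8]):
    print(f"cut={cut:.1f}  signal efficiency={eff:.3f}  background rejection={rej:.4f}")
```

A looser cut keeps more signal at the price of more residual background, which is why a transient analysis and a diffuse-emission analysis may reasonably choose different points on the same curve.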


3. Instrument Response Functions 


The standard for analysis of pair-conversion telescope data is the maximum like- 
lihood analysis formalism as described in Ref. 11. A critical component of this 
formalism is the parametric representation of instrument performance: the
instrument response functions (IRFs). In practice, analyses assume that the IRFs can be
factorized into three parts:


(1) Effective Area, $A_{\rm eff}(E, \hat{\theta}, s)$, the product of the cross-sectional geometrical
collection area, γ-ray conversion probability, and the efficiency of a given event
selection (denoted by $s$) for a γ-ray with energy $E$ and direction $\hat{\theta}$ in the
instrument frame;

(2) Point-spread Function (PSF), $P(\hat{\theta}'; E, \hat{\theta}, s)$, the probability density to
reconstruct an incident direction $\hat{\theta}'$ for a γ-ray with $(E, \hat{\theta})$ in the event selection $s$;

(3) Energy Dispersion, $D(E'; E, \hat{\theta}, s)$, the probability density to measure an event
energy $E'$ for a γ-ray with $(E, \hat{\theta})$ in the event selection $s$.


Note that the IRFs can change markedly across the instrument field of view. In 
practice, because of their high level of azimuthal symmetry, existing pair-conversion 
telescopes have parameterized the IRFs in terms of the incidence angle with respect 
to the bore-sight, $\theta$, and either ignored the azimuthal dependence of the IRFs or
treated it as a small correction factor. 

Generally, IRFs are created by making Monte Carlo simulations of millions
of γ-rays incident from a variety of directions, locations and energies, fitting the
resulting distributions to parameterized functions at each energy and incidence
angle, and storing tables of the fit parameters.

ᵇSee, e.g., http://fermi.gsfc.nasa.gov/ssc/data/analysis/documentation/Cicerone/Cicerone_Data_Exploration/Data_preparation.html

It is useful to define the IRFs in the instrument frame when discussing
instrument performance; but when analyzing celestial γ-ray data it is more practical to
work in sky coordinates, and we must therefore account for the pointing history of
the instrument. So we express the IRFs in terms of celestial coordinates, $\hat{p}$, and of a
generalized vector $\vec{L}(t)$ describing, as a function of time, the instrument attitude and
other relevant degrees of freedom, such as the instrument mode, or whether the instrument
is taking data at all (“live”). The IRFs are then:


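$$
\begin{aligned}
A_{\rm eff} &= A_{\rm eff}(E, \hat{p}, \vec{L}(t), s); \\
P &= P(\hat{p}'; E, \hat{p}, \vec{L}(t), s); \\
D &= D(E'; E, \hat{p}, \vec{L}(t), s).
\end{aligned}
\tag{1}
$$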


Clearly, this implies that the “live”-time pointing history of the instrument must 
be tracked and saved. 

Because the instrument response has to be evaluated over the constantly chang- 
ing orbit and attitude of the telescope, experience has shown that it is useful to pre- 
compute and store the distribution of observing time in the instrument reference 
frame of any given direction in the sky. For a single direction this is referred to as
the “observing profile” or “off-axis histogram”, written $t_{\rm obs}(\hat{\theta}; \hat{p})$; we refer
to a collection of observing profiles for a pixelization of the entire sky as a “livetime
cube”.

The exposure in a given direction can be calculated by integrating the product 
of the observing profile and the effective area over the instrument field of view: 


$$\mathcal{E}(E, \hat{p}, s) = \int d\hat{\theta}\, A_{\rm eff}(E, \hat{\theta}, s)\, t_{\rm obs}(\hat{\theta}; \hat{p}). \tag{2}$$


Figure 2 shows examples of the observing profile for two different sky locations and 
a map of the exposure over the entire sky at 1 GeV. 
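The exposure integral of Eq. (2) can be sketched numerically. The example below assumes an azimuthally symmetric response, so that the integral over instrument directions reduces to a sum over bins in the cosine of the incidence angle; the effective-area curve, the livetime values and the function names are toy assumptions, not LAT values.

```python
import math


def exposure(aeff, t_obs, cos_theta_edges):
    """Approximate the exposure integral for an azimuthally symmetric response.

    aeff(cos_theta): effective area in cm^2 at the energy of interest.
    t_obs: livetime in seconds accumulated in each cos(theta) bin
           (the observing profile, already integrated over the bin).
    cos_theta_edges: bin edges in cos(theta); len(t_obs) + 1 entries.
    """
    total = 0.0
    for i, livetime in enumerate(t_obs):
        centre = 0.5 * (cos_theta_edges[i] + cos_theta_edges[i + 1])
        total += aeff(centre) * livetime
    return total  # cm^2 s


def toy_aeff(cos_theta):
    """Toy effective area (assumption): 8000 cm^2 on axis, zero at cos(theta) = 0.2."""
    return max(0.0, 8000.0 * (cos_theta - 0.2) / 0.8)


edges = [0.2, 0.4, 0.6, 0.8, 1.0]
t_obs = [1.0e6, 2.0e6, 3.0e6, 4.0e6]  # toy observing profile, seconds per bin

print(f"exposure = {exposure(toy_aeff, t_obs, edges):.3e} cm^2 s")
```

Because the observing profile depends only on the pointing history, it can be computed once per sky direction and reused for every energy and every set of IRFs.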


3.1. Effective Area 


When describing the performance of a pair-conversion telescope, we commonly show 
the effective area at normal incidence as a function of the energy and the angular 
dependence of the effective area for a given energy. Figure 3 shows these performance 
curves for the Fermi-LAT. 

In the middle of the energy band, the effective area is primarily determined by 
the geometrical cross-section of the instrument and the pair-conversion efficiency, 
and is fairly uniform with energy. At low energies the effective area falls off because 
the converted electron and positron may not have enough energy to leave a long 
enough track to be properly reconstructed. At high energies the effective area falls 
off because some fraction of the huge number of particles created in the EM shower 
in the calorimeter scatter back into the tracking volume and complicate the event 
to the point that it becomes exceedingly difficult to properly reconstruct.

Fig. 2. Observing profiles and exposure at 1 GeV for 5 years (2008 Aug. 03 to 2013 Aug. 03) of
Fermi-LAT observations. Left: observing profiles towards the Vela pulsar and the blazar 3C 279.
Right: exposure at 1 GeV in cm² s for the P7REP_SOURCE_V15 IRFs, shown in a Hammer-Aitoff
projection in equatorial coordinates along with the locations of Vela and 3C 279. The difference
in observing profile between the two sources is a consequence of the different declinations (δ) of
those sources. (See electronic edition for a color version of this figure.)

Fig. 3. Effective area and acceptance of the Fermi-LAT, for the P7REP_SOURCE version of the event
selections and the associated P7REP_SOURCE_V15 IRFs. Top left: $A_{\rm eff}$ for γ-rays entering normal to
the orientation of the tracker planes. Top right: comparison of the $A_{\rm eff}$ of the P7REP_SOURCE event
selection and two other selections. Bottom left: variation of the $A_{\rm eff}$ with incidence angle ($\theta$) for
10 GeV γ-rays. Bottom right: acceptance, Eq. (3), as a function of energy. “Front” and “back”
refer to events that convert in the front (first 12 converter/sensor layers) and back (last 6 layers)
sections of the LAT tracker. The tungsten converters in the back layers are thicker and induce
more γ-ray conversions, resulting in roughly the same total acceptance as the front section despite
having fewer converter layers. (See electronic edition for a color version of this figure.)

The incidence angle dependence of the effective area is primarily a geometric
effect; as the incidence angle increases, the chance of a particle leaving enough 
signals in both the tracker and calorimeter such that both the direction and energy 
can be reconstructed decreases. The variation of the effective area with incidence 
angle would be quite different for instruments that did not have the stacked planar 
geometry of existing pair-conversion telescopes. 

When discussing instrument performance, it is often useful to integrate over the 
field-of-view to define the “acceptance” of the instrument: 


$$A(E, s) = \int d\hat{\theta}\, A_{\rm eff}(E, \hat{\theta}, s). \tag{3}$$

The acceptance is plotted in the lower-right panel of Fig. 3.


3.2. Point-Spread Function 


As discussed in Chapter 4 of this volume, at low energies the PSF of a pair- 
conversion telescope is determined by multiple scattering, while at high energies 
it is determined by the spatial resolution of the individual signals in the tracker 
and the distance between successive signals. Furthermore, the increasing complex- 
ity of higher energy events makes it more likely that the track-finding algorithms 
will associate one or more signals with the wrong track. In general this results in 
widening the tails of the PSF without much change to the core of the distribution. 
Figure 4 shows two measures of the PSF as a function of energy for the Fermi-LAT. 

For instruments with planar geometries such as the Fermi-LAT or the AGILE-GRID
the PSF varies with the incidence angle of the incoming γ-ray; at larger
incidence angles the particles will cross more material (and experience more multiple
scattering) per tracking plane. For a particular direction in the sky, the time- and
observing profile-averaged effective PSF is


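$$\bar{P}(\hat{p}'; E, \hat{p}, s) = \frac{\int d\hat{\theta}\, P(\hat{p}'; E, \hat{p}, \hat{\theta}, s)\, A_{\rm eff}(E, \hat{\theta}, s)\, t_{\rm obs}(\hat{\theta}; \hat{p})}{\int d\hat{\theta}\, A_{\rm eff}(E, \hat{\theta}, s)\, t_{\rm obs}(\hat{\theta}; \hat{p})}. \tag{4}$$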


3.3. Energy Dispersion 


As discussed in Chapter 4 of this volume, the energy resolution of existing pair- 
conversion telescopes is determined by how much of the energy of the incident 
γ-ray is deposited in the calorimeter. At low energies the energy resolution degrades
because a large fraction of the energy is typically lost in the tracker, before the
particles reach the calorimeter; at very high energies the energy resolution degrades
because of energy leakage out the back and sides of the calorimeter. Figure 5 shows
the energy resolutionᶜ of the Fermi-LAT as a function of energy and incidence angle.


ᶜDefined as the half width of the energy window containing 34% + 34% (i.e. 68%) of the energy
dispersion on both sides of its most probable value, which typically gives slightly larger values of
energy resolution than using the smallest 68% containment window.

Fig. 4. Containment radius of the point-spread function of the Fermi-LAT for the
P7REP_SOURCE_V15 IRFs as a function of energy for γ-rays entering at normal incidence. Left:
68% and 95% containment radii. Right: ratio of the 95% to 68% containment radii. The expected
ratio for a two-dimensional Gaussian is 1.61; i.e. at high energies the tails of the PSF are very
non-Gaussian. The thicker tungsten converters in the back section result in more multiple scattering of
the tracks and worse direction resolution. (See electronic edition for a color version of this figure.)

Fig. 5. Energy resolution of Fermi-LAT as a function of energy and incidence angle. Left: energy
resolution as a function of energy for γ-rays arriving at normal incidence. Right: energy resolution
for 10 GeV γ-rays as a function of incidence angle. (See electronic edition for a color version of this
figure.)


For specific analyses where the energy resolution is important, it can be useful to 
define time- and observing profile-averaged effective energy dispersion for a source 
of interest, in analogy to Eq. (4): 

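$$\bar{D}(E'; E, \hat{p}, s) = \frac{\int d\hat{\theta}\, D(E'; E, \hat{\theta}, s)\, A_{\rm eff}(E, \hat{\theta}, s)\, t_{\rm obs}(\hat{\theta}; \hat{p})}{\int d\hat{\theta}\, A_{\rm eff}(E, \hat{\theta}, s)\, t_{\rm obs}(\hat{\theta}; \hat{p})}. \tag{5}$$


4. Likelihood Formalism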


Our knowledge of the GeV sky is quite limited. As of 2015 only ~3000 γ-ray sources
have been detected.¹² In general, the limited statistics of the available data only
allow us to answer simple questions about any particular source. Often we can
measure only the total flux and overall spectral index. For brighter sources we can
measure spectral curvature, temporal variability or spatial extension if they are
pronounced enough.

Furthermore, the single event-based data acquisition, broad PSF and the typical 
sky-survey operating mode make the analysis of pair-conversion telescope data quite 
different from the analysis of data from many other types of instruments. 

The maximum likelihood technique is widely regarded as the way to obtain the most
precise estimates of the relatively small number of source parameters we can
measure with the available data. Maximum likelihood fitting was first used to
analyze pair-conversion telescope data in the COS-B era.¹³ The methodology has been
refined and adapted first by the EGRET,¹¹ and then the Fermi-LAT and AGILE
collaborations.

This likelihood analysis formalism compares the observed distribution of
γ-rays, $N(\hat{p}', E', t, s)$, to an expected distribution, $M(\hat{p}', E', t, s)$, that is created
by convolving source models, $S(\hat{p}, E, t; \vec{\alpha})$, where $\vec{\alpha}$ denotes the free parameters of
the model, with the IRFs. Specifically, the likelihood model is


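$$
\begin{aligned}
M(E', \hat{p}', t, s) &= \int dE\, d\hat{p}\, A_{\rm eff}(E, \hat{p}, \vec{L}(t), s)\, P(\hat{p}'; E, \hat{p}, \vec{L}(t), s) \\
&\qquad \times D(E'; E, \hat{p}, \vec{L}(t), s)\, S(\hat{p}, E, t; \vec{\alpha}) \\
&= \int dE\, d\hat{p}\, R(E', \hat{p}'; E, \hat{p}, t, s)\, S(\hat{p}, E, t; \vec{\alpha}),
\end{aligned}
\tag{6}
$$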


where the latter relation defines $R$, the “total response” of the instrument. The
fitting procedure then finds the values, $\hat{\alpha}$, of the fit parameters that maximize the
likelihood, $\mathcal{L}(M(\hat{\alpha}); N)$.

When testing the significance of additional model components or parameters, a
test-statistic (TS) can be defined in terms of the likelihood ratio of the best-fit model
with the additional component included with respect to the null hypothesis (i.e. the
best-fit model with those components omitted):


$$\mathrm{TS} = 2 \log \frac{\mathcal{L}(M(\hat{\alpha}); N)}{\mathcal{L}(M_0(\hat{\alpha}_0); N)}. \tag{7}$$


In most analyses of γ-ray data the statistics are sufficient that the TS is well
described by a χ² distribution with degrees of freedom equal to the number of
additional parameters in the test-hypothesis model with respect to the null-hypothesis
model (i.e. Wilks’¹⁴ or Chernoff’s¹⁵ theorem applies).
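As a rough illustration of how a TS value is turned into a chance probability, the sketch below evaluates the χ² tail probability in closed form for one and two additional parameters. The function names and the example log-likelihood values are hypothetical. The sketch applies the plain Wilks case only; when a parameter lies on a boundary (for example a source flux constrained to be non-negative) the Chernoff result modifies the reference distribution, which is not handled here.

```python
import math


def likelihood_ratio_ts(log_like_test, log_like_null):
    """TS = 2 (log L_test - log L_null) for nested models."""
    return 2.0 * (log_like_test - log_like_null)


def chi2_tail_probability(ts, dof):
    """Probability that a chi-square variate with `dof` degrees of freedom exceeds `ts`.

    Only dof = 1 and dof = 2 are implemented, using their closed forms.
    """
    if ts < 0.0:
        raise ValueError("TS must be non-negative for nested models")
    if dof == 1:
        return math.erfc(math.sqrt(ts / 2.0))
    if dof == 2:
        return math.exp(-ts / 2.0)
    raise NotImplementedError("only dof = 1 or 2 in this sketch")


# Hypothetical best-fit log-likelihoods with and without the extra component.
ts = likelihood_ratio_ts(-10234.7, -10247.2)
print(f"TS = {ts:.1f}, p-value (1 dof) = {chi2_tail_probability(ts, 1):.2e}")
```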

The likelihood ratio test can be used in a number of ways, e.g. to estimate the 
significance of a source candidate at a particular location, to establish the presence of 
spectral curvature or additional spectral components, or to test for source variability. 

We will first discuss the formalism used to construct the likelihood for binned 
data (Sec. 4.1), then by considering the limit of small bins, we can move to the 
formalism for likelihood analysis of unbinned data (Sec. 4.2). In both cases we 
will discuss approximations that can be used to simplify and vastly speed up the 
analysis.


4.1. Binned Likelihood Analysis


For binned data the Poisson likelihood is:


$$\mathcal{L} = \prod_j \frac{m_j^{n_j}\, e^{-m_j}}{n_j!}, \tag{8}$$


where $n_j$ is the observed number of events in bin $j$ and $m_j$ is the number of events
predicted to lie within the bin given the model. In practice, the negative log-likelihood
is minimized with respect to the model parameters, $\vec{\alpha}$:


$$
\begin{aligned}
-\log \mathcal{L} &= -\sum_j \left( n_j \log m_j - m_j - \log n_j! \right) && (9) \\
&= -\sum_j n_j \log m_j + N_{\rm pred}. && (10)
\end{aligned}
$$


Here the $\log n_j!$ term is neglected since it does not depend on the model parameters,
and we have defined


$$N_{\rm pred} = \sum_j m_j, \tag{11}$$


which is the total number of counts predicted by the model. 
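A minimal sketch of the binned fit, under toy assumptions: the negative log-likelihood is written as minus the sum of $n_j \log m_j$ plus the total predicted counts (the $\log n_j!$ term is dropped), and a single normalization parameter scaling a fixed template is found by a brute-force scan. The counts, the template and the function name are made up for illustration; real analyses use a proper minimizer and many more bins.

```python
import math


def neg_log_likelihood(observed, predicted):
    """Binned Poisson -log L without the parameter-independent log(n_j!) term."""
    n_pred = sum(predicted)
    log_term = sum(n * math.log(m) for n, m in zip(observed, predicted) if n > 0)
    return -log_term + n_pred


observed = [3, 7, 2]          # toy counts n_j
template = [1.0, 2.0, 1.0]    # toy model shape; m_j = scale * template_j

# Brute-force scan over the normalization (a stand-in for a real minimizer).
scales = [k / 100.0 for k in range(100, 501)]
best = min(scales, key=lambda a: neg_log_likelihood(observed, [a * t for t in template]))
print(f"best-fit normalization = {best:.2f}")
```

For this one-parameter model the minimum can be checked by hand: setting the derivative to zero gives a normalization equal to the total observed counts divided by the sum of the template.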

Typically, data are binned both in the apparent direction, $\hat{p}'$, and apparent
energy, $E'$; in some analyses they are also binned in time. Most analyses only use
γ-rays from a relatively small part of the sky. This region of interest is dictated by
the size of the source of interest and the PSF and is typically of O(10°). To avoid
excessive loss of information from binning, the bin size must be smaller than the 
PSF, typically O(0.1°), resulting in binned data images of roughly O(100 x 100) 
pixels for each energy bin. 

The general expression for the expected counts in bin j is 


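$$
\begin{aligned}
m_j &= \int_j dE'\, d\hat{p}'\, dt\, M(E', \hat{p}', t, s) \\
&= \int_j dE'\, d\hat{p}' \int dt \int_{\rm SR} dE\, d\hat{p}\, R(E', \hat{p}'; E, \hat{p}, t, s)\, S(\hat{p}, E, t; \vec{\alpha}),
\end{aligned}
\tag{12}
$$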


where SR denotes the “source region”, i.e. the region of the sky for which sources 
contribute to the region of interest. Because of the finite PSF, the source region 
must be larger than the region of interest to account for contributions from sources 
near the edge of the region of interest. Figure 6 shows example counts and model 
maps for two different energy ranges for a region of interest centered on the blazar 
3C 279. For analyses of all but the smallest regions of the sky for very short time 
scales it is impractical to actually perform all these integrals. Fortunately, in most 
cases a number of simplifying approximations can be found.

Fig. 6. Counts (left) and model (right) maps for a 14° x 14° region centered on the blazar 3C 279.
The maps were made using 5 years of Fermi-LAT data from the P7REP_SOURCE event selection and
the associated P7REP_SOURCE_V15 IRFs. All of the maps use 0.1° pixels and are smoothed with a
0.3° Gaussian kernel. The model maps were built using sources from the 2FGL catalog,¹⁶ including
some sources located just outside the region of interest. Top: maps for the 1 GeV to 1.78 GeV
energy range. Bottom: maps for the 10 GeV to 17.8 GeV energy range.


Constant sources and time-integrated IRFs. So long as both the source flux 
and the IRFs are constant, the precomputed livetime cube can be used to calculate 
contributions to the total response for any direction, e.g. the exposure, Eq. (2), or 
the average PSF, Eq. (4). Because of the paucity of photons, non-periodic
time-dependence in the form of flares or longer timescale secular changes in flux or spectral
shape will be better characterized by performing constant-source likelihood analyses
using sub-intervals of the data rather than trying to parameterize and fit a
time-dependent model. Similarly, discrete, time-dependent changes in the IRFs are best
handled by treating the sub-intervals as separate observations.

Neglecting energy dispersion. In general, the physical processes responsible
for astrophysical γ-ray emission do not produce narrow spectral features.
Furthermore, the IRFs for pair-conversion telescopes vary slowly with energy on the scale of the typical
energy resolution of 10%. Therefore, in many analyses neglecting the energy
dispersion results in negligible biases. This vastly speeds up the analysis by avoiding the
convolution integrals over the true energy. Technically, this is equivalent to using a
delta function to represent the energy dispersion: $D(E'; E, \hat{p}, \vec{L}(t), s) = \delta(E' - E)$.
We will refer only to true energies, $E$, henceforth.

Applying these two approximations makes it possible to express the expected
counts in bin $j$ as a convolution of the source model with the exposure and average
PSF:


$$m_j = \int_j dE\, d\hat{p}' \int_{\rm SR} d\hat{p}\, \mathcal{E}(E, \hat{p}, s)\, \bar{P}(\hat{p}'; E, \hat{p}, s)\, S(\hat{p}, E, t; \vec{\alpha}). \tag{13}$$

Factorization of spatial and spectral dependence in source models. Up
to this point we have treated the source model as a single entity. In fact, the source
model is typically a combination of many point sources (primarily pulsars and
AGN), contributions from Galactic diffuse γ-ray emission, isotropic extragalactic
emission, and a few spatially extended sources, such as supernova remnants or
nearby galaxies. It is often very useful to compute the distribution of expected counts
for each source component (indexed by $i$) separately and combine the expected
counts distributions:


$$m_j = \sum_i m_{ij} = \sum_i \int_j dE\, d\hat{p}' \int_{\rm SR} d\hat{p}\, \mathcal{E}(E, \hat{p}, s)\, \bar{P}(\hat{p}'; E, \hat{p}, s)\, S_i(\hat{p}, E, t). \tag{14}$$

In many cases it is possible to factor the source model for a given source into
a spatial part, $\tilde{S}_i(\hat{p})$, and a spectral part, $s_i(E; \vec{\alpha}_i)$. This is explicitly true for point
sources, where $\tilde{S}_i(\hat{p}) = \delta(\hat{p})$ (a delta function at the source position), but is also the case
when fitting to a spatial template
obtained from the analysis of data at other wavelengths. In fact, as the notation
here suggests, in many cases the spatial description of the source is assumed to be
known, and only the spectral parameters are being fit.

For point sources the expected counts distribution is simply the spectral term
convolved with the average PSF at the source location, $\hat{p}$. For sources with
extensions of O(1°) the average PSF changes little across the source, so using any location,
$\hat{p}_{\rm nom}$, within the source will result in negligible bias and speed up the procedure
dramatically by making it possible to invoke the convolution theorem to perform the
spatial integrals:


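$$m_{ij,{\rm pt}} = \int_j dE\, d\hat{p}'\, \mathcal{E}(E, \hat{p})\, \bar{P}(\hat{p}'; E, \hat{p})\, s_i(E; \vec{\alpha}_i); \tag{15}$$

$$m_{ij,{\rm ex}} = \int_j dE\, d\hat{p}' \int_{\rm SR} d\hat{p}\, \mathcal{E}(E, \hat{p})\, \bar{P}(\hat{p}'; E, \hat{p}_{\rm nom})\, s_i(E; \vec{\alpha}_i)\, \tilde{S}_i(\hat{p}). \tag{16}$$

If only the spectral parameters are being fit it is worthwhile to group every
other term in Eq. (15) into an energy dependent spectral prefactor, $d_{ij}(E)$, and
precompute those values:


$$d_{ij,{\rm pt}}(E) = \int_{\Delta\hat{p}'_j} d\hat{p}'\, \mathcal{E}(E, \hat{p})\, \bar{P}(\hat{p}'; E, \hat{p}); \tag{17}$$

$$d_{ij,{\rm ex}}(E) = \int_{\Delta\hat{p}'_j} d\hat{p}' \int_{\rm SR} d\hat{p}\, \mathcal{E}(E, \hat{p})\, \bar{P}(\hat{p}'; E, \hat{p}_{\rm nom})\, \tilde{S}_i(\hat{p}). \tag{18}$$

In both cases the expected counts are simply

$$m_{ij} = \int_{\Delta E_j} dE\, d_{ij}(E)\, s_i(E; \vec{\alpha}_i). \tag{19}$$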

Maps consisting of the precomputed $d_{ij}(E)$ values for each source in the model for
a particular set of IRFs are often referred to as “source maps”. In fact, they are a 
combination of the spatial component of the source models and the instrumental 
response. 
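The benefit of precomputing the spectral prefactors can be sketched as follows: once the prefactor values for a given source and pixel are tabulated on an energy grid, the predicted counts for any trial spectrum require only a one-dimensional integral over the energy bin. The grid, the prefactor values, the power-law parameterization and the function names below are toy assumptions.

```python
import math


def power_law(energy, norm, index, e0=1000.0):
    """Toy spectral model s_i(E): norm * (E / e0)^(-index), with E in MeV."""
    return norm * (energy / e0) ** (-index)


def predicted_counts(energies, prefactors, spectrum):
    """Trapezoidal integral of d_ij(E) * s_i(E) over one energy bin."""
    total = 0.0
    for k in range(len(energies) - 1):
        f_lo = prefactors[k] * spectrum(energies[k])
        f_hi = prefactors[k + 1] * spectrum(energies[k + 1])
        total += 0.5 * (f_lo + f_hi) * (energies[k + 1] - energies[k])
    return total


# Precomputed once per source, pixel and set of IRFs (toy values, cm^2 s).
energies = [1000.0, 1250.0, 1500.0, 1780.0]    # MeV, grid spanning one energy bin
prefactors = [2.0e9, 2.1e9, 2.2e9, 2.3e9]

# During the fit only the spectrum changes; the prefactors are reused.
for index in (1.5, 2.0, 2.5):
    counts = predicted_counts(energies, prefactors, lambda e: power_law(e, 1.0e-11, index))
    print(f"index = {index:.1f}: predicted counts in this pixel = {counts:.3f}")
```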


4.2. Unbinned Likelihood Analysis 


As stated earlier, the formalism for unbinned likelihood analysis can be obtained 
by considering the limit of very small bins, such that any bin either has zero or one 
event. In that case, the log of the likelihood is 


$$\log \mathcal{L} = \sum_j \log M(E'_j, \hat{p}'_j, t_j, s) - N_{\rm pred}, \tag{20}$$


where the index $j$ now runs over all the bins with one count (cf. Eq. (10)); or,
equivalently, over all of the counts. The total number of predicted γ-ray counts is


$$N_{\rm pred} = \int dE'\, d\hat{p}'\, M(E', \hat{p}', t, s). \tag{21}$$


$N_{\rm pred}$ does not depend on the data, and must be calculated by numerical integration.
Effectively this is equivalent to summing the binned model counts over the region 
and energy bands of interest, and the approximations discussed in Sec. 4.1 may be 
used (Eqs. (10) and (11)). Additionally, if a source is well contained in the region 
of interest, or if the entire sky is being analyzed, the spatial convolution may be 
dispensed with entirely, since only the total number of expected counts is needed, 
rather than the distribution of those counts. 
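The structure of the unbinned log-likelihood, a sum over events of the log of the model density minus the total predicted counts, can be illustrated with a deliberately simple toy: a model that is flat in a single observable over an interval of width `width`, with one free rate parameter. The event count, the interval and the function name are assumptions for illustration; a realistic model density would involve the full instrument response.

```python
import math


def unbinned_log_likelihood(model_density_at_events, n_pred):
    """log L = sum over events of log M_j, minus the total predicted counts."""
    return sum(math.log(m) for m in model_density_at_events) - n_pred


n_events = 5      # toy number of detected events
width = 10.0      # toy size of the observable range

# For a flat model with rate r, M_j = r for every event and N_pred = r * width.
rates = [k / 100.0 for k in range(1, 201)]
best = max(rates, key=lambda r: unbinned_log_likelihood([r] * n_events, r * width))
print(f"best-fit rate = {best:.2f} events per unit interval")
```

In this toy case the maximum lies at the number of events divided by the interval width, which shows how the predicted-counts term balances the per-event terms.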

When evaluating the sum, $\sum_j \log M(E'_j, \hat{p}'_j, t_j)$, it is impractical to convolve the
complete instrument response with every source in the model for every observed
γ-ray, so simplifying approximations are required. These approximations are similar
to, but slightly different from, those used for the analysis of binned data and are
discussed below.


Constant sources and time-integrated IRFs. As was the case for binned
data, it is generally more practical to perform constant-source likelihood analyses
using sub-intervals of the data rather than trying to parameterize and fit a
time-dependent model. Powerful techniques, such as the Bayesian Blocks method,¹⁷,¹⁸
can be used with unbinned time data to identify likely sub-intervals of constant
flux.


Neglecting energy dispersion. Again, neglecting the energy dispersion results 
in negligible biases for many measurements. 

Applying these two approximations makes it possible to express the model
likelihood to observe each γ-ray, indexed by $j$, as


$$M_j(E, \hat{p}', t, s) = \int d\hat{p}\, A_{\rm eff}(E, \hat{p}, \vec{L}(t), s)\, P(\hat{p}'; E, \hat{p}, \vec{L}(t), s)\, S(\hat{p}, E; \vec{\alpha}). \tag{22}$$


Note that the effective area and the PSF are to be evaluated given the state and
orientation of the instrument at the time the $j$th γ-ray was recorded.


Factorization of spatial and spectral dependence in source models. For
the analysis of unbinned data, we obtain these expressions for the model likelihood
to observe a given γ-ray from point sources and extended sources, respectively:


$$M_{ij,{\rm pt}} = A_{\rm eff}(E, \hat{p}, \vec{L}(t), s)\, P(\hat{p}'; E, \hat{p}, \vec{L}(t), s)\, s_i(E; \vec{\alpha}_i); \tag{23}$$


$$M_{ij,{\rm ex}} = \int_{\rm SR} d\hat{p}\, A_{\rm eff}(E, \hat{p}, \vec{L}(t), s)\, P(\hat{p}'; E, \hat{p}, \vec{L}(t), s)\, s_i(E; \vec{\alpha}_i)\, \tilde{S}_i(\hat{p}). \tag{24}$$


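When only the spectral parameters of a given source are free in a fit, it is useful to
precompute the spectral prefactors, $d_{ij}$, for each γ-ray in the sample:


$$d_{ij,{\rm pt}} = A_{\rm eff}(E, \hat{p}, \vec{L}(t), s)\, P(\hat{p}'; E, \hat{p}, \vec{L}(t), s); \tag{25}$$


$$d_{ij,{\rm ex}} = \int_{\rm SR} d\hat{p}\, A_{\rm eff}(E, \hat{p}, \vec{L}(t), s)\, P(\hat{p}'; E, \hat{p}, \vec{L}(t), s)\, \tilde{S}_i(\hat{p}). \tag{26}$$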


Precomputing the “diffuse response”. Most studies focus on individual γ-ray
sources. However, the $d_{ij}$ terms will be the same for every analysis that uses the same
data selection, IRFs and source models. Clearly, it would be impractical to compute
and save the $d_{ij}$ for every combination of point source and γ-ray. However, a model of
large-scale diffuse emission is needed for every analysis. More specifically, analyses of
individual point sources typically use pre-existing models of the Galactic diffuse¹⁹
and isotropic²⁰ γ-ray emission (note that the isotropic spectrum also includes a
residual cosmic-ray component). Accordingly, it can be useful to compute the $d_{{\rm gal},j}$
and $d_{{\rm iso},j}$ for the most commonly used IRFs for each γ-ray in the data sample, and
to include those values as part of the data release.


5. Calibration and Validation Control Data


Because of the complexity of pair-conversion telescopes and of the physics simula- 
tions of particle interactions in them we cannot expect Monte Carlo simulations to 
perfectly reproduce the flight data. For this reason it is important to calibrate and 
validate the IRFs using flight data. Detailed descriptions of calibration and
validation processes have been published by several instrument teams.⁴,²¹⁻²⁴ Here we
will briefly describe the most useful data sets for on-orbit calibration and validation
studies.

Although no astrophysical source has perfectly known properties, in practice
there are several sources for which accurate background subtraction allows extracting
a clean γ-ray sample that can be used to validate the Monte Carlo predictions.
Figure 7 shows examples of the definitions of the signal and background regions
used for background subtraction from four such samples. Taken together, the four
samples span the energy band accessible to pair-conversion telescopes. The samples
are described in more detail in the rest of this section.

Fig. 7. Examples of background subtraction methods used to define pure γ-ray calibration samples.
In each case signal and background regions in the discriminating variable are shown. Top left:
pulse phase-selection for the Vela pulsar. Top right: square of the angular separation, α, from the
nearest AGN. Bottom left: zenith-angle, $\theta_z$, selection to isolate the Earth’s Limb. Bottom right:
distribution of Galactic latitudes for γ-rays above 17.8 GeV. The figures are illustrative only and
use looser event selections than typical analyses; details of the selection criteria, the data sets
used and the definitions of the signal and background regions are given in Ref. 4. This figure is
reproduced by permission of the IOP.


5.1. Bright Pulsars 


The pulsed high-energy γ-ray emission from pulsars allows for almost perfect control
of the background subtraction. The very short (< ms) timescales of pulse
sub-structures imply emission regions of < O(300 km), making them essentially perfect
point sources. Furthermore, the Vela pulsar (PSR J0835−4510) has the largest
integral flux >100 MeV of any γ-ray source, and the Crab (PSR J0534+2200)
and Geminga (PSR J0633+1746) pulsars are also extremely bright γ-ray sources.¹⁶
These factors make pulsars very good calibration sources. Unfortunately, the pulsed
emission of pulsars typically cuts off above several GeV, and pulsations are nearly
undetectable above 30 GeV by pair-conversion telescopes such as the Fermi-LAT.²⁵⁻²⁷


5.2. Bright Active Galactic Nuclei 


Given the density of bright γ-ray sources in the sky, at energies where the 95%
containment radius of the PSF is less than ~1° we can use the angular distance
between a γ-ray and the nearest source as a good discriminator for background
subtraction, particularly at high Galactic latitudes where there are fewer sources
and the interstellar diffuse emission is less pronounced. Unfortunately, no single
source is bright enough to provide adequate statistics to serve as a good
calibrator. However, by considering γ-rays from a sample of bright and/or hard spectrum
Active Galactic Nuclei (AGN) that are isolated from other hard sources it is possible
to build a good calibration sample for energies up to ~30 GeV. It is worth noting
that some AGN might exhibit spatial extension because of the “pair-halo” effect
where TeV γ-rays interact with the extragalactic background light and are reprocessed
into a “halo” of GeV γ-rays around the original source direction.²⁸ Therefore,
caution is required when using AGN to calibrate the PSF with flight data; see, e.g.
Ref. 29.


5.3. The Earth’s Limb 


The Earth’s atmosphere is a very bright γ-ray source. Furthermore, at energies
above a few GeV the γ-ray flux seen from space is dominated by γ-rays from the
interactions of primary cosmic-ray protons with the upper atmosphere. This
consideration, together with the typical narrowness of the PSF at energies >10 GeV,
causes the Earth limb to appear as a very bright and sharp feature, with a very
smooth spectrum, which provides an excellent calibration source. When using the
Earth limb as a calibration source we generally limit the energy range to energies
> 10 GeV, primarily because at lower energies orbital variations in the geomagnetic
field significantly affect the γ-ray fluxes.³⁰


5.4. The Galactic Ridge


At energies above ~30 GeV no single source provides enough γ-rays for a good
comparison between flight data and MC simulations. However, the combination
of bright Galactic sources and Galactic diffuse backgrounds means that there is a
very large excess of γ-rays coming from the Galactic plane relative to high Galactic
latitudes.

The intensity of the γ-ray emission at low latitudes in the inner Galaxy is more
than an order of magnitude greater than at high latitudes in the outer Galaxy.
This can be used to validate event selections by comparing their effect on γ-ray rich
samples at low Galactic latitudes with γ-ray poor samples at high Galactic latitudes.


6. Looking Forward 


In this chapter we have given an introduction to event reconstruction and source 
analysis in pair-conversion telescopes, and given several specific examples involving 
the Fermi-LAT. 

Prior to the advent of the Fermi-LAT, astrophysical observations in the
100 MeV to 100 GeV range were largely limited by statistics. However, with the
data obtained from the Fermi-LAT after more than eight years of continuous
operation, we now have sufficient data that systematic effects, in terms of modeling of the
astrophysical sources, understanding the residual charged-particle backgrounds, and
characterizing the instrument response, are starting to dominate our measurements
for the brighter sources at the 1% to 10% level.

Therefore, we expect that next generation pair-conversion telescopes will require
important refinements to the data analysis methodology described here, but the
payoff from those refinements will be the ability to provide much more detailed
answers about the γ-ray sky.


Acknowledgments 


Much of our current understanding of how best to analyze data from pair-conversion 
telescopes was the result of the work of many people within the Fermi-LAT collab- 
oration. This work included many refinements to the IRFs, the implementation of 
the likelihood formalism and the development of calibration samples. In particular,
we would like to acknowledge the contributions of Patrick Nolan to the formulation 
and implementation of the maximum likelihood analysis used for Fermi-LAT data. 


References 


1. G. Kanbach et al., Space Sci. Rev. 49, 69 (1988).
2. M. Tavani et al., Astron. Astrophys. 502, 995 (2009); doi: 10.1051/0004-6361/200810527.
3. W. B. Atwood et al., Astrophys. J. 697, 1071 (2009); doi: 10.1088/0004-637X/697/2/1071.
4. M. Ackermann et al., Astrophys. J. Suppl. Ser. 203, 4 (2012); doi: 10.1088/0067-0049/203/1/4.
5. R. Mankel, Repts. Progr. Phys. 67, 553 (2004); doi: 10.1088/0034-4885/67/4/R03.
6. C. Labanti et al., Nucl. Instrum. Methods Phys. Res. Sect. A 598, 470 (2009); doi: 10.1016/j.nima.2008.09.021.
7. P. Bruel, J. Phys. Conf. Ser. 404, 012033 (2012); doi: 10.1088/1742-6596/404/1/012033.
8. R. E. Kalman, Trans. ASME, J. Basic Eng. 82, 35 (1960).
9. P. Billoir, R. Fruhwirth and M. Regler, Nucl. Instrum. Methods Phys. Res. Sect. A 241, 115 (1985); doi: 10.1016/0168-9002(85)90523-6.
10. R. Fruhwirth, Nucl. Instrum. Methods Phys. Res. Sect. A 262, 444 (1987); doi: 10.1016/0168-9002(87)90887-4.
11. J. R. Mattox et al., Astrophys. J. 461, 396 (1996); doi: 10.1086/177068.
12. F. Acero et al., Astrophys. J. Suppl. Ser. 218, 23 (2015); doi: 10.1088/0067-0049/218/2/23.
13. A. M. T. Pollock et al., Astron. Astrophys. 94, 116 (1981).
14. S. S. Wilks, Ann. Math. Statist. 9, 60 (1938); doi: 10.1214/aoms/1177732360.
15. H. Chernoff, Ann. Math. Statist. 25, 573 (1954); doi: 10.1214/aoms/1177728725.
16. P. L. Nolan et al., Astrophys. J. Suppl. Ser. 199, 31 (2012); doi: 10.1088/0067-0049/199/2/31.
17. J. D. Scargle, Astrophys. J. 504, 405 (1998); doi: 10.1086/306064.
18. J. D. Scargle, J. P. Norris, B. Jackson et al., Astrophys. J. 764, 167 (2013); doi: 10.1088/0004-637X/764/2/167.
19. F. Acero et al., Astrophys. J. Suppl. Ser. 223, 26 (2016); doi: 10.3847/0067-0049/223/2/26.
20. A. A. Abdo et al., Phys. Rev. Lett. 104, 101101 (2010); doi: 10.1103/PhysRevLett.104.101101.
21. D. J. Thompson et al., Astrophys. J. Suppl. Ser. 86, 629 (1993); doi: 10.1086/191793.
22. J. A. Esposito et al., Astrophys. J. Suppl. Ser. 123, 203 (1999); doi: 10.1086/313227.
23. A. A. Abdo et al., Astroparticle Phys. 32, 193 (2009); doi: 10.1016/j.astropartphys.2009.08.002.
24. A. W. Chen et al., Astron. Astrophys. 558, A37 (2013); doi: 10.1051/0004-6361/201321767.
25. A. A. Abdo et al., Astrophys. J. 713, 154 (2010); doi: 10.1088/0004-637X/713/1/154.
26. A. A. Abdo et al., Astrophys. J. 720, 272 (2010); doi: 10.1088/0004-637X/720/1/272.
27. A. A. Abdo et al., Astrophys. J. 708, 1254 (2010); doi: 10.1088/0004-637X/708/2/1254.
28. F. A. Aharonian, P. S. Coppi and H. J. Voelk, Astrophys. J. Lett. 423, L5 (1994); doi: 10.1086/187222.
29. M. Ackermann et al., Astrophys. J. 765, 54 (2013); doi: 10.1088/0004-637X/765/1/54.
30. A. A. Abdo et al., Phys. Rev. D 80, 122004 (2009); doi: 10.1103/PhysRevD.80.122004.



Chapter 6 


Atmospheric Cherenkov Gamma-Ray 
Telescopes 


Jamie Holder 


Department of Physics and Astronomy and the Bartol Research Institute 
University of Delaware, Newark, DE 19716, USA 


jholder@physics.udel.edu


The stereoscopic imaging atmospheric Cherenkov technique, developed in the 
1980s and 1990s, is now used by a number of existing and planned gamma-ray 
observatories around the world. It provides the most sensitive view of the very 
high-energy gamma-ray sky (above 30 GeV), coupled with relatively good angular 
and spectral resolution over a wide field of view. This chapter summarizes the 
details of the technique, including descriptions of the telescope optical systems and 
cameras, as well as the most common approaches to data analysis and gamma-ray 
reconstruction. 


1. Introduction 


Astrophysical very-high-energy (VHE) gamma rays (with energies ≳30 GeV) are
believed to result almost exclusively from the interactions of populations of highly
relativistic particles with ambient matter or photon fields. The study of these VHE
photons therefore allows us to examine the processes of particle acceleration in the
Universe, and the extreme environments in which they occur. Gamma-ray astronomy
also provides a unique tool for many complementary astrophysical topics. For
example, extragalactic background photon fields and intergalactic magnetic fields
can be measured, or constrained, by their imprint on the measured properties of
distant gamma-ray sources. Gamma-ray signatures of candidate dark matter particles
may also lie in the VHE band, and can be sought through observations of
regions in which the densest clumps of dark matter are believed to lie. Around
150 VHE gamma-ray sources have now been detected¹,² (Fig. 1). These comprise
many different source classes (pulsars and their nebulae, supernova remnants, and
active galactic nuclei, to name a few) and the majority have been discovered using
ground-based Atmospheric Cherenkov Telescopes (ACTs).

Fig. 1. The locations, in Galactic coordinates, of all known astrophysical sources of TeV
gamma-ray emission, as of 2015. (Figure courtesy of TeVCat.¹ See there for a full description of the
different source classes. See electronic edition for a color version of this figure.)

The Earth’s atmosphere is opaque to high energy photons, and so the most 
direct approach to study the gamma-ray sky is to send detectors into space. How- 
ever, astrophysical gamma-ray production mechanisms typically result in steeply 
falling power-law spectra, leading to a very low photon flux at high energies. 
The Crab Nebula, for example, one of the brightest astrophysical gamma-ray
sources, produces a flux of only ~6 photons m⁻² year⁻¹ at the Earth above
1 TeV. To study the Universe at these energies therefore requires a detector with
enormous collection area, far beyond the maximum practical size of a satellite-borne
device (which is ~1 m²). Atmospheric Cherenkov Telescopes achieve this feat by
measuring the Cherenkov light produced by gamma-ray-triggered particle cascades 
(or air showers) in the atmosphere. In this way, using the Earth’s atmosphere as an 
intrinsic part of the detection technique, effective collection areas can easily exceed 
10⁵ m².
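
To make the scale of the problem concrete, the short Python sketch below converts the Crab Nebula flux quoted above into photon rates for two collection areas. It is an illustration only: the function name is ours, the areas are the round numbers ~1 m² (satellite) and 10⁵ m² (ACT), and detection efficiency, duty cycle and the energy dependence of the area are all ignored.

```python
CRAB_FLUX_PER_M2_YR = 6.0          # photons m^-2 yr^-1 above 1 TeV (value quoted in the text)
HOURS_PER_YEAR = 365.25 * 24.0


def photons_per_hour(area_m2, flux_per_m2_yr=CRAB_FLUX_PER_M2_YR):
    """Photon rate (per hour) collected by an ideal detector of the given area."""
    return flux_per_m2_yr * area_m2 / HOURS_PER_YEAR


satellite_rate = photons_per_hour(1.0)   # ~1 m^2 satellite-borne detector
act_rate = photons_per_hour(1.0e5)       # ~10^5 m^2 effective area (assumed round number)

print(f"1 m^2 detector   : {satellite_rate * HOURS_PER_YEAR:.0f} photons per year")
print(f"1e5 m^2 detector : {act_rate:.0f} photons per hour")
```

Under these assumptions a 1 m² detector collects only about six Crab photons above 1 TeV per year, while a 10⁵ m² collection area sees of order one per minute.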

The potential of this approach for gamma-ray astronomy was first explored by 
Jelley and Galbraith in the 1950s,³ but attempts to exploit it were hampered by
the overwhelming background of charged cosmic rays. The first significant discovery
of an astrophysical TeV gamma-ray source was not made until the detection of the
Crab Nebula, using the Whipple 10-meter telescope, in 1989.⁴ This success was the
result of the development of effective methods to record an image of the Cherenkov
emission from air showers. A complete account of the long history and development
of the field is given by Hillas.⁵

Three major ACT facilities are currently operating, the key properties of which 
are listed in Table 1. They each provide sensitivity to gamma-ray sources with a 
flux below 1% of the steady flux from the Crab Nebula. Figure 2 shows one of these, 
the VERITAS array, with which the author is associated.

Table 1. Details of each of the major Atmospheric Cherenkov Telescope facilities.

Facility      Location      Number of   Aperture  Optical        Number     Field
                            Telescopes            Design         of Pixels  of View
H.E.S.S.      Namibia       4           12 m      Davies–Cotton  960        5.0°
H.E.S.S. IIᵃ  Namibia       1           28 m      Parabolic      2048       3.2°
MAGIC II      La Palma      2           17 m      Parabolic      1039       3.5°
VERITAS       Arizona, USA  4           12 m      Davies–Cotton  499        3.5°

ᵃH.E.S.S. II is an addition to the H.E.S.S. array, located in the center of the four original telescopes.


Fig. 2. The VERITAS Atmospheric Cherenkov Telescope array.⁶ (See electronic edition for a
color version of this figure.) 


2. Air Showers and Atmospheric Cherenkov Emission 


The design of atmospheric Cherenkov gamma-ray telescopes is driven by the essen- 
tial characteristics of Cherenkov emission from air showers, which we first briefly 
describe. 

A VHE gamma ray incident on the Earth’s atmosphere converts into an
electron–positron pair. Subsequent Bremsstrahlung and pair production interactions
lead to the generation of an electromagnetic cascade in the atmosphere. The radiation
length, $X_0$, for Bremsstrahlung in the atmosphere is 37.15 g cm⁻², which is 7/9
of the mean free path for pair production. This similarity allows a simple analytical
approximation for the shower development (first developed by Heitler⁷), in which
the total number of electrons, positrons and photons doubles every $\ln(2) X_0$. The
primary gamma-ray energy, $E_0$, is split evenly among the secondary products. The
shower continues to develop until the average electron energy drops to $E_c$ = 84 MeV,
the critical energy below which ionization losses dominate. The maximum number
of particles in the cascade is given by $E_0/E_c$.
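
The Heitler picture described above can be evaluated directly. The following Python sketch is a minimal illustration, not a substitute for a Monte Carlo shower simulation: it uses only the radiation length and critical energy quoted in the text, and the function name is ours. Since the particle number doubles every $\ln(2) X_0$ and the shower stops at $E_0/E_c$ particles, the depth of shower maximum follows as $X_0 \ln(E_0/E_c)$.

```python
import math

X0_G_CM2 = 37.15     # radiation length in air (g cm^-2), as quoted in the text
E_CRIT_MEV = 84.0    # critical energy (MeV), as quoted in the text


def heitler_shower(e0_mev):
    """Return (number of doublings, N_max, depth of maximum in g cm^-2)."""
    n_max = e0_mev / E_CRIT_MEV                         # N_max = E0 / Ec
    n_doublings = math.log2(n_max)                      # energy halves at each doubling
    depth_max = n_doublings * math.log(2.0) * X0_G_CM2  # equals X0 * ln(E0 / Ec)
    return n_doublings, n_max, depth_max


for e0_tev in (0.1, 1.0, 10.0):
    doublings, n_max, depth = heitler_shower(e0_tev * 1.0e6)
    print(f"E0 = {e0_tev:5.1f} TeV: {doublings:4.1f} doublings, "
          f"N_max ~ {n_max:8.0f}, X_max ~ {depth:4.0f} g/cm^2")
```

In this model the depth of shower maximum grows only logarithmically with the primary energy, while the number of particles at maximum grows linearly.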

Cosmic rays — charged, relativistic protons and nuclei — also produce air 
showers in the atmosphere. In this case, the cascade development is more complex, 
with hadronic interactions proceeding through a variety of channels, leading to the 
production of secondary nucleons, along with charged and neutral pions with large
transverse momenta. The pions do not survive to sea level: neutral pions decay
rapidly into two gamma rays, while charged pions produce muons and neutrinos:

\[
\pi^0 \to \gamma + \gamma, \qquad
\pi^+ \to \mu^+ + \nu_\mu, \qquad
\pi^- \to \mu^- + \bar{\nu}_\mu .
\]


The gamma-ray secondaries thus produced can trigger electromagnetic sub-showers, 
while the long-lived muons form the most penetrating component of the cascade, 
often reaching the ground. The result of this is that cosmic-ray-initiated air showers 
develop much less regularly than gamma-ray-initiated cascades, as illustrated in 
Fig. 3. These differences in the shower morphology, along with the reconstruction 
of the arrival direction of the incoming primary, allow ACTs to achieve an efficient 
discrimination of gamma-ray photons from the otherwise overwhelming isotropic 
cosmic-ray background. 

Fig. 3. Monte Carlo simulations of the tracks of particles in photon and proton initiated air
showers.⁸ The first interaction height is fixed at 30 km. The horizontal axis range is ±5 km.

Fig. 4. Monte Carlo simulations of the distribution of Cherenkov photons on the ground for
gamma-ray-initiated air showers. The left plot shows the Cherenkov photon density as a function
of radial distance from the shower core for primaries with a range of energies, the right shows
the two-dimensional photon density on the ground for a shower with a 300 GeV primary. Figure
courtesy of G. Maier. (See electronic edition for a color version of this figure.)

The relativistic charged particles in air showers are moving faster than the
speed of light in air ($v > c/n_{\rm air}$, where $n_{\rm air}$ is the refractive index) and so generate
Cherenkov radiation. Cherenkov light is produced throughout the cascade development,
with the maximum emission occurring when the number of particles in
the cascade is largest, at an altitude of ~10 km for primary gamma-ray energies of
100 GeV to 1 TeV. Each particle generates Cherenkov light at a fixed angle to the
direction of motion, $\theta_c$, given by

\[
\cos\theta_c = \frac{c}{v\, n_{\rm air}} .
\]

The Cherenkov angle is ~1.3° at sea level. Electromagnetic cascade particles also
undergo multiple Coulomb scattering, which distributes their directions over a small
angular range and generates the shower’s lateral extent. The resultant filled “pool”
of Cherenkov light on the ground has a photon density of ~100 photons m⁻² for a
1 TeV gamma-ray primary, and a radial extent with a peak at ~130 m, as illustrated
in Fig. 4. The peak is due to a focusing effect resulting from the changing angle of
Cherenkov emission with atmospheric depth.
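
The Cherenkov angle relation given above is easy to evaluate numerically. The Python sketch below is illustrative: the sea-level refractive index of 1.00029 is an assumed round value (it is not given in the text), and the function names are ours. It also evaluates the threshold energy at which an electron's speed reaches $c/n_{\rm air}$, which follows from the same condition.

```python
import math

ELECTRON_MASS_MEV = 0.511   # electron rest energy (MeV)
N_SEA_LEVEL = 1.00029       # assumed refractive index of air at sea level


def cherenkov_angle_deg(n, beta=1.0):
    """Cherenkov angle from cos(theta_c) = 1 / (beta * n); None below threshold."""
    cos_theta = 1.0 / (beta * n)
    if cos_theta > 1.0:
        return None             # particle slower than c/n: no Cherenkov emission
    return math.degrees(math.acos(cos_theta))


def threshold_energy_mev(n, mass_mev=ELECTRON_MASS_MEV):
    """Total energy at which v = c/n, i.e. gamma = 1 / sqrt(1 - 1/n^2)."""
    return mass_mev / math.sqrt(1.0 - 1.0 / n ** 2)


angle = cherenkov_angle_deg(N_SEA_LEVEL)
e_threshold = threshold_energy_mev(N_SEA_LEVEL)
print(f"Cherenkov angle (beta = 1): {angle:.2f} deg")
print(f"Electron threshold energy : {e_threshold:.1f} MeV")
```

With this assumed index the angle comes out at about 1.4°, in line with the ~1.3° quoted above; the exact value depends on the refractive index adopted, which falls with altitude along with the air density.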

The Cherenkov photon yield is proportional to $1/\lambda^2$ (where $\lambda$ is the
wavelength). The spectrum is therefore dominated by blue/UV emission, peaking around
340 nm. Shorter wavelength emission is subject to atmospheric absorption (particularly
by ozone), and therefore does not reach the ground, unless it is generated very
deep in the atmosphere (for example by penetrating muons). Cherenkov photons
from each shower arrive in a brief pulse of a few nanoseconds duration. The time- 
averaged photon yield from all air showers constitutes only ~1/10000th of the back- 
ground night-sky light in the visible, but the light from a single shower can rival 
the brightest objects in the night sky for the brief duration of the pulse. 
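
The $1/\lambda^2$ dependence can be made quantitative with the standard Frank–Tamm expression for a singly charged particle, $dN/dx = 2\pi\alpha \sin^2\theta_c\,(1/\lambda_1 - 1/\lambda_2)$, which is the integral of the $1/\lambda^2$ spectrum between two wavelengths. This formula is not given in the text; the Python sketch below applies it under assumed conditions (sea-level refractive index 1.00029, no atmospheric absorption), and the function name is ours.

```python
import math

ALPHA = 1.0 / 137.036   # fine-structure constant


def photons_per_metre(n, lambda_min_nm, lambda_max_nm, beta=1.0):
    """Cherenkov photons emitted per metre of track between two wavelengths."""
    sin2_theta = 1.0 - 1.0 / (beta * n) ** 2
    lam1 = lambda_min_nm * 1.0e-9
    lam2 = lambda_max_nm * 1.0e-9
    return 2.0 * math.pi * ALPHA * sin2_theta * (1.0 / lam1 - 1.0 / lam2)


N_AIR = 1.00029   # assumed sea-level value
total = photons_per_metre(N_AIR, 300.0, 600.0)
blue = photons_per_metre(N_AIR, 300.0, 450.0)
red = photons_per_metre(N_AIR, 450.0, 600.0)
print(f"300-600 nm: {total:.0f} photons per metre of track")
print(f"300-450 nm / 450-600 nm yield ratio: {blue / red:.1f}")
```

The short-wavelength half of the band carries twice as many photons as the long-wavelength half, which is why ACT photodetectors are optimized for blue and near-UV light.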


3. Detection 


The goal of an atmospheric Cherenkov gamma-ray telescope is to detect the 
Cherenkov emission from air showers, and to use this to determine the nature of 
the primary (gamma ray or cosmic ray), along with its arrival direction and energy. 
The detection technique is, in essence, rather simple, requiring only a large mirror
to collect Cherenkov photons, and a fast photon detector coupled to an oscilloscope
to record them. The first detection of Cherenkov emission from an air shower was
made with a 0.2 m² reflector, a single photomultiplier tube (PMT) and a free-running
analog oscilloscope.⁹ Modern ACT arrays perform the same task, but can
reach mirror areas > 600 m², instrumented with thousands of PMTs coupled to
GHz sampling electronics and sophisticated trigger systems.

While detection is relatively straightforward, gamma-ray discrimination and
reconstruction are rather more challenging. One approach is to measure the arrival
time and photon density distribution of the Cherenkov light at ground level. This
“wavefront sampling” method was explored by experiments such as STACEE¹⁰ and
CELESTE,¹¹ using the very large mirror areas provided by the heliostats of existing
solar power facilities. The brightest previously known astrophysical gamma-ray
sources were detected using this technique, but the difficulty of effective gamma-ray
discrimination limited its usefulness. The technique may be more applicable at the
highest energies (> 10 TeV), where small, widely separated detectors allow one to
achieve effective areas of ~100 km². This idea is currently being investigated by the
HiSCORE experiment.¹²

By far the most successful approach, used by all of the major facilities in operation
today, is the stereoscopic imaging technique. The principle of this is illustrated
in Fig. 5. Large concave reflectors are used to focus the Cherenkov light from air
showers onto a camera comprising photo-detector pixels. The camera records an
image of the shower, and the properties of the image (its shape, intensity and
orientation) allow determination of the properties of the shower primary. Applying
this to an array of telescopes (“stereoscopy”) provides a view of the same shower
from a number of different perspectives, and so enhances the geometrical shower
reconstruction. It is worth stressing that a key aspect of this technique is the necessity
for accurate Monte Carlo simulations of both the shower development and the
detector response.


4. The Design of an Atmospheric Cherenkov Telescope Array 


Since they observe blue Cherenkov light from air showers, atmospheric Cherenkov 
gamma-ray telescopes are effectively optical telescopes, working in the visible band 
of the electromagnetic spectrum. They are subject to the same constraints as other 
optical telescopes — observations must be conducted at night, under clear skies, at 
a dark site — but the design requirements are very different. 


4.1. Optical Systems 


Two competing requirements inform the optical design of ACTs. The first is for
a very large aperture, and hence mirror area. This allows one to collect as many
Cherenkov photons as possible from each shower, which in turn defines the lowest
gamma-ray energy threshold of the telescope. Fortunately, the relatively crude
cameras of ACTs, and the lack of detailed structure in the Cherenkov images,
mean that the mirror form and surface quality are much less important than for
optical telescopes. An optical point-spread-function of a few arcminutes is usually
adequate. This level of performance can be achieved using tessellated reflectors,
made up of hundreds of individual mirror facets.

Fig. 5. An illustration of the stereoscopic imaging technique. A gamma ray triggers an
electromagnetic cascade in the Earth’s atmosphere, which generates Cherenkov radiation in a pool on the
ground. Telescopes within this light pool are used to form an image of the shower, which allows
reconstruction of the arrival direction of the incident primary photon. (See electronic edition for
a color version of this figure.)

The second requirement is for a large field of view. Cherenkov images from air
showers are approximately elliptical in shape, with an angular extent of up to a few
degrees. The images are offset from the arrival direction of the shower primary;
in the case of gamma-ray-initiated showers, this means that the image is offset from
the gamma-ray source position in the field of view. The angular distance of the offset
is proportional to the shower impact parameter* (Fig. 5). Even a point source of
gamma rays, therefore, requires a field of view of a few degrees diameter. In reality,
many known sources of gamma-ray emission (particularly supernova remnants and
pulsar wind nebulae) have a large angular extent. Additionally, analysis of ACT data
typically uses a portion of the field of view in which there are no known gamma-ray
sources to estimate the background of remaining cosmic-ray showers. Currently
operating arrays have fields of view of 3–5°, while plans for the next generation of
instruments reach 8–10°.

*The distance between the shower core projected onto the ground and the telescope.
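
The relation between image offset and impact parameter can be estimated with simple geometry. The Python sketch below is a rough illustration under stated assumptions: a vertical shower, all light treated as coming from shower maximum, and an emission height of 10 km above the telescope (the text quotes ~10 km altitude; the height above a mountain-top site is somewhat smaller). The function name is ours.

```python
import math


def image_offset_deg(impact_m, emission_height_m=10_000.0):
    """Angle between the shower direction and the line of sight to shower maximum."""
    return math.degrees(math.atan2(impact_m, emission_height_m))


for impact in (50.0, 130.0, 250.0):
    print(f"impact parameter {impact:5.0f} m -> image offset ~ "
          f"{image_offset_deg(impact):.2f} deg")
```

For small angles the offset is simply the impact parameter divided by the emission height, which is the proportionality noted above; showers landing a few hundred meters from a telescope are displaced by more than a degree, which is one reason why even point-source observations need a field of view of several degrees.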

The requirement for a very large field of view for each telescope dictates a small 
focal ratio (focal length, f, divided by aperture, D), typically around 1.0. Off-axis
optical aberrations, particularly coma and astigmatism, are therefore an important 
consideration. A common approach for a tessellated reflector, used extensively for 
ACTs (starting with the Whipple 10-meter), was first developed by Davies and 
Cotton for a U.S. Army solar furnace facility; their original application was for
the thermal testing of materials for military purposes.¹³ In this design, individual
spherical mirror facets, with a radius of curvature of twice the focal length of the
telescope (2f), are placed on the surface of a spherical reflector with a radius equal
to f. The facets are aligned such that the normals of the individual facets point 
to the 2f position along the optic axis. The reflector is therefore discontinuous at 
every point, and ideal performance is achieved with the smallest facets. As well as 
providing off-axis performance superior to that of a single spherical or parabolic 
reflector (Fig. 6), the Davies—Cotton design uses identical mirror facets, which can 
be inexpensively mass-produced. Mirror facet alignment is also relatively simple. 
One downside, however, is that the design is not isochronous — the reflector induces 
some time spread in the arrival time of Cherenkov photons at the telescope cameras, 
typically on the order of a few nanoseconds. Tessellated parabolic reflectors, used 
by the world’s largest ACTs (MAGIC and H.E.S.S. II), do not suffer from this 
drawback, but require facets of varying forms to be produced, with a corresponding 
increase in cost and complexity. 
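
The size of the time spread can be estimated geometrically. Because the facets sit on a sphere of radius f centered on the focal point, every facet is the same distance f from the camera, but for light arriving along the optic axis the edge of the dish lies closer to the source than its center by the sagitta of the sphere. Photons reflected at the rim therefore arrive earlier by the sagitta divided by c. The Python sketch below evaluates this; it is an on-axis estimate only, the function name is ours, and the 12 m aperture with f/D = 1.0 is an illustrative combination of values quoted in the text.

```python
import math

C_M_PER_NS = 0.299792458   # speed of light (m per ns)


def dc_time_spread_ns(focal_length_m, aperture_m):
    """Full on-axis arrival-time spread of a Davies-Cotton reflector (ns)."""
    r = aperture_m / 2.0
    sagitta = focal_length_m - math.sqrt(focal_length_m ** 2 - r ** 2)
    return sagitta / C_M_PER_NS


spread = dc_time_spread_ns(12.0, 12.0)     # 12 m aperture at f/D = 1.0
print(f"f = 12 m, D = 12 m: spread ~ {spread:.1f} ns")
print(f"f = 18 m, D = 12 m: spread ~ {dc_time_spread_ns(18.0, 12.0):.1f} ns")
```

The result, about 5 ns edge to center for f/D = 1.0, is consistent with the few-nanosecond spread quoted above, and shows that the spread shrinks as the focal ratio is increased.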

Aplanatic two-mirror telescopes provide a solution to off-axis aberrations,
while retaining isochronicity and also reducing the plate scale in the focal plane
significantly.¹⁵ Cost and complexity again present challenges, but the benefits of
two-mirror systems are such that they will very likely form a part of the next generation
ACT arrays. Prototyping is already underway, with the Schwarzschild–Couder
design among the favored options.¹⁶

Fig. 6. The image of a star viewed at different distances from the optic axis of a H.E.S.S. telescope,
showing the effects of optical aberration in a Davies–Cotton optical system.¹⁴

The technology for producing mirror facets is also a very active area of development.¹⁷
Traditional techniques use milled aluminum, or glass that is “slumped”
to the required shape, polished, and then coated with anodized aluminum. Car- 
bon or glass fiber, aluminum honeycomb or a composite design can offer a more 
lightweight, cost-effective solution. With typically hundreds of mirror facets per 
telescope, alignment of the facets is not trivial. Stepper motors can be used to 
provide active mirror control, which greatly simplifies this task, as well as allowing 
for alignment corrections due to mechanical deformations during observations. 


4.2. Telescope Structure 


The mechanical design of ACTs is also challenging, given the extremely large aper- 
tures, and the necessity of supporting a large, delicate and massive detector package 
at the prime focus. Weight and simplicity considerations have led to the adoption of 
alt-azimuth mounts for all modern ACTs. The rigidity requirements of the optical 
system and camera support structures have been solved in two ways — either by 
the brute force approach of constructing the telescope superstructure from a steel 
space frame (in the case of the H.E.S.S. and VERITAS telescopes, for example), or 
by the use of a lightweight carbon fiber frame coupled with an active mirror adjust- 
ment system (in the case of MAGIC, and planned for the largest next generation 
telescopes — see Fig. 7). 


Fig. 7. Left: The carbon fiber support structure design for the 23 m aperture Large Size
Telescopes of the Cherenkov Telescope Array.¹⁸ Right: The PMT camera of the H.E.S.S. II telescope,
containing 2048 PMTs and weighing 3 metric tons.¹⁹ (See electronic edition for a color version of
this figure.)

The telescope is also required to track accurately, typically to within a few
arcminutes. The position of the telescope is monitored by encoders (usually optical, 
with arcsecond resolution). A software model of the telescope pointing, calibrated 
using observations of stars, is used to translate these measurements into a position 
on the sky. The online tracking is supported by CCD pointing monitors, which are 
fixed to the telescope structure and track star positions, as well as the exact gamma- 
ray camera location. Offline corrections based on these CCD measurements are used 
to reduce systematic telescope pointing errors to typically tens of arcseconds. 

ACTs must also be able to slew to new targets as rapidly as possible. The 
40 metric ton, 17 m diameter MAGIC telescopes, for example, are able to move to 
observe any position in the sky within 40 seconds. This requirement is driven by the 
transient nature of the gamma-ray sky, which contains many sources known to flare 
dramatically on short timescales. In the case of gamma-ray bursts, the emission 
may last just a few seconds — although none of these have yet been detected from 
the ground, despite rapid slewing triggered by satellite alerts. 


4.3. Cameras 


Large aperture, single-reflector ACTs require physically large cameras (> 1 m) to 
cover an adequately large field of view. In order to record an image, the photo- 
sensitive area must be divided into pixels, numbering hundreds or thousands, with 
each pixel sampling < 0.1°. The photodetector pixel of choice for ACTs has, in 
most cases, been the photomultiplier tube (PMT). These devices provide reasonable 
photon detection efficiency (~20%), nanosecond response times, a large detection 
area, and extremely clean signal amplification (by factors of ~100,000), allowing 
them to easily resolve single photon signals. Dead space between the photo-sensitive 
areas of the individual pixels is recovered by placing close-packed light-concentrating 
Winston cones on the camera face. One example is the H.E.S.S. II PMT camera, 
shown in Fig. 7, weighing 3 metric tons and containing 2048, 1.25 inch PMTs. The 
camera is housed at the telescope focal point, 36 m from the center of the reflector 
dish, giving a field of view of 3.2° in diameter (which is relatively small for an 
ACT).¹⁹ While the size of ACTs prohibits the construction of domes around the
complete telescopes, the expensive and delicate PMT cameras are usually housed in 
light-tight boxes, which allow for daytime testing and calibration. The H.E.S.S. II 
camera can actually be removed when required, and stored in a protective enclosure. 
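
The link between focal length, field of view and physical camera size is a matter of simple geometry. The Python sketch below uses the H.E.S.S. II numbers quoted above (36 m focal length, 3.2° field of view); the function names are ours, and a flat focal plane with the field of view measured edge to edge is assumed.

```python
import math


def plate_scale_m_per_deg(focal_length_m):
    """Distance in the focal plane corresponding to 1 degree on the sky."""
    return focal_length_m * math.tan(math.radians(1.0))


def camera_diameter_m(focal_length_m, fov_deg):
    """Physical diameter of a flat camera covering the full field of view."""
    return 2.0 * focal_length_m * math.tan(math.radians(fov_deg / 2.0))


scale = plate_scale_m_per_deg(36.0)
diameter = camera_diameter_m(36.0, 3.2)
print(f"plate scale     : {scale * 100:.1f} cm per degree")
print(f"camera diameter : {diameter:.2f} m")
print(f"0.1 deg pixel   : {scale * 0.1 * 100:.1f} cm across")
```

With these inputs the camera must be about 2 m across, and a 0.1° pixel spans roughly 6 cm, which illustrates why single-reflector ACTs need physically large cameras built from large-area photodetectors.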

A number of recent technological advances are now finding their way into ACT 
camera design. PMT photocathode developments now yield quantum efficiencies of 
up to 40% at short wavelengths (so-called “ultra-bialkali” devices). PMTs are now 
also available in “multianode” packages, in which an array of close-packed PMT 
cells are incorporated in a single housing, greatly reducing the cost per pixel of an 
ACT camera, and allowing for much finer pixelation of the field of view. Silicon 
photodetectors also show great promise, as demonstrated by the FACT (“First
G-APD Cherenkov Telescope”) telescope, a small (9.5 m²) pathfinder experiment,
equipped with a camera containing 1440 individual Geiger-mode avalanche photodiode
detectors.²⁰ Modern silicon-based devices can provide higher photon detection
efficiency than PMTs, require lower operating voltages, and can cover large areas
at relatively low cost, in particular with the development of Multi-Pixel Photon
Counter (MPPC) arrays, containing arrays of up to 64 discrete detectors, each of
which can be read out individually.


4.4. Trigger and Data Acquisition Systems 


The arrival times of atmospheric Cherenkov flashes at the telescope are random and 
unpredictable. The flashes also last just a few nanoseconds, and exhibit temporal 
structure on timescales even shorter than this. Continuously monitoring the sky with 
GHz sampling rates on hundreds or thousands of channels is impractical; instead, 
it is necessary to trigger the data acquisition system of ACTs, such that the photo- 
detector outputs are recorded only for a small time window around the arrival time 
of the Cherenkov flash. Since the trigger decision time is longer than the duration 
of the flash itself, the photodetector output signals must be delayed (e.g. by routing 
analog signals through long cables), or continuously sampled and stored in digital 
memory buffers. Upon receipt of a valid trigger, the relevant data time window can 
be accessed and saved to disk as digital samples. 

Trigger systems typically work on multiple levels — the design goal being to 
trigger on the faintest possible Cherenkov flashes, without incurring a prohibitively 
high rate of false triggers due to the fluctuating night-sky background. Individual 
pixels are equipped with discriminators, which produce a digital output. The out- 
puts for each camera are passed to a logic circuit which looks for spatial coincidences 
(e.g. three neighboring pixels must have triggered within a few nanoseconds). The 
final trigger decision occurs at the array level — typically at least two telescope 
cameras must have triggered at the same time, after correction for the different 
path lengths of the Cherenkov light to each telescope. Many variations on this basic 
scheme exist, notably the analog sum-trigger developed for use on the MAGIC
telescopes.²¹ Figure 8 shows a “bias curve” for the VERITAS array, illustrating the
changing event rate as a function of individual pixel discriminator threshold, for 
each of the four telescopes, and for the complete array. 
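
The multi-level logic described above can be sketched in a few lines of Python. This is a toy model, not the trigger of any particular instrument: the square pixel grid, the function names, and the choice to treat any connected group of three fired pixels as a camera trigger are all assumptions made for illustration. The time coincidence and path-length correction are taken to have been applied already, so each input is simply the set of pixels whose discriminators fired within the coincidence window.

```python
from collections import deque


def grid_neighbours(n):
    """Neighbour map for a hypothetical n x n square-pixel camera (4-connectivity)."""
    neighbours = {}
    for r in range(n):
        for c in range(n):
            neighbours[(r, c)] = [
                (r + dr, c + dc)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < n and 0 <= c + dc < n
            ]
    return neighbours


def camera_trigger(hits, neighbours, multiplicity=3):
    """True if at least `multiplicity` fired pixels form one connected group."""
    seen = set()
    for start in hits:
        if start in seen:
            continue
        group_size, queue = 0, deque([start])
        seen.add(start)
        while queue:                      # breadth-first walk over adjacent fired pixels
            pixel = queue.popleft()
            group_size += 1
            for other in neighbours[pixel]:
                if other in hits and other not in seen:
                    seen.add(other)
                    queue.append(other)
        if group_size >= multiplicity:
            return True
    return False


def array_trigger(camera_flags, min_telescopes=2):
    """Array-level decision: enough cameras triggered in coincidence."""
    return sum(camera_flags) >= min_telescopes


nb = grid_neighbours(5)
compact = {(0, 0), (0, 1), (1, 1)}        # shower-like: three adjacent pixels
scattered = {(0, 0), (2, 2), (4, 4)}      # noise-like: isolated pixels
print(camera_trigger(compact, nb), camera_trigger(scattered, nb))
print(array_trigger([True, False, True, False]))
```

The neighbor requirement is what suppresses night-sky background: isolated pixels fire frequently by chance, but adjacent pixels firing together within a few nanoseconds is far less likely unless a Cherenkov flash is present.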

Prior to digitization, the photo-detector outputs are pre-amplified, as close to 
the sensor as possible. This boosts the signal strength without significantly degrading
the signal-to-noise ratio, and allows PMT detectors (in particular) to run at
lower gain — extending their useful lifetime, and protecting them from damage due 
to bright DC light sources, including stars in the field of view. Since the Cherenkov 
flashes sit upon a continuous DC background of night-sky light, the signals are also 
AC-coupled at this point. Digitization is accomplished in various different ways — 
historically, simple integrating analog-to-digital converters (ADCs) were used, while 
more modern systems use flash-ADCs or custom-designed analog ring sampling 
devices, operating at multi-GHz sampling speeds. The final data products consist
of a sampled (or integrated) signal trace for every pixel for each triggered event
(Fig. 8). The data rate of modern ACTs is in excess of a few hundred recorded
events per second, and reaches a few kHz for the largest telescopes.

Fig. 8. Left: A “bias curve” for the VERITAS array. The rate of triggers is shown as a function
of the discriminator threshold chosen for all of the PMT pixels. The upper curves correspond to
the individual telescope rates, the lower curve to the complete array, after requiring a coincident
trigger from at least two telescopes. A clear break point is seen at the transition between triggers
dominated by night-sky background fluctuations, and those dominated by cosmic-ray air showers.
The dashed line indicates a typical threshold setting for standard operations. Right: Data products
for a VERITAS telescope, consisting of a digitized signal trace for each PMT in the 499 pixel
camera. The image in the camera has been cleaned, by setting the signal to zero in all pixels which
contain no Cherenkov light. The ellipse shows a simple moment-based parameterization of the
image. (See electronic edition for a color version of this figure.)


4.5. Peripheral and Environmental Systems 


In addition to the telescopes themselves, a wide variety of peripheral systems are 
usually deployed, associated with calibration and monitoring tasks, both of the tele- 
scopes and of the atmosphere above them. Telescope calibration requires nanosecond 
light pulsers, used to flat-field the photodetector gains, and to measure their sin- 
gle photon response. Atmospheric monitoring can be achieved with local weather 
stations, and with LIDARs, optical telescopes, and infra-red radiometers, which 
can reveal the presence of clouds by measuring the radiative temperature of the 
night sky. 


4.6. Array Design 


To this point, we have focused on the design of individual telescopes; however, the 
use of multiple telescopes in concert dramatically increases the sensitivity of the 
technique, along with its angular and spectral resolution. Numerous studies have 
been performed on the optimum layout and spacing of ACT arrays.²²⁻²⁴ The conclusions
can broadly be summarized as follows: (i) more telescopes are better, with the
array sensitivity increasing roughly as the square root of the number of telescopes,
and (ii) the optimal spacing depends upon the energy range to be covered: wider
spacing provides best sensitivity for higher energies. One further point to note is
that the array performance changes when the array becomes significantly larger
than the Cherenkov light pool.²⁵ In this regime the central region of the Cherenkov
light pool is always sampled by multiple telescopes (unlike smaller arrays, where 
the shower core often lies outside of the area enclosed by the array). The result 
of this is appreciably better sensitivity at low energies, for telescopes of relatively 
modest size. 


5. ACT Data Analysis 


The analysis of ACT data is complex, and the details have comparable impact on 
the sensitivity and performance of the array as do many aspects of the hardware. To 
recap, the goal of the analysis is to identify the primary particle, and to reconstruct 
its arrival direction and energy. This information is then used to assess the statis- 
tical significance of any gamma-ray signal, to map its distribution on the sky, and 
to reconstruct the gamma-ray flux and energy spectrum. Many different analysis 
methods exist in the literature, and the details vary between the different arrays. 
Here we describe the most common techniques in broad detail, and conclude with 
a brief summary of some of the more sophisticated methods in use. 


5.1. Flat-fielding and Image Cleaning 


The raw data products for ACTs consist of a digitally sampled signal trace for 
each of the photosensors in the cameras, roughly centered on the arrival time of the 
Cherenkov pulse (Fig. 8). The first stage of the data processing consists of measuring 
and subtracting the signal pedestal value — the baseline value in the absence of 
any Cherenkov photons. The next step is to identify those pixels which contain a 
Cherenkov signal, above some pre-defined threshold. The signals are then corrected 
for variations in the photodetector gain values, measured using a calibration light 
pulse. The result of this pre-processing is a cleaned, calibrated image, typically 
approximately elliptical in shape (Fig. 8). 
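
The pre-processing chain above can be sketched in a few lines. This is a minimal illustration, not the pipeline of any particular array: the function name, the array shapes and the 5 photoelectron threshold are assumptions, and for simplicity the threshold is applied after the gain correction so that it can be expressed in photoelectrons.

```python
import numpy as np

def clean_image(traces, pedestals, gains, threshold_pe=5.0):
    """Return (image, mask) for one camera event.

    traces    : (n_pixels, n_samples) digitized samples, in digital counts
    pedestals : (n_pixels,) baseline level per sample, in digital counts
    gains     : (n_pixels,) digital counts per photoelectron (from calibration)
    """
    traces = np.asarray(traces, dtype=float)
    pedestals = np.asarray(pedestals, dtype=float)
    gains = np.asarray(gains, dtype=float)
    # Pedestal subtraction, then integration of the pulse over the trace.
    charge_dc = (traces - pedestals[:, None]).sum(axis=1)
    # Flat-fielding: convert digital counts to photoelectrons pixel by pixel.
    charge_pe = charge_dc / gains
    # Image cleaning: keep only pixels above the (assumed) threshold.
    mask = charge_pe > threshold_pe
    return np.where(mask, charge_pe, 0.0), mask
```

Real analyses usually add a second, lower threshold for pixels adjacent to bright ones; that refinement is omitted here.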


5.2. Identification of the Primary 


Even for a moderately strong gamma-ray source, cosmic-ray shower images outnumber 
gamma-ray shower images by at least a factor of ~$10^3$. Effective separation 
of the gamma-ray events is therefore crucial. In the case of a point source of gamma 
rays, by far the most effective tool for discrimination is the arrival direction of the 
primary — but many TeV sources have large angular extent, up to a few degrees in 
diameter. Fortunately, significant differences in the Cherenkov image morphology, 
originating in the differences in the air shower development, make discrimination 
possible — despite the relatively crude optics and camera pixelation of an ACT.?° 

The cleaned images are parameterized by a simple moment analysis, in which 
their width, length and orientation are calculated. Gamma-ray images are typically 
less wide, and shorter, than cosmic-ray images with similar Cherenkov intensity and 
impact parameter. In the case of a single telescope, simply selecting images with 
small width and length provides fair discrimination.?” The power of this analysis 
is dramatically increased, however, when multiple telescopes view the same shower. 
In this case, the shower core location, and hence the impact distance from each 
telescope (R), can be determined to within an accuracy of ~10 m. The core location 
is reconstructed geometrically; in the reference frame of the array, all images point 
away from the shower core location, and so the core can be found by intersecting 
the image major axes. 
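
As an illustration of the moment analysis, the width, length and orientation of a cleaned image can be taken from the eigen-decomposition of its intensity-weighted second-moment matrix. The sketch below assumes pixel coordinates `x`, `y` (in degrees) and cleaned pixel charges; the function and key names are hypothetical.

```python
import numpy as np

def image_moments(x, y, charge):
    """Second-moment (ellipse) parameters of a cleaned image."""
    x, y, charge = (np.asarray(a, dtype=float) for a in (x, y, charge))
    w = charge / charge.sum()                 # normalized pixel weights
    mx, my = (w * x).sum(), (w * y).sum()     # image centroid
    sxx = (w * (x - mx) ** 2).sum()
    syy = (w * (y - my) ** 2).sum()
    sxy = (w * (x - mx) * (y - my)).sum()
    # Eigenvalues (ascending) are the variances along the minor and major axes.
    evals, evecs = np.linalg.eigh(np.array([[sxx, sxy], [sxy, syy]]))
    width, length = np.sqrt(np.maximum(evals, 0.0))
    major = evecs[:, 1]                       # direction of the major axis
    return {
        "size": charge.sum(),
        "centroid": (mx, my),
        "width": width,
        "length": length,
        "orientation": np.arctan2(major[1], major[0]),
    }
```

The orientation is only defined modulo 180 degrees, since the major axis has no preferred sense at this stage.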

Once the core location is known, the measured width of the image can be compared 
with a prediction, $\mathrm{width}_{MC}$, for images with the same Cherenkov intensity, $s$. 
This prediction, with an associated spread, $\sigma_{\mathrm{width}}$, is derived from detailed Monte 
Carlo simulations of the air shower development and the telescope response. The 
predicted widths are typically stored in look-up tables, a number of which are generated 
corresponding to the various conditions under which the observations 
were taken (e.g. elevation angle, background night-sky brightness). The result of 
this comparison is then combined for all of the Cherenkov images of the shower 
($N_{\mathrm{images}}$) like so: 


$$\mathrm{mscw} = \frac{1}{N_{\mathrm{images}}} \sum_{i=1}^{N_{\mathrm{images}}} \frac{\mathrm{width}_i - \mathrm{width}_{MC}(R, s)}{\sigma_{\mathrm{width}}(R, s)}$$


where mscw is known as the “mean-scaled width”, and is used to provide effective 
discrimination between gamma-ray and cosmic-ray initiated events28,29 (Fig. 9). A 
similar method can be applied to the image length. Various other parameters have 
also been derived and used with different degrees of success (e.g. height of shower 
maximum, Cherenkov photon arrival time gradient along the shower). 
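
A minimal sketch of the mean-scaled width calculation follows. The look-up table here is a tiny, invented example binned in impact distance and log10 of the image intensity; real tables are filled from Monte Carlo simulations and are much finer. All names and numbers are assumptions made for illustration.

```python
import numpy as np

# Hypothetical look-up table: rows are impact-distance bins, columns are
# log10(intensity) bins. Values are illustrative only.
R_EDGES = np.array([0.0, 100.0, 200.0, 300.0])          # m
LOGS_EDGES = np.array([2.0, 3.0, 4.0])                  # log10(photoelectrons)
WIDTH_MC = np.array([[0.10, 0.12],
                     [0.09, 0.11],
                     [0.08, 0.10]])                     # expected width, deg
SIGMA_WIDTH = np.full((3, 2), 0.02)                     # spread, deg

def lookup(table, R, s):
    """Nearest-bin look-up of a table value for impact distance R and intensity s."""
    i = np.clip(np.digitize(R, R_EDGES) - 1, 0, table.shape[0] - 1)
    j = np.clip(np.digitize(np.log10(s), LOGS_EDGES) - 1, 0, table.shape[1] - 1)
    return table[i, j]

def mean_scaled_width(widths, R, s):
    """Average of (width_i - width_MC) / sigma_width over all images of a shower."""
    widths, R, s = (np.asarray(a, dtype=float) for a in (widths, R, s))
    scaled = (widths - lookup(WIDTH_MC, R, s)) / lookup(SIGMA_WIDTH, R, s)
    return scaled.mean()
```

For gamma rays the resulting distribution is centered near zero with unit width by construction, which is what makes a fixed cut on mscw usable across observing conditions.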

For the purposes of gamma-ray astronomy, a simple discrimination between 
gamma rays and all other primaries is usually all that is required. ACTs can also 
serve as powerful tools for cosmic-ray physics, however, and attempts have been 
made to measure the spectrum and composition of the nuclear cosmic-ray flux,30,31 
as well as the electron component.32 


5.3. Arrival Direction Reconstruction 


Reconstruction of the arrival direction of the shower primary serves two purposes: 
it provides effective discrimination between gamma-ray photons from the source 
and the isotropic charged cosmic-ray background, and it allows us to study and to 
map out the gamma-ray emission. Accurate location of the point of origin of the 
gamma-ray emission is often necessary for the identification of gamma-ray sources, 
and gamma-ray mapping of extended astrophysical sources provides clues to the 
particle acceleration processes at work in these objects. 

Fig. 9. Left: The distribution of the mean-scaled width for gamma rays and cosmic rays, from 
Monte Carlo simulations and for a real gamma-ray source. Selecting events within the shaded 
region provides effective discrimination of gamma-ray events from the hadronic background. 
Right: H.E.S.S. map of gamma-ray emission from the supernova remnant RX J1713.7-3946.33 
(See electronic edition for a color version of this figure.) 

In the field of view of the telescopes, the major axes of the image ellipses inter- 
sect at the point corresponding to the arrival direction of the primary particle, as 
shown schematically in Fig. 5. This fact is used to provide an estimate of the arrival 
direction, usually with some weighting scheme which gives additional weight to the 
axes of the brightest images.?? The resulting angular resolution of the technique 
is energy dependent, with typically 68% of the gamma rays from a point source 
reconstructed to within 0.1° of the source location, for energies around 1 TeV. At 
lower energies, fluctuations in the shower development, and low Cherenkov photon 
statistics, degrade the resolution somewhat. 
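
One simple way to realize the weighted intersection of image axes is a least-squares fit for the point closest to all axes. The sketch below is an assumption about the weighting (a single scalar weight per image, for example its total intensity), not the scheme of a specific collaboration.

```python
import numpy as np

def intersect_axes(centroids, angles, weights):
    """Weighted least-squares intersection point of image major axes.

    centroids : (n, 2) image centroids in the camera plane
    angles    : (n,) orientation of each major axis, in radians
    weights   : (n,) weight per image (e.g. total image intensity)
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, phi, w in zip(np.asarray(centroids, dtype=float), angles, weights):
        d = np.array([np.cos(phi), np.sin(phi)])   # unit vector along the axis
        P = np.eye(2) - np.outer(d, d)             # projector perpendicular to the axis
        A += w * P
        b += w * (P @ p)
    # Minimizes the weighted sum of squared perpendicular distances to the axes.
    return np.linalg.solve(A, b)
```

The same function applies to the core location if the centroids and angles are expressed in the ground frame of the array. At least two non-parallel axes are required, otherwise the matrix is singular.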

Once the arrival directions have been calculated, any point in the field of view 
can be tested for evidence for gamma-ray emission, by selecting those events which 
lie within a pre-defined radius around the test point. This process is complicated by 
the fact that the gamma-ray emission from each point lies on top of the remaining 
background of misidentified cosmic-ray events. In order to calculate the gamma- 
ray excess, and to calculate the statistical significance of this excess, it is therefore 
necessary to find an independent estimate of the remaining background at each 
point. This is accomplished by measuring the background rate in blank regions of the 
sky, from which little or no gamma-ray emission is expected. These “OFF-source” 
regions can be selected in a variety of different ways: for example by dedicated 
observations of adjacent fields of view, or (more commonly) by selecting regions 
within the same field of view, but offset from the test position. In this latter case, 
particular care must be taken to account for the varying detection efficiency across 
the field of view. A full description of a sample of common background estimation 
techniques is given by Berge et al.34 

Once the background is known, the gamma-ray excess at any position can be 
measured, and its significance calculated*° (or an upper limit to the number of 
excess events, in the case of no detection). By testing a range of positions on a 2D 
grid, a map of gamma-ray emission on the sky can be constructed. Figure 9 shows 
an example of a gamma-ray excess map from the direction of a supernova remnant, 
as measured by the H.E.S.S. telescope array. Converting the excess (or upper limit) 
into a measurement of the photon flux from the source (in photons cm$^{-2}$ s$^{-1}$), 
requires detailed modeling of the energy-dependent effective area of the telescope 
array, as described in the following section. 
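
As a worked illustration of the excess and significance calculation, the sketch below uses the likelihood-ratio formula of Li & Ma (1983, their Eq. 17), which is widely used in the field; whether it is the exact prescription cited above is an assumption. Here `alpha` is the ratio of ON to OFF exposure, and the function name is hypothetical.

```python
import math

def excess_and_significance(n_on, n_off, alpha):
    """Gamma-ray excess and Li & Ma (Eq. 17) significance.

    n_on  : counts in the ON (test) region, must be > 0
    n_off : counts in the OFF (background) regions, must be > 0
    alpha : ratio of ON exposure to OFF exposure
    """
    excess = n_on - alpha * n_off
    total = n_on + n_off
    term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    # The sign of the excess is attached so that deficits give negative values.
    sigma = math.copysign(math.sqrt(2.0 * max(term_on + term_off, 0.0)), excess)
    return excess, sigma
```

Evaluating this function at every node of a 2D grid of test positions yields an excess map and the corresponding significance map.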


5.4. Gamma-Ray Energy, Flux and Spectrum 


Calculation of the energy of an incident gamma ray relies upon the fact that, 
to a good approximation, the number of particles in the shower, and hence the 
Cherenkov photon yield, is directly proportional to the primary energy. Measur- 
ing the Cherenkov emission intensity, and combining this with the distance to 
the shower, therefore allows an estimate of the gamma-ray energy.?’ Multiple 
telescopes improve this energy estimate dramatically, since they provide multiple 
measurements of the shower light yield, and an improved estimate of the shower 
core location.*® 

In practice, the energy estimate is made by referring to look-up tables that 
contain the predicted gamma-ray energy as a function of impact parameter and 
Cherenkov intensity. The contents of the tables are derived from Monte Carlo simu- 
lations of the shower development, and of the telescope response. For the purposes 
of energy estimation, the most important parameter of the telescope model is the 
single photo-electron response of the photo-detectors and their read-out electronics. 
The most important factor in simulating the Cherenkov yield at the telescope mir- 
rors is the Earth’s atmosphere. This is much more difficult to monitor and account 
for, and hence introduces systematic uncertainties at the level of at least 10%. 
Numerous tables are generated, corresponding to different observing conditions 
(elevation angle, background night-sky brightness, source offset in the field of view). 
The energy resolution of the technique depends upon the energy of the primary, 
reaching typically ~15% above 1 TeV, and degrading below this. 

Converting the measured energy distribution of gamma rays from a source into 
a meaningful flux estimate, or energy spectrum, requires knowledge of the effective 
area of the detector. In the case of ACTs, the maximum effective area is determined 
by the size of the Cherenkov light pool, rather than by the size of the telescopes or 
the area of the array, and can reach >$10^5$ m$^2$ at high energies. At lower energies, the 
trigger efficiency of the array (and hence the effective area) drops sharply, eventu- 
ally reaching zero. The energy-dependent effective area is calculated by simulating 
gamma-ray showers over a wide range of impact parameters, and with an energy 
distribution similar to a typical source (e.g. a power law with an index of $-2.0$). 
The ratio of the number of triggered events to the number of events simulated, 
multiplied by the area over which the events were thrown, then gives the effective 
area. The effective area again depends upon a wide variety of operating conditions 
(elevation, sky brightness) and analysis parameters (gamma-ray selection cuts, exact 
analysis method), which must be precisely matched between the data analysis and 
the simulations. 
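
The effective-area recipe above amounts to a ratio of histograms. A minimal sketch, assuming showers thrown uniformly over a disc of radius `r_max` and lists of simulated and triggered energies (all names hypothetical):

```python
import numpy as np

def effective_area(e_simulated, e_triggered, bin_edges, r_max):
    """Effective area per energy bin, in the units of r_max squared.

    e_simulated : energies of all simulated gamma rays
    e_triggered : energies of those that triggered and passed all cuts
    bin_edges   : energy bin edges
    r_max       : radius of the disc over which shower cores were thrown
    """
    n_sim, _ = np.histogram(e_simulated, bins=bin_edges)
    n_trig, _ = np.histogram(e_triggered, bins=bin_edges)
    thrown_area = np.pi * r_max ** 2
    with np.errstate(divide="ignore", invalid="ignore"):
        efficiency = np.where(n_sim > 0, n_trig / n_sim, 0.0)
    return efficiency * thrown_area
```

For example, if half of the showers thrown within 500 m survive in a given bin, the effective area in that bin is about $3.9 \times 10^5$ m$^2$. The thrown radius must be large enough that the efficiency is negligible at its edge.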

The reconstructed energy distribution of gamma-ray events from a source can 
then be divided by the energy-dependent effective area in order to reconstruct the 
true energy spectrum of the source. Systematic biases can arise due to the fact 
that the effective area estimate depends upon the simulated gamma-ray spectrum 
(due to the finite energy resolution of the instrument). This can be addressed by 
recalculating the effective area using the fitted energy spectrum iteratively, until 
the two converge. More sophisticated unfolding methods are also used to account 
for the finite resolution of the technique.*” 

Gamma-ray source spectra are smooth continua, typically well-fit by straight or 
curved power laws, or by a power-law with an exponential cutoff. The spectra can 
be most easily parameterized by fitting a chosen form to the gamma-ray flux points 
using the least-squares method. A more sophisticated approach, less prone to biases 
introduced by binning the data, is to perform a maximum-likelihood estimation 
of the spectral parameters, taking into account the effective area and the energy- 
resolution function of the detector.4° 
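
The simple least-squares option can be illustrated for a pure power law, $dN/dE = N_0 (E/E_{ref})^{-\Gamma}$, which is a straight line in log-log space. The sketch assumes symmetric flux errors that are small compared with the flux; the function name and the reference energy are assumptions.

```python
import numpy as np

def fit_power_law(energy, flux, flux_err, e_ref=1.0):
    """Weighted least-squares fit of flux = N0 * (E / e_ref)**(-index)."""
    energy, flux, flux_err = (np.asarray(a, dtype=float) for a in (energy, flux, flux_err))
    x = np.log10(energy / e_ref)
    y = np.log10(flux)
    # Propagate the flux errors to log10(flux).
    sigma_y = flux_err / (flux * np.log(10.0))
    # np.polyfit expects weights of 1/sigma for Gaussian uncertainties.
    slope, intercept = np.polyfit(x, y, 1, w=1.0 / sigma_y)
    return 10.0 ** intercept, -slope
```

Curved power laws or exponential cutoffs need a non-linear fit, and the binned approach inherits the biases noted above.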


5.5. Alternative Analysis Methods 


The analysis methods described above were developed by the Whipple and HEGRA 
collaborations in the 1990s. They are robust against changing conditions, provide 
good sensitivity, and are widely used to this day. However, the development of 
analysis tools has always proceeded in parallel with the hardware developments of 
ACTs, and many alternative methods exist in the literature. Some of these provide 
significant improvements in sensitivity, energy threshold, or angular or spectral 
resolution. 

One flaw of the standard method is that it does not take advantage of the fact 
that an array provides multiple views of the same shower, and so the images recorded 
should be correlated. This additional information can be exploited by performing 
a global fit to the data, using a model of the shower development based on the 
primary energy, arrival direction and impact parameter. The first implementation 
of this method was made by the CAT collaboration, using just a single telescope, 
and a simple analytical 2D model of the shower profile.41 The technique has subsequently 
been refined to work with multiple telescopes, to perform a log-likelihood 
minimization using all pixels in the camera,42 and to use 3D analytical models 
of the shower development,43 or direct comparison with template images generated 
by Monte Carlo simulations.44 In these schemes, the goodness-of-fit parameter 
provides a single powerful discriminant to separate gamma rays from background. 
The method also automatically provides an energy estimate, which can be used to 
reconstruct spectra. 

Another approach is to improve the discrimination between gamma-ray and 
cosmic-ray events through the use of advanced pattern recognition or multivariate 
analysis techniques. Some of the most successful approaches to this draw on devel- 
opments in the field of experimental particle physics, where similar problems of 
classification are often met. While many different techniques have been attempted 
(neural networks, genetic algorithms, etc.), the most efficient appear to be the decision 
tree methods: boosted decision trees45-47 and random forests.48 Inputs to these 
machine learning algorithms can correspond to the simple geometrical parameters 
of the standard analysis method, or encompass additional information, including 
the results of the template fitting methods described above. 

Finally, many attempts have been made to explore additional properties of 
the Cherenkov radiation from air showers, in the hope of finding complementary 
information to enhance the analysis. Some have failed: the spectrum49 or the 
polarization50 of the Cherenkov light, for example, do not seem likely to provide any 
useful additional discrimination. The arrival time of Cherenkov photons, however, 
does improve discrimination somewhat — an aspect of the analysis that becomes 
more important with the development of very large isochronous reflectors, and very 
fast (>1 GHz) sampling electronics.51-53 


6. Concluding Remarks 


Atmospheric Cherenkov gamma-ray telescopes have proven remarkably successful 
over the past decade. Small arrays of moderately-sized telescopes have opened a new 
window on the Universe, probing particle acceleration in extreme environments both 
within and outside of our Galaxy. The next stage in the development of the technique 
requires substantial investment, and hence collaboration on a global scale. This 
is proceeding through the “Cherenkov Telescope Array” (CTA) project, which is 
designing and constructing a next generation instrument.54 The plan involves a km$^2$ 
array with a few large aperture (~23 m) telescopes at the center, surrounded by an 
array of moderately-sized telescopes (~10 m) with ~100 m spacing, supplemented 
by a wider-spaced array of smaller telescopes (4 m). A graded array such as this is 
expected to provide sensitivity improvements of an order of magnitude over current 
arrays, together with the widest possible energy coverage. Prototyping and testing 
is underway, and new technologies are being tested at all stages (e.g. in mirror 
designs, photosensors, and trigger and data acquisition systems). The goal of such 
development is not only to enhance the array performance, but also to deal with 
the necessities of mass production, low cost, and strict maintenance requirements. 
Both northern and southern hemisphere arrays are envisaged, and possible sites are 
currently under discussion. 


At high energies (>100 GeV) the flux of gamma rays from astrophysical sources 
becomes sufficiently small that direct detection using balloon or space-based 
detectors becomes impractical. Above this energy, the regime of very-high-energy 
(VHE) astronomy, ground-based detectors are used. With ground-based detectors 
the atmosphere is an integral component of the detector. Interactions between the 
primary gamma ray (or cosmic ray) and the atmosphere lead to the production 
of an extensive air shower: a swarm of relativistic particles that traverses the 
atmosphere, first growing, then shrinking in particle number as it proceeds to the 
Earth’s surface. Ground-based gamma-ray astronomy utilizes two different detec- 
tion techniques. Atmospheric Cherenkov Telescopes detect the Cherenkov radia- 
tion generated by these particles as they traverse the atmosphere. By contrast, 
Extensive Air Shower (EAS) arrays detect the particles in the extensive air shower 
that reach the ground. EAS arrays operate continuously and simultaneously view 
the entire overhead sky. While Atmospheric Cherenkov Telescopes only operate 
on clear nights with limited Moon exposure, they have a higher instantaneous 
sensitivity to point-like sources. In this chapter, we discuss the detection of VHE 
gamma rays using EAS arrays, concentrating on detectors based on the water 
Cherenkov technique. 


1. Introduction 


Ground-based gamma-ray astronomy studies the universe in gamma rays with ener- 
gies above ~100 GeV. At these energies the radiation mechanisms are non-thermal; 
one is observing secondary radiation of accelerated particles. Accelerated electrons 
emit synchrotron radiation in a magnetic field and can scatter ambient photons to 
high energies via inverse Compton scattering. Accelerated protons can emit proton 
synchrotron radiation in a magnetic field or generate gamma rays through the pro- 
duction of neutral pions as they interact with matter. These gamma rays interact 
with the radiation fields in interstellar and intergalactic space before reaching our 
atmosphere. Once these gamma rays reach our atmosphere they interact with the 
nuclei and generate an extensive air shower, a swarm of energetic particles (mostly 
electromagnetic in nature) that propagates to earth. One may either directly detect 
these secondary particles using extensive air shower (EAS) arrays or detect the 
Cherenkov radiation generated by these electromagnetic particles as they traverse 
the atmosphere using imaging Atmospheric Cherenkov Telescopes (ACTs).a 

EAS arrays have used a variety of detector technologies to detect the particles 
in the extensive air shower. Plastic scintillator has been a common choice of detector 
and has been used by the CYGNUS,1 CASA,2 and Tibet3 air shower arrays. While 
this is a well-understood technology, it is relatively expensive. Arrays composed of 
plastic scintillator tend to be sparsely instrumented so that large areas (necessary 
to detect the extremely low flux of VHE gamma rays) can be enclosed. The cost 
also drives one to use scintillator that is significantly less than a radiation length 
thick (~41 cm for polystyrene), making them inefficient for the detection of gamma 
rays. To compensate for this drawback many EAS arrays place a radiation length of 
lead over the scintillation detectors. Rejection of the large background of hadronic 
cosmic rays requires the use of large-area muon detectors. The CASA array buried 
5000 m$^2$ of scintillator under 5 m of dirt, the CYGNUS array used an already existing 
neutrino detector with additional plastic scintillator buried in a cliff face, and the 
Tibet array does not currently have muon detectors in the array. 

The low cost of water and the large Cherenkov angle make water an excellent 
detection medium for extensive air showers. One can construct a dense array where 
the enclosed area is completely instrumented, the detector can be made sufficiently 
thick to convert the gamma rays in the air shower to electrons and positrons so 
their Cherenkov radiation can be detected, and the same detector can be made 
deep enough to become a muon detector. In this chapter we begin by discussing 
the basic properties of extensive air showers in Sec. 2, including the atmosphere 
(Sec. 2.1), an important component of an EAS detector, and specific properties 
of electromagnetic (Sec. 2.2) and hadronic (Sec. 2.3) air showers. We then discuss 
details of water Cherenkov detectors in Sec. 3 — the basic properties of Cherenkov 
radiation (Sec. 3.1), the optical properties of water (Sec. 3.2), and the reconstruction 
of extensive air showers (Sec. 3.3). 


2. Extensive Air Showers 


When a high-energy gamma ray or cosmic ray enters the atmosphere it interacts 
with the air molecules and loses energy through particle production, predominantly 
the creation of electron-positron pairs in the case of gamma rays and the generation 
of pions in the case of hadronic cosmic rays. Electrons and positrons (hereafter 
referred to as electrons) generate additional gamma rays through bremsstrahlung 
radiation. Neutral pions decay to gamma rays that subsequently form electron-positron 
pairs. Charged pions either decay into muons and neutrinos or interact 
with nuclei in the atmosphere, creating additional pions and nucleons. This process 
continues with the subsequent multiplication of the number of particles until 
the average energy per particle reaches the critical energy $E_c \approx 80$ MeV. At this 
energy the cross section for particle production is smaller than that of ionization 
losses and the number of particles in the air shower decreases. The atmospheric 
depth at which the number of particles in the air shower peaks is known as shower 
maximum (or $X_{\max}$). The development of the air shower is determined by the 
energy-dependent cross sections for the different possible interactions and the structure 
of the atmosphere. In this section, we first discuss the composition and profile 
of the atmosphere and follow with a discussion of the general properties of 
extensive air showers important for their detection and reconstruction at ground 
level. 

aSee Chapter 6 of this volume. 


2.1. The Atmosphere 


The interactions between the primary cosmic ray and the nuclei in the Earth’s atmo- 
sphere (and the subsequent interactions of the daughter particles) depend upon the 
composition and density profile of the atmosphere. The Earth’s atmosphere is composed 
of approximately 78.1% N$_2$, 20.9% O$_2$ and 0.9% Ar. This yields an average 
atomic mass number of ~14.6 for the atmosphere. It is typical to use this average 
mass of “air” nuclei in models of the development of the atmospheric cascade. There 
are several models of the atmospheric profile available, many of which have been 
parameterized and incorporated in modern codes used to simulate the development 
of extensive air showers. One of the more common models used is the US Stan- 
dard Atmosphere developed by the National Oceanic and Atmospheric Administra- 
tion, the National Aeronautics and Space Administration, and the US Air Force.4 
This model has been parameterized in the CORSIKA5 simulation of extensive air 
showers as a five-layer model. The first four layers are modeled as an exponential 
atmosphere: 


$$T(h) = a_i + b_i \times e^{-h/c_i}, \qquad i = 1, \ldots, 4. \qquad (1)$$

The fifth layer has a linear relation between altitude and overburden: 

$$T(h) = a_5 - b_5 \times h/c_5. \qquad (2)$$


Table 1 gives the values of the coefficients for this parameterization. Figure 1 shows 
the US Standard Atmosphere using the above parameterization. For a given model 
of the atmosphere the mass overburden is given by 

$$X(h) = \int_h^{\infty} \rho(x)\,dx, \qquad (3)$$

where $\rho(x)$ is the density of the atmosphere at altitude $x$. 

Table 1. Coefficients for parameterization of the US Standard Atmosphere. 

Layer i   Altitude h (km)   a_i (g cm^-2)   b_i (g cm^-2)   c_i (cm)
1         0-4               -186.555305     1222.6562       994186.38
2         4-10              -94.919         1144.9069       878153.33
3         10-40             0.61289         1305.5948       636143.04
4         40-100            0.0             540.1778        772170.16
5         >100              0.01128292      1.0             10^9

Fig. 1. The US Standard Atmosphere shown as mass overburden as a function of altitude. 
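
The five-layer parameterization of Eqs. (1) and (2) is straightforward to evaluate. The sketch below copies the coefficients from Table 1 as printed; the function name and the choice to clip the linear fifth layer at zero overburden are assumptions of this example.

```python
import numpy as np

# Coefficients of Table 1: a_i and b_i in g/cm^2, c_i in cm, for layers 1 to 5.
A = [-186.555305, -94.919, 0.61289, 0.0, 0.01128292]
B = [1222.6562, 1144.9069, 1305.5948, 540.1778, 1.0]
C = [994186.38, 878153.33, 636143.04, 772170.16, 1.0e9]
LAYER_TOP_KM = [4.0, 10.0, 40.0, 100.0]   # upper boundaries of layers 1 to 4

def overburden(h_km):
    """Mass overburden T(h) in g/cm^2 at altitude h_km (km above sea level)."""
    h_cm = h_km * 1.0e5
    i = int(np.searchsorted(LAYER_TOP_KM, h_km, side="right"))
    if i < 4:
        # Layers 1 to 4: exponential atmosphere, Eq. (1).
        return A[i] + B[i] * np.exp(-h_cm / C[i])
    # Layer 5: linear relation, Eq. (2), clipped at the top of the atmosphere.
    return max(A[4] - B[4] * h_cm / C[4], 0.0)
```

With these coefficients the overburden is about 1036 g cm$^{-2}$ at sea level and roughly 620 g cm$^{-2}$ at 4100 m, i.e. about 28 and 17 radiation lengths respectively.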


2.2. Properties of Gamma-Ray-Induced Extensive Air Showers 


Cascade equations describing the longitudinal development of a particle cascade 
were first developed by Rossi and Greisen.6 Using these equations one can show 
that the longitudinal development of the air shower (the number of electrons as 
a function of atmospheric depth) is given by the following equation, known as 
Approximation B: 

$$N_e(X) = \frac{0.31}{\sqrt{\ln(E_0/E_c)}} \exp\left[ X \left( 1 - \frac{3}{2} \ln \frac{3X}{X + 2\ln(E_0/E_c)} \right) \right], \qquad (4)$$

where $E_0$ is the energy of the primary gamma ray and $X$ is the atmospheric depth 
expressed in radiation lengths (a radiation length in dry air at 1 atmosphere pressure 
is $X_0 = 36.62$ g cm$^{-2}$).7 Figure 2 shows the longitudinal development of air showers 
from primary gamma rays of several different energies. For a gamma-ray primary 
particle one has: 

$$X_{\max} \approx \ln(E_0/E_c), \qquad (5)$$

and 

$$N_{e,\max} \approx \frac{0.31}{\sqrt{\ln(E_0/E_c) - 0.33}} \, \frac{E_0}{E_c}. \qquad (6)$$

The quantity: 

$$s = \frac{3X}{X + 2\ln(E_0/E_c)} \qquad (7)$$

$$\approx \frac{3}{1 + 2(X_{\max}/X)} \qquad (8)$$

is known as the shower age, which is equal to 1 at shower maximum and increases 
as the shower progresses through the atmosphere. 

Fig. 2. The number of electrons in an air shower as a function of atmospheric depth (expressed 
as radiation lengths) from approximation B for several gamma-ray primary energies. 
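
The behavior of Approximation B is easy to explore numerically. The following sketch evaluates the shower age and the electron number as functions of depth; energies are in MeV, the default critical energy of 80 MeV is taken from the text, and the function names are hypothetical.

```python
import math

E_C = 80.0  # critical energy in air, MeV (value quoted in the text)

def shower_age(X, E0, Ec=E_C):
    """Shower age s at depth X (radiation lengths) for primary energy E0 (MeV)."""
    return 3.0 * X / (X + 2.0 * math.log(E0 / Ec))

def n_electrons(X, E0, Ec=E_C):
    """Number of electrons at depth X > 0 (radiation lengths), Approximation B."""
    y = math.log(E0 / Ec)
    s = shower_age(X, E0, Ec)
    return 0.31 / math.sqrt(y) * math.exp(X * (1.0 - 1.5 * math.log(s)))
```

For a 1 TeV primary, $\ln(E_0/E_c) \approx 9.4$, so shower maximum lies about 9.4 radiation lengths (roughly 345 g cm$^{-2}$) into the atmosphere, where this expression gives about 1260 electrons.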

At ground level the lateral distribution of the charged particle density in an 
air shower is described by the NKG function, named for Nishimura, Kamata, and 
Greisen:8,9 

$$\rho_e(r) = \frac{N_e}{2\pi R_{mol}^2} \, \frac{\Gamma(4.5 - s)}{\Gamma(s)\,\Gamma(4.5 - 2s)} \left( \frac{r}{R_{mol}} \right)^{s-2} \left( 1 + \frac{r}{R_{mol}} \right)^{s-4.5}, \qquad (9)$$

where $R_{mol} = \frac{E_s}{E_c}(X_0/\rho)$ is the Moliere radius7 (with $E_s \approx 21$ MeV the 
multiple-scattering energy scale) evaluated two radiation lengths 
above the observation altitude,9 $\rho$ is the density of air (evaluated two radiation 
lengths above the observation altitude), and $s$ is the shower age. The Moliere radius 
is the radius that contains roughly 90% of the energy of an electromagnetic shower. 
Because the energy spectrum of the charged particles steepens as the distance from 
the shower core increases, the lateral distribution of the energy flow is 1/r times the 
charged particle density8 given in Eq. (9). Figure 3 shows the lateral distribution 
of charged particle density for several values of the shower age. 
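
A short sketch of the NKG function in its standard form is given below; the function name is hypothetical, and the expression is valid for shower ages $0 < s < 2.25$, where the gamma functions are finite.

```python
import math

def nkg_density(r, n_e, s, r_mol):
    """Charged particle density (per unit area) at core distance r.

    r     : distance from the shower core (same length unit as r_mol)
    n_e   : total number of charged particles at the observation level
    s     : shower age, 0 < s < 2.25
    r_mol : Moliere radius
    """
    norm = math.gamma(4.5 - s) / (
        2.0 * math.pi * r_mol ** 2 * math.gamma(s) * math.gamma(4.5 - 2.0 * s)
    )
    x = r / r_mol
    return n_e * norm * x ** (s - 2.0) * (1.0 + x) ** (s - 4.5)
```

The normalization is chosen so that integrating $2\pi r \rho_e(r)$ over all $r$ returns $N_e$, which provides a convenient numerical check of any implementation.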

Equation (4) gives the number of charged particles as a function of atmospheric 
depth in an EAS. In addition to the electrons in a gamma-ray-induced air shower, 
there are a significant number of gamma rays. To study the particle composi- 
tion of an air shower, we utilize the CORSIKA Monte Carlo simulation package. 

Fig. 3. The lateral distribution of the density of charged particles in an extensive air shower for 
several values of the shower age, as given by the NKG function. The figure was made for a Moliere 
radius of 100 meters. 

Fig. 4. The ratio of gamma rays to charged particles in air showers generated by gamma rays with 
energies between 100 GeV and 100 TeV. The figure is from a CORSIKA simulation of gamma rays 
for an observation altitude of 4100 m asl. A minimum particle energy of 10 MeV was required. The 
average value of the photon:electron ratio is 4.75. Note that entries to this histogram were only 
made if there was one or more electron in the air shower. 


Figure 4 shows the ratio of gamma rays to electrons for air showers generated by 
gamma-ray primary particles with energies between 100 GeV and 100 TeV (generated 
on an $E^{-2.3}$ spectrum) for an observation altitude of 4100 m above sea level 
(asl). A 10-MeV threshold was applied to the electrons and gamma rays for this 
figure. Gamma rays outnumber charged particles by a ratio of 4.75:1; therefore, a 
detector that is sensitive to gamma rays will be more sensitive than a detector that 
is only sensitive to charged particles. 

The detection of the shower particles that reach the ground depends on the 
energy distribution of the particles. In Fig. 5 we show the energy distributions of 
electrons and gamma rays at 4100 m asl. The average electron energy at this 
altitude is 74 MeV, very close to the critical energy. The gamma rays that reach the 
ground have an average energy of 29 MeV. These results only weakly depend on the 
observation altitude. 

The above discussion and formulae represent the average behavior of air showers 
generated by cosmic gamma rays. To understand the response of a ground-based 
detector one must investigate the fluctuations in the development of the air shower. 


bThis figure was made using the CORSIKA simulation package; gamma rays were generated with 
energies between 100 GeV and 10 TeV on an $E^{-2.3}$ spectrum and electrons and gamma rays were 
followed down to an energy of 1 MeV. 

Fig. 5. The energy distribution of gamma rays (black solid line) and electrons (black dashed 
line) at an observation altitude of 4100 m. The average electron energy is 74 MeV and the average 
gamma-ray energy is 29 MeV. The figure is from a CORSIKA simulation of gamma rays for an 
observation altitude of 4100 m above sea level. 


The main cause of shower fluctuations is the fluctuation in the depth of the first 
interaction. In addition to this, there are fluctuations in the development of the 
air shower. However, as the number of particles in the air shower increases, the 
fluctuations in the development of the air shower decrease. These fluctuations are 
important in understanding the response of an extensive air shower detector and 
in the ability of such a detector to measure the energy of the primary gamma 
ray. The radiation length of a gamma ray in air (at STP) is 36.62 g cm$^{-2}$;7 this 
is 7/9 of the mean free path of a gamma ray to create an electron-positron pair. 
The distribution of first interaction depths for high-energy gamma rays (where pair 
production dominates the energy loss) is then given by the probability function 
$P(X) = \frac{7}{9} e^{-\frac{7}{9}X}$, where $X$ is the atmospheric depth expressed in radiation lengths. 
In Fig. 6 we show the distribution of first interaction depths and the number of 
electromagnetic particles (gamma rays and electrons) that survive to an altitude 
of 4100 m asl as a function of the depth of the first interaction. The full width at 
half maximum of the distribution of first interaction heights is roughly 30 g cm$^{-2}$. 
Over this range of first interaction heights the average number of particles to reach 
a 4100-m observation altitude ranges from ~270 to ~670, a factor of 2.5. Thus, 
without a method to determine the height of the first interaction (or the height of 
shower maximum), the fluctuations in the development of the air shower will limit 
the energy resolution of an extensive air shower array to about 40% in the $\log_{10}$ of 
the energy. The impact of shower fluctuations on the response of an extensive air 
shower array will be discussed in the next section. 

Fig. 6. The black line peaking at about 21 km shows the distribution of the depth of the first 
interaction for 500-GeV gamma rays (for this curve the y-axis is an arbitrary scale). The red line 
shows the number of electromagnetic particles surviving to an observation altitude of 4100 m as 
a function of the depth of the first interaction. The gamma rays were generated from the zenith. 
This figure was made using the CORSIKA simulation package. See electronic edition for a color 
version of this figure. 
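
The numbers quoted for the first interaction follow directly from the exponential distribution. The sketch below computes the pair-production mean free path and the depth at which the distribution falls to half its peak value, and draws first-interaction depths by inverse-transform sampling; the function names are hypothetical.

```python
import math
import random

X0 = 36.62  # radiation length in air, g/cm^2 (value quoted in the text)

def first_interaction_pdf(X):
    """P(X) for first-interaction depth X, in radiation lengths."""
    return (7.0 / 9.0) * math.exp(-7.0 * X / 9.0)

def sample_first_interaction(rng):
    """Draw one first-interaction depth (radiation lengths) by inverse transform."""
    return -(9.0 / 7.0) * math.log(1.0 - rng.random())

# Mean free path for pair production: 9/7 of a radiation length.
mean_free_path = (9.0 / 7.0) * X0
# Depth at which P(X) has dropped to half of its value at X = 0.
half_max_depth = (9.0 / 7.0) * math.log(2.0) * X0
```

This gives a mean free path of about 47 g cm$^{-2}$ and a half-maximum depth of about 33 g cm$^{-2}$, consistent with the width of roughly 30 g cm$^{-2}$ quoted above.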


2.3. Properties of Hadronic Extensive Air Showers 


The discussion so far has dealt with the properties of extensive air showers generated 
by gamma rays. Any ground-based detector must of necessity detect cosmic gamma 
rays in the presence of a large background of hadronic (H, He, and heavier nuclei) 
cosmic rays. There are two methods that have proved successful in reducing this 
background in extensive air shower detectors: angular resolution (which will be 
discussed later) and muon detection. While a small fraction of gamma-ray-induced 
air showers contain muons from photo production on nuclei in the atmosphere, 
hadronic interactions inevitably lead to the production of pions and kaons. While 
the neutral pions (with a lifetime of $\tau = 8.52 \times 10^{-17}$ s) decay immediately to 
gamma rays, the charged pions (with a lifetime of $2.60 \times 10^{-8}$ s, $c\tau = 7.8$ m) can 
either interact with nuclei or decay into muons and neutrinos. Kaons have $c\tau$ ranging 
from 2.68 cm for the $K^0_S$ to 15.34 m for the $K^0_L$ and 3.7 m for the $K^\pm$. Though kaons 
have a large number of possible decay modes, these modes predominantly include 
muons or pions. Thus the identification of muons in extensive air showers can be 
used to reject the background from hadronic cosmic rays. 
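To make the competition between decay and interaction concrete, the sketch below computes the Lorentz-dilated mean decay length $\beta\gamma c\tau$ for the mesons named above. The $c\tau$ values are those quoted in the text; the particle masses are standard values that do not appear in the text and are included here as assumptions.

```python
import math

# c*tau in meters, as quoted in the text.
C_TAU_M = {"pi+-": 7.8, "K+-": 3.7, "K0_L": 15.34, "K0_S": 0.0268}
# Rest masses in GeV (assumed standard values, not given in the text).
MASS_GEV = {"pi+-": 0.13957, "K+-": 0.49368, "K0_L": 0.49761, "K0_S": 0.49761}


def mean_decay_length_m(particle, total_energy_gev):
    """Mean decay length beta*gamma*c*tau (meters) for a particle of given total energy."""
    mass = MASS_GEV[particle]
    if total_energy_gev <= mass:
        raise ValueError("total energy must exceed the rest mass")
    gamma = total_energy_gev / mass
    beta_gamma = math.sqrt(gamma * gamma - 1.0)
    return beta_gamma * C_TAU_M[particle]


if __name__ == "__main__":
    for energy in (1.0, 10.0, 100.0):
        length = mean_decay_length_m("pi+-", energy)
        print(f"charged pion, E = {energy:6.1f} GeV: decay length = {length:8.1f} m")
```

A 10-GeV charged pion has a mean decay length of several hundred meters, so the higher the pion energy, the more likely it is to interact before it decays.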

The number of muons present in an air shower depends upon the primary 
energy and the minimum muon energy. While several parameterizations for the 
number of muons in air showers have been obtained, here we present the results 
of a CORSIKA Monte Carlo simulation of air showers generated by high-energy 
protons. Figure 7 shows the average number of muons produced with energies above 
1 GeV as a function of primary cosmic-ray energy. For comparison we show a fit to 
the simulated data of a functional form given in Gaisser!® (Eq. (16.7)): 

$$N_\mu(>1\,\mathrm{GeV}) = 3.64\,A\left(\frac{E_0}{A \times 109\,\mathrm{GeV}}\right)^{0.85}, \qquad (10)$$

where $A$ is the atomic mass number of the primary cosmic ray and the three constants 
(3.64, 109 GeV, and 0.85) were fit to the simulated data. While Fig. 7 shows the 
muon number as a function of primary energy, the primary energy is not a direct 
observable in an air shower detector. Instead, one detects the number of electromagnetic 
particles that survive to the ground level. The relatively large fluctuations in 
the shower size lead to large fluctuations in the detected muon number as a function 
of shower size. 

Fig. 7. The average number of muons in an extensive air shower as a function of primary energy. 
Proton primaries were generated from 100 GeV to 100 TeV and the observation altitude was 4100 m 
asl. The error bars represent the error on the mean number of muons. This figure was made using 
the CORSIKA simulation package. The fit (blue solid line) is Eq. (10). See electronic edition for 
a color version of this figure. 
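The power-law parameterization of the mean muon number can be evaluated directly. The sketch below assumes the constants of Eq. (10) are read as a normalization of 3.64, an energy scale of 109 GeV, and an index of 0.85, and it should only be trusted over the simulated range (100 GeV to 100 TeV, 4100 m observation altitude). The function and argument names are illustrative.

```python
def mean_muon_number(primary_energy_gev, mass_number=1,
                     norm=3.64, energy_scale_gev=109.0, index=0.85):
    """Mean number of muons above 1 GeV for a primary of given energy and mass number.

    Implements N = norm * A * (E0 / (A * energy_scale))**index, with the
    default constants taken as our reading of the fit in Eq. (10).
    """
    if primary_energy_gev <= 0.0:
        raise ValueError("primary energy must be positive")
    if mass_number < 1:
        raise ValueError("mass number must be at least 1")
    scaled = primary_energy_gev / (mass_number * energy_scale_gev)
    return norm * mass_number * scaled ** index


if __name__ == "__main__":
    for energy in (1.0e2, 1.0e3, 1.0e4, 1.0e5):
        print(f"E0 = {energy:8.0f} GeV: <N_mu> = {mean_muon_number(energy):7.1f}")
```

Because the index is less than one, a nucleus of mass number $A$ yields $A^{0.15}$ times more muons than a proton of the same total energy in this parameterization.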

The lateral distribution of muons is flatter than that of electrons and has been 
parameterized as¹¹ 

$$\frac{1}{N_\mu}\frac{dN_\mu}{dr} = C\,\frac{r^{\alpha}}{(r + r_0)^{\beta}}, \qquad (11)$$

with $\alpha$ fixed at 1. We have found that allowing an additional degree of freedom 
with $\alpha$ significantly improves the fit to the simulated muon lateral distribution. 
Figure 8 shows the muon lateral distribution from a CORSIKA simulation of proton 
primaries with energies between 100 GeV and 100 TeV generated on an $E^{-2.7}$ 
spectrum, along with a fit of the form given in Eq. (11), with the fit parameters 
$r_0 = 136.3$, $\alpha = -0.217$, and $\beta = 2.05$. 

The energy distribution of muons in proton-induced extensive air showers is 
given in Fig. 9. For this figure protons were generated on an $E^{-2.7}$ spectrum from 
the zenith. The observation altitude was 4100 m asl. The average muon energy is 
6.3 GeV. 

Fig. 8. The lateral distribution of muons in an extensive air shower. Proton primaries were 
generated from 100 GeV to 100 TeV and the observation altitude was 4100 m asl. The dashed blue 
line is a fit to the function Eq. (11) with fit parameters $r_0 = 136.3$, $\alpha = -0.217$, and $\beta = 2.05$. 
This figure was made using the CORSIKA simulation package. 

Fig. 9. The energy distribution of muons in an extensive air shower. Proton primaries were 
generated from 100 GeV to 100 TeV on an $E^{-2.7}$ spectrum. The observation altitude was 4100 m 
asl. This figure was made using the CORSIKA simulation package. 


3. Water Cherenkov Detectors 


The low relative cost of water makes possible the construction of thick detectors 
that densely sample the air shower. In this case one detects the Cherenkov radiation 
emitted by the relativistic charged particles. These detectors can be made thick 
enough to not only convert the abundant gamma rays in the air shower to electrons 
(which may then be detected via their Cherenkov emission) but also to shield the 
electromagnetic component of the air shower from a deep array of detectors that can 
be used to detect the muons in hadronic air showers. Water Cherenkov detectors 
have now demonstrated that they can achieve a significantly lower energy threshold 
than conventional scintillation detectors and efficiently reject a significant fraction 
of the cosmic-ray background.¹²⁻¹⁵ 

The first use of water Cherenkov detectors to detect extensive air showers was 
by N. Porter at Harwell¹⁶ to investigate the electron and gamma-ray component 
of extensive air showers. The Haverah Park experiment¹⁷ used an array of water 
Cherenkov detectors, similar in size and design to the Harwell detectors, to detect 
ultra-high-energy cosmic rays. A nice history of the use of water Cherenkov detectors 
in cosmic-ray research is given in Ref. 18. In ground-based gamma-ray astronomy 
there have been two detectors based on the water Cherenkov technique: Milagro, 
which operated from 2000 to 2008 in the Jemez Mountains of northern New Mexico, 
and the HAWC detector, which began full operation in March 2015 on the slopes 
of Sierra Negra in Mexico. 

Milagro was at an elevation of 2630m above sea level and a latitude of 36° N. 
Milagro consisted of a 24-million-liter water reservoir, measuring 80m x 60m x 
7.5m deep, covered with a light-tight membrane surrounded by an array of 175 
outrigger tanks covering an area of ~40,000 m$^2$. The reservoir was instrumented 
with two layers of photomultiplier tubes (PMTs). The top layer, under ~1.5 m of 
water, consisted of 450 PMTs on 2.8 m centers. The bottom layer, under 6 m of 
water, consisted of 273 PMTs, also on 2.8 m centers. The sides of the reservoir were 
sloped so the bottom area was ~2500 m$^2$. Figure 10 shows the main reservoir. Each 
outrigger was a cylindrical plastic tank 1 m high and 3 m in diameter. Each 
tank was outfitted with a single PMT mounted on the top looking down into the 
water volume, which was lined with reflective Tyvek. The PMTs in the top layer 
of the reservoir and the outrigger tanks were used to reconstruct the direction and 
energy of the primary cosmic ray. The bottom layer of PMTs in the reservoir was 
used to detect muons and reject the cosmic-ray background. Details of the Milagro 
detector can be found in Ref. 19. Milagro had an energy threshold of ~2 TeV and its 
sensitivity was such that the Crab Nebula was detected at five standard deviations 
in six months.?? 

The High Altitude Water Cherenkov (HAWC) Observatory (see Fig. 11) is 
located on the Volcán Sierra Negra in the state of Puebla, Mexico. The observatory 
is at an altitude of 4100m and a latitude of 19° N. Based upon the experience of 
Milagro, the HAWC design is significantly modified. HAWC comprises 300 large 
steel tanks, each containing four PMTs at a depth of ~4m below the water surface. 
The tanks measure 7.3m in diameter by 4.5m high. Three of the PMTs are reused 
from Milagro (8-inch Hamamatsu R5912), arranged in a triangle around the center 
of the tank; the fourth is a 10-inch Hamamatsu high quantum efficiency PMT 
(R7081 HQE) placed at the center of each tank. As with Milagro, the PMTs are 
upward facing. The individual tanks provide optical isolation between detector elements, 
which improves the angular resolution and background rejection capabilities 
of the array, and the entire active area of HAWC acts as a muon detector. The 
combination of increased altitude, optical isolation, and a five-fold increase in the 
muon detection area results in an improvement in sensitivity of a factor of 10-15. 
HAWC will survey 8 sr of the sky to a level of ~3% of the flux from the Crab Nebula 
after five years of operation. 

Fig. 10. The main water reservoir of the Milagro Gamma-Ray Observatory. One can see the 
two layers of PMTs. For this photograph the cover was inflated to perform maintenance on the 
detector. Photo credit Rick Dingus. 

Fig. 11. The HAWC Gamma-Ray Observatory at the Volcán Sierra Negra in the state of Puebla, 
Mexico. The elevation is 4100 m above sea level. Photo credit Alberto Carramiñana, INAOE. 


3.1. Cherenkov Radiation 


Cherenkov radiation, first predicted by Oliver Heaviside in 1889,²¹ went undetected 
for nearly half a century before P. A. Cherenkov and S. I. Vavilov detected the 
Cherenkov emission arising from gamma rays interacting with a liquid.²²,²³ The 
modern theoretical understanding of Cherenkov light is due to I. M. Frank and 
I. Tamm:²⁴ it is the emission of coherent radiation when a charged particle travels 
through a medium faster than the speed of light in the medium. A particle with 
velocity $v$ will cause the medium to emit light in a cone with opening angle 

$$\cos(\theta_c) = \frac{c}{nv} = \frac{1}{n\beta}, \qquad (12)$$

where $n$ is the index of refraction of the medium and $\beta = v/c$, if $v > c/n$. The number of photons 
generated (and their spectrum) is given by the Frank~Tamm formula’ 


$$\frac{d^2N}{dE\,dx} = \frac{\alpha z^2}{\hbar c}\,\sin^2(\theta_c) \approx 370\,\sin^2(\theta_c(E))\ \mathrm{eV^{-1}\,cm^{-1}}, \qquad (13)$$


where $\alpha$ is the fine structure constant, $z$ is the charge of the relativistic particle, $E$ 
is the energy of the emitted photon, and $\theta_c(E)$ is the energy-dependent Cherenkov 
angle of the medium (due to the wavelength dependence of the index of refraction of 
the medium). In water ($n \approx 1.33$) the Cherenkov angle is ~41.2°. This large opening 
angle guarantees a large detection efficiency for charged particles if the sensor depth 
is ~1/2 the spacing between sensors or larger. 
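The Cherenkov relations above are simple to evaluate numerically. The following sketch computes the opening angle from Eq. (12), the threshold total energy implied by the condition $v > c/n$, and the photon yield from Eq. (13) for a singly charged particle. It treats the index of refraction as a constant, which ignores the wavelength dependence mentioned in the text; the electron and muon masses used in the example are standard values that are not quoted here.

```python
import math

# Coefficient of Eq. (13): photons per eV per cm for a particle of unit charge.
PHOTON_YIELD_COEFF = 370.0


def cherenkov_angle_deg(n, beta=1.0):
    """Cherenkov opening angle in degrees for refractive index n and speed beta = v/c."""
    cos_theta = 1.0 / (n * beta)
    if cos_theta > 1.0:
        raise ValueError("particle is below the Cherenkov threshold (v <= c/n)")
    return math.degrees(math.acos(cos_theta))


def threshold_total_energy_mev(mass_mev, n):
    """Total energy (MeV) at which a particle of the given mass reaches v = c/n."""
    gamma_threshold = 1.0 / math.sqrt(1.0 - 1.0 / (n * n))
    return gamma_threshold * mass_mev


def photon_yield_per_ev_cm(n, beta=1.0, z=1):
    """Photons emitted per eV of photon energy per cm of track, from Eq. (13)."""
    sin2_theta = 1.0 - 1.0 / (n * beta) ** 2
    if sin2_theta <= 0.0:
        return 0.0
    return PHOTON_YIELD_COEFF * z * z * sin2_theta


if __name__ == "__main__":
    n_water = 1.33
    print("Cherenkov angle [deg]:", cherenkov_angle_deg(n_water))
    print("electron threshold [MeV]:", threshold_total_energy_mev(0.511, n_water))
    print("muon threshold [MeV]:", threshold_total_energy_mev(105.66, n_water))
    print("yield [photons/eV/cm]:", photon_yield_per_ev_cm(n_water))
```

For $n = 1.33$ this reproduces the ~41.2° angle quoted above and gives a threshold well below 1 MeV for electrons, which is why the electromagnetic component of the shower is detected so efficiently in water.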

To understand the response of a water Cherenkov detector, the emission spec- 
trum of Cherenkov light must be convolved with both the spectral response of the 
light sensors and the spectral dependence of the absorption (and scattering) of light 
by the water. Figure 12 shows the emission spectrum of Cherenkov radiation in 
water, the quantum efficiency of a Hamamatsu R5912 photomultiplier tube (PMT) 
as a function of wavelength, and the product of these two curves, which is the 
number of photoelectrons (PEs) that would be detected with 100% photocathode 
coverage in the absence of absorption of light by the water. A rough estimate of 
the energy required to produce a photoelectron in a water Cherenkov detector is 
given by the integral under the curve in Fig. 12(c) (~54 PEs/cm or ~27 PEs/MeV), 
multiplied by the fractional photocathode coverage of the detector. For example, 
if a detector deployed an array of Hamamatsu R5912 PMTs with the standard 
photocathode (20-cm diameter) on 3-m centers, it would require roughly 30 MeV of 
energy deposited to produce 1 PE in a PMT, which is approximately the average 
energy of the gamma rays in the air shower (see Fig. 5). 


3.2. Optical Properties of Water 


As the light propagates through a water Cherenkov detector, the intensity in the 
beam is attenuated by absorption and scattering processes. The beam intensity 
decreases as 


$$I(\lambda, l) = I_0 \exp\left(-\frac{l}{L(\lambda)}\right), \quad \text{with} \qquad (14)$$

$$\frac{1}{L(\lambda)} = \frac{1}{L_{\mathrm{abs}}(\lambda)} + \frac{1}{L_{\mathrm{scat}}(\lambda)}, \qquad (15)$$

Fig. 12. (a) The emission spectrum of Cherenkov radiation, (b) the quantum efficiency of a 
Hamamatsu R5912 PMT with a standard quantum efficiency (dashed curve) and a high quantum 
efficiency (solid curve) photocathode,?° and (c) the product of the emission spectrum and the 
PMT response for the standard photocathode (dashed curve), and for a high quantum efficiency 
photocathode (solid curve). 

where $L_{\mathrm{abs}}(\lambda)$ and $L_{\mathrm{scat}}(\lambda)$ are the wavelength-dependent absorption and scattering 
lengths in the water. In pure water or water where the scattering centers are small 
compared to the wavelength of light, Rayleigh scattering dominates. Mie scatter- 
ing dominates for larger size particulates. While Rayleigh scattering is isotropic, 
Mie scattering is more complex, with both symmetric and asymmetric components. 
While attenuation lengths of ~100m have been achieved in large neutrino detec- 
tors,7° such water purity is not required for EAS arrays. From Eq. (15) one sees 
that attenuation lengths longer than ~2 times the detector depth reduce the light 
collected by less than 15%. If the attenuation is dominated by small-angle scat- 
tering (Mie scattering), the impact on the detector response will be even smaller. 
The Milagro detector achieved attenuation lengths near 20 m at 350 nm,'® using 
standard water filtration techniques: a water softener, 1-μm filters to remove particulates, 
a UV lamp to kill microorganisms, and charcoal bedding to remove organic 
compounds that tend to absorb light. 
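The way absorption and scattering combine in Eqs. (14) and (15) can be sketched as follows. The absorption and scattering lengths in the example are hypothetical values chosen only so that the combined attenuation length is 20 m; they are not measurements from any detector. Note also that this treats all scattered light as lost, which, as discussed above, overestimates the loss when small-angle scattering dominates.

```python
import math


def attenuation_length_m(absorption_length_m, scattering_length_m):
    """Combined attenuation length from Eq. (15): 1/L = 1/L_abs + 1/L_scat."""
    if absorption_length_m <= 0.0 or scattering_length_m <= 0.0:
        raise ValueError("lengths must be positive")
    return 1.0 / (1.0 / absorption_length_m + 1.0 / scattering_length_m)


def transmitted_fraction(path_length_m, attenuation_m):
    """Fraction I/I0 of the beam intensity remaining after a path, from Eq. (14)."""
    return math.exp(-path_length_m / attenuation_m)


if __name__ == "__main__":
    # Hypothetical optical properties at a single wavelength.
    l_att = attenuation_length_m(absorption_length_m=30.0, scattering_length_m=60.0)
    print("attenuation length [m]:", l_att)
    for path in (1.0, 4.0, 8.0):
        print(f"path {path:4.1f} m: transmitted fraction = "
              f"{transmitted_fraction(path, l_att):.3f}")
```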


3.3. Reconstruction of Extensive Air Showers 


The reconstruction of an extensive air shower uses the measured information on 
the ground, which is composed of spatial-temporal measurements of the particles 
that reach the ground, to infer properties of the primary gamma ray or cosmic 
ray that initiated the air shower. The important parameters of the primary parti- 
cle are its direction, energy, and nature (gamma ray, proton, or heavier nuclear 
species). The reduction of the cosmic-ray backgrounds depends upon accurate 
reconstruction of the direction of the primary particle and the determination of 
its nature. As an example consider the Crab Nebula and an array with an effective 
area of 20,000 m$^2$. The rate of cosmic-ray protons is 1.422 x 10° ha 3) protons 
m$^{-2}$ sr$^{-1}$ s$^{-1}$. Cosmic-ray protons account for roughly 60% of the trigger rate of 
an air shower array, with heavier cosmic rays accounting for the remaining 40%.?° 
Therefore, ~2 cosmic-ray background events with energies above 1 TeV will fall 
in an angular bin with radius of 1° centered on the Crab Nebula each second. In 
that same angular bin the signal rate from the Crab Nebula will be 0.0042 Hz 
above 1 TeV.” 

Good background rejection, through angular resolution and/or determination 
of the nature of the primary particle (gamma ray or cosmic ray), is critical to the 
success of an extensive air shower detector. For a given source of gamma rays, if the 
number of background events is large such that Gaussian statistics are valid, one 
can express the statistical significance of an observation as 


$$N_\sigma = \frac{N_{\mathrm{signal}}}{\sqrt{N_{\mathrm{background}}}} \qquad (16)$$

$$N_\sigma = \sqrt{T}\;\frac{\epsilon(\Delta\Omega)}{\sqrt{\Delta\Omega}}\;\frac{\int I_{\mathrm{sig}}(E)\,A^{\gamma}_{\mathrm{eff}}(E)\,dE}{\sqrt{\sum_j \int I_j(E)\,A^{j}_{\mathrm{eff}}(E)\,dE}}. \qquad (17)$$

The first term shows how the significance grows with the observation time $T$. The 
second term shows the importance of the angular resolution. Here $\Delta\Omega$ is the solid 
angle of the observation angular bin, and $\epsilon(\Delta\Omega)$ is the fraction of gamma-ray events 
that are reconstructed within that bin.ᶜ The final term shows the importance 
of background rejection. $I_{\mathrm{sig}}(E)$ is the source flux (with units GeV$^{-1}$ cm$^{-2}$ s$^{-1}$), 
$A^{\gamma}_{\mathrm{eff}}(E)$ is the effective area (see below) of the detector for point sources as a function 
of gamma-ray energy (with units cm$^2$) after particle nature cuts are applied. 
$I_j(E)$ is the cosmic-ray background flux of species $j$ (with units GeV$^{-1}$ cm$^{-2}$ sr$^{-1}$ 
s$^{-1}$) and $A^{j}_{\mathrm{eff}}(E)$ is the effective area of the detector to diffuse cosmic rays of species 
$j$ (with units cm$^2$) after particle nature cuts are applied. From Eq. (17) one can 
see that maximizing the significance entails maximizing the ratio of the number of 
signal events retained to the square root of background retained. This is true for 
both the angular cut and any cuts on the particle nature. 
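As a worked illustration of Eq. (16), the sketch below takes the rates quoted above for the Crab Nebula example (about 2 Hz of background and 0.0042 Hz of signal in a 1° bin for a 20,000 m$^2$ array) and asks how long such an array would need to observe to reach a given significance. It assumes Gaussian statistics, constant rates, and no rejection of background by particle nature, so it is an idealized estimate rather than a description of any real instrument.

```python
import math

SECONDS_PER_DAY = 86400.0


def significance(signal_rate_hz, background_rate_hz, time_s):
    """Eq. (16): number of signal events over the square root of background events."""
    n_signal = signal_rate_hz * time_s
    n_background = background_rate_hz * time_s
    return n_signal / math.sqrt(n_background)


def time_to_reach_s(n_sigma, signal_rate_hz, background_rate_hz):
    """Observation time needed for n_sigma, obtained by inverting Eq. (16)."""
    return background_rate_hz * (n_sigma / signal_rate_hz) ** 2


if __name__ == "__main__":
    signal_hz = 0.0042      # Crab rate above 1 TeV in the bin, as quoted in the text
    background_hz = 2.0     # cosmic-ray rate in the same bin, as quoted in the text
    t_five_sigma = time_to_reach_s(5.0, signal_hz, background_hz)
    print("days of on-source time for 5 sigma:", t_five_sigma / SECONDS_PER_DAY)
    print("check:", significance(signal_hz, background_hz, t_five_sigma))
```

Because the significance grows only as $\sqrt{T}$, halving the background through better angular resolution or particle identification is equivalent to doubling the observation time.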

The effective area of a detector is a measure of the area over which events are 
accepted (including both detector trigger requirements and analysis selections). It 
is determined through Monte Carlo simulations and defined as 


$$A_{\mathrm{eff}}(E, \theta) = \frac{N_{\mathrm{sel}}(E, \theta)}{N_{\mathrm{gen}}(E, \theta)} \times A_{\mathrm{gen}}(E, \theta), \qquad (18)$$

where $\theta$ is the zenith angle of the primary particle, $A_{\mathrm{gen}}$ is the area over which the 
primary gamma rays and cosmic rays are simulated (this area must be sufficiently 
large that the result is independent of $A_{\mathrm{gen}}$), $N_{\mathrm{gen}}$ is the number of primary particles 
generated at each energy, and $N_{\mathrm{sel}}$ is the number of generated primary 
particles that survive all of the selection cuts (trigger criteria, angular reconstruction, 
background rejection cuts, etc.). At higher energies, the detector can trigger 
on events with cores well beyond the physical area of the array, so the effective 
area can be larger than the array size. At low energies, it is the fluctuations in 
the shower development that determine the efficiency of the detector. The effective 
area is dependent upon the density of the detector elements, the particles (gamma 
rays and/or electrons) that the detector is sensitive to, and the altitude of the 
detector (or the atmospheric overburden). Because of the latter, the effective area 
is dependent upon the zenith angle of the primary particle. 
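The effective-area definition lends itself to a direct calculation from Monte Carlo counts. The sketch below evaluates the ratio of selected to generated events times the generation area for a single energy and zenith bin, together with a simple binomial uncertainty. The event counts and throw radius in the example are invented for illustration and do not come from any simulation described in the text.

```python
import math


def effective_area_m2(n_selected, n_generated, generation_area_m2):
    """Effective area for one (energy, zenith) bin: (N_sel / N_gen) * A_gen."""
    if n_generated <= 0:
        raise ValueError("need at least one generated event")
    if not 0 <= n_selected <= n_generated:
        raise ValueError("selected events must be between 0 and the number generated")
    return n_selected / n_generated * generation_area_m2


def effective_area_error_m2(n_selected, n_generated, generation_area_m2):
    """Binomial uncertainty on the effective area from finite Monte Carlo statistics."""
    efficiency = n_selected / n_generated
    return generation_area_m2 * math.sqrt(efficiency * (1.0 - efficiency) / n_generated)


if __name__ == "__main__":
    # Hypothetical simulation: cores thrown uniformly over a disc of 1 km radius.
    area_generated = math.pi * 1000.0 ** 2
    area = effective_area_m2(7000, 1_000_000, area_generated)
    error = effective_area_error_m2(7000, 1_000_000, area_generated)
    print(f"A_eff = {area:.0f} +/- {error:.0f} m^2")
```

Repeating the calculation with a larger throw area and checking that the result is unchanged is one way to confirm that the generation area is sufficiently large.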


3.3.1. Angular Reconstruction 


If the air shower were a simple plane when it arrived at ground level, determining 
the direction of the primary cosmic ray would simply be a matter of measuring the 
arrival time as a function of position on the ground and determining the normal to 
that plane. In fact, the air shower deviates from a plane, with the particle arrival 
times being larger as the distance from the shower core increases. (Since particles 
farther from the core underwent more multiple scattering, they traveled farther and 
thus they arrive at the ground later.) This is known as curvature of the shower front. 
Near the core (~100 meters), the shape is parabolic. The first step in determining 
the direction of the primary cosmic ray is to determine the location of the core of 
the air shower. In a scintillation detector this can be accomplished by fitting the 
lateral distribution of particles on the ground to the NKG function given in Eq. (9). 
As discussed in Sec. 2.2, the lateral distribution of the energy flow is the NKG 
function divided by R. Since a water Cherenkov detector is a thick detector, it is 
an electromagnetic calorimeter, so the core location is better determined by fitting 
to the lateral distribution of the energy flow. 

ᶜFor a detector with a Gaussian angular resolution, $\sigma_\theta$, the opening angle of the optimal bin is 
$1.58\,\sigma_\theta$ and $\epsilon(\Delta\Omega)$ is 0.72, and then $\epsilon(\Delta\Omega)/\sqrt{\Delta\Omega} = 0.26/\sigma_\theta$. 
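The optimal angular bin quoted in the footnote on Gaussian angular resolution can be checked numerically. The sketch below assumes a symmetric two-dimensional Gaussian point spread function with per-axis width $\sigma_\theta$ and the small-angle approximation $\Delta\Omega = \pi r^2$; it then scans the bin radius for the maximum of $\epsilon(\Delta\Omega)/\sqrt{\Delta\Omega}$. The function names are illustrative.

```python
import math


def containment_fraction(radius_in_sigma):
    """Fraction of a 2D Gaussian contained within a given radius (units of sigma)."""
    return 1.0 - math.exp(-0.5 * radius_in_sigma ** 2)


def figure_of_merit(radius_in_sigma):
    """epsilon / sqrt(solid angle) in units of 1/sigma, with solid angle = pi * r**2."""
    return containment_fraction(radius_in_sigma) / (math.sqrt(math.pi) * radius_in_sigma)


def optimal_bin(step=1.0e-4, r_max=5.0):
    """Scan bin radii and return (radius, containment, figure of merit) at the maximum."""
    best_radius = step
    best_value = figure_of_merit(step)
    n_steps = int(r_max / step)
    for i in range(2, n_steps + 1):
        radius = i * step
        value = figure_of_merit(radius)
        if value > best_value:
            best_radius, best_value = radius, value
    return best_radius, containment_fraction(best_radius), best_value


if __name__ == "__main__":
    radius, epsilon, merit = optimal_bin()
    print("optimal radius [sigma]:", radius)
    print("containment fraction:", epsilon)
    print("epsilon / sqrt(solid angle) [1/sigma]:", merit)
```

Under these assumptions the scan agrees with the quoted bin radius and containment fraction to within a few percent.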

After determining the location of the shower core one can correct the arrival 
times for the curvature, a method used in previous experiments. However, as we 
show below, the distribution of particle arrival times at a given core distance is 
asymmetric and non-Gaussian. Therefore, a better approach is to use a maximum 
likelihood reconstruction algorithm using the actual probability density functions 
of the particle arrival times as a function of core distance. Figure 13 shows the average 
electromagnetic particle ($e^+$, $e^-$, $\gamma$) arrival time as a function of core distance 
for gamma-ray-induced air showers (no requirement was made on the particle's 
energy). These data are well fit by a parabola with a linear term of 0.198 ns/m and 
a quadratic term of $5.59 \times 10^{-4}$ ns/m$^2$ (drawn on the figure). This average slope 
of 25 ns over 100 m corresponds to an angle of ~4°, significantly larger than the 
achievable angular resolution of an EAS array. 

Fig. 13. The average arrival time of electromagnetic particles in the air shower as a function of 
distance from the shower core. The dashed blue line is a fit to a quadratic function with linear 
term of 0.198 ns/m and quadratic term of $5.59 \times 10^{-4}$ ns/m$^2$. This figure was made using the 
CORSIKA simulation package. Gamma-ray showers between 100 GeV and 100 TeV were generated 
from zenith. These data are averaged over 200,000 simulated air showers. 

Fig. 14. The arrival time distributions of electromagnetic particles in the air shower for 10 radial 
bins. The black curve is for particles within 0-10 m of the shower core, the blue curve for particles 
within 10-20 m from the shower core, etc. This figure was made using the CORSIKA simulation 
package. Gamma-ray showers between 100 GeV and 100 TeV were generated from zenith. These 
data are averaged over 200,000 simulated air showers. See electronic edition for a color version of 
this figure. 
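The size of the curvature effect follows directly from the quadratic fit quoted above. The sketch below evaluates the mean delay at a given core distance and converts it to the tilt angle that an uncorrected plane fit would mistake it for. The fit coefficients are those given in the text; the conversion to an angle is a simple geometric estimate and the function names are illustrative.

```python
import math

C_M_PER_NS = 0.299792458        # speed of light in meters per nanosecond
LINEAR_NS_PER_M = 0.198         # linear term of the fit, from the text
QUADRATIC_NS_PER_M2 = 5.59e-4   # quadratic term of the fit, from the text


def mean_curvature_delay_ns(core_distance_m):
    """Mean arrival-time delay relative to the core at a given core distance."""
    return (LINEAR_NS_PER_M * core_distance_m
            + QUADRATIC_NS_PER_M2 * core_distance_m ** 2)


def equivalent_tilt_deg(core_distance_m):
    """Tilt angle whose plane-wave delay over this distance equals the curvature delay."""
    path_difference_m = C_M_PER_NS * mean_curvature_delay_ns(core_distance_m)
    return math.degrees(math.atan2(path_difference_m, core_distance_m))


if __name__ == "__main__":
    for distance in (25.0, 50.0, 100.0):
        print(f"r = {distance:5.1f} m: delay = {mean_curvature_delay_ns(distance):5.2f} ns, "
              f"equivalent tilt = {equivalent_tilt_deg(distance):4.2f} deg")
```

At 100 m the delay is about 25 ns, which corresponds to a tilt of roughly 4°, in line with the figures quoted above.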

In addition to the curvature (or delay) of the shower front, the temporal dis- 
tribution of the particle arrival times about this average delay changes. In general, 
the distribution of arrival times is broader as the distance from the shower core 
increases. Figure 14 shows the distribution of particle arrival times for 10 radial 
bins (0-10 m, 10-20 m,..., 90-100 m). 

In practice, the arrival time distributions shown here are good starting points for 
event reconstruction, but the actual distributions depend upon the particle energies 
and therefore on the details of the instrument, which can be understood through 
a complete simulation of the detector using, for example, the GEANT4?® simula- 
tion package from CERN. The Milagro water Cherenkov detector achieved angular 
resolutions of 0.35° for the largest events?° and the HAWC detector should achieve 
angular resolution of ~0.1° for the largest events.°° 


3.3.2. Rejection of the Cosmic-Ray Background 


To date, the rejection of the cosmic-ray background in EAS arrays has been based 
primarily on the identification of muons in hadronic showers.?:!* While other 
discriminators, such as differences in the electromagnetic structure of EAS induced 
by gamma rays and hadrons,*! have been proposed (and may be effective in future 
experiments) they have yet to be demonstrated with an astrophysical source of 
gamma rays. 

In the Milagro detector, the central reservoir had two layers of PMTs. The 
deeper layer was under 6m of water, which effectively absorbed the electromagnetic 
component of the air showers. Since there was no optical barrier between the two 
layers of PMTs, purely electromagnetic showers would illuminate the bottom layer 
of PMTs, but the light intensity would be relatively low without steep gradients. 
In contrast, the Cherenkov cone from a muon that penetrated through the 6m of 
water, which required a minimum energy of 1.5GeV, would brightly illuminate a 
few neighboring PMTs. Figure 15 shows several simulated gamma-ray and proton 
events taken with the Milagro detector as they appeared in the bottom layer of 
PMTs. The gamma-ray and proton event pairs [(a, d) (b, e) and (c, f)] were selected 
to have similar characteristics in the top layer of the detector (core position and 
number of top layer PMTs illuminated). Milagro developed a parameter based on 
the event topology, labeled “compactness”, to distinguish gamma-ray events from 
hadronic events. Compactness was defined as the number of PMTs in the top layer 
with 2 or more PEs, divided by the maximum pulse height recorded by any single 
PMT in the bottom layer (PE Max). The efficacy of this technique was strongly 
dependent upon the location of the core of the air shower. As detailed in Ref. 12 
the method was significantly more effective if the shower core was not within the 
central reservoir. The existence of high-energy electromagnetic particles near the 
core of gamma-ray-induced air showers, which could penetrate a significant depth 
of the water, caused these electromagnetic events to have a similar topology to that 
of the hadronic events shown in Fig. 15. 
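The compactness parameter, as defined in the text above, can be written as a short function. The sketch below is not the Milagro analysis code; it simply follows the definition given here (number of top-layer PMTs with 2 or more PEs divided by the largest bottom-layer pulse height), and the PMT pulse heights in the example are invented for illustration.

```python
def compactness(top_layer_pes, bottom_layer_pes, pe_threshold=2.0):
    """Compactness as defined in the text.

    top_layer_pes    -- pulse heights (PEs) of the top-layer PMTs
    bottom_layer_pes -- pulse heights (PEs) of the bottom-layer PMTs
    """
    n_hit = sum(1 for pe in top_layer_pes if pe >= pe_threshold)
    pe_max = max(bottom_layer_pes, default=0.0)
    if pe_max <= 0.0:
        raise ValueError("no light recorded in the bottom layer")
    return n_hit / pe_max


if __name__ == "__main__":
    top = [3.0, 2.5, 4.0, 1.0, 6.0, 2.0]       # hypothetical top-layer pulse heights
    smooth_bottom = [1.0, 1.5, 2.0, 1.2]        # diffuse light, no bright PMT
    muon_bottom = [1.0, 40.0, 15.0, 1.2]        # one PMT brightly lit by a muon
    print("smooth event:", compactness(top, smooth_bottom))
    print("muon-like event:", compactness(top, muon_bottom))
```

By construction, an event with a brightly illuminated bottom-layer PMT has a small compactness, so a cut that keeps only large values preferentially removes muon-rich hadronic showers.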

In the HAWC detector a single layer of PMTs under ~4m of water is used 
to both reconstruct the direction of the primary particle and reject the cosmic-ray 
background. Thus, the entire detector area acts as a muon detector. The signifi- 
cantly increased muon detection area enables HAWC to exclude the region near the 
shower core from the background rejection algorithm. This significantly improves 
the efficacy of the cut at low energies. The effect of this is demonstrated in Fig. 16. 
This figure shows the effective areas of the HAWC and Milagro detectors. For each 
detector, two curves are drawn. The solid lines indicate the effective area after 
the event trigger and angular reconstruction (requiring the gamma-ray events be 
reconstructed within 0.75° of their true direction). The dashed curves show the 
effective areas after a background rejection cut based on compactness is applied 
to the data. For HAWC, the PMTs within 40 m of the reconstructed shower core 
are excluded from consideration in the determination of PE Max. One can clearly 
see the effect of high-energy particles near the core of electromagnetic air showers 
in the Milagro curves. At low energies, the majority of events that triggered the 
detector had their cores in the reservoir. Therefore most of these events were identified 
as having hadron-like features in the bottom layer of the detector, significantly 
reducing the effective area of Milagro to low-energy gamma rays if the compactness 
cut was applied. In HAWC, where one can exclude the region near the shower core 
from consideration, this effect is absent and the effective areas before and after the 
compactness cut is applied are similar. (Note that the area of the excluded region in 
HAWC is comparable to the size of the muon detection area in Milagro, ~2500 m$^2$, 
so this technique could not be applied to the Milagro data.) 

Fig. 15. The pulse height distribution of PMTs illuminated in the bottom layer of the Milagro 
detector for simulated gamma rays and simulated proton events. Panels (a)–(c) are gamma-ray 
events and panels (d)–(f) are proton events. The figure is from Ref. 12. 

Fig. 16. The effective areas of Milagro (red lines) and HAWC (blue lines). The solid lines are for 
events that trigger the detectors and are reconstructed within 0.8° (HAWC) and 1.2° (Milagro) of 
their true direction. The dashed lines have an additional compactness cut applied to the data. This 
figure was created by Brian Baughman at the University of Maryland using a CORSIKA simulation 
of the air shower and a GEANT4-based simulation of the detector. See electronic edition for a 
color version of this figure. 


4. Conclusions 


The use of water Cherenkov technology in ground-based gamma-ray detectors has 
led to the development of detectors with nearly two orders of magnitude better sensi- 
tivity than detectors based on plastic scintillators. Over 60 years after the discovery 
of extensive air showers, this development led to the detection of an astrophysical 
source of gamma rays (the Crab Nebula) using an extensive air shower detector. 
Milagro demonstrated that one could build a VHE gamma-ray detector capable of 
continuously monitoring a large fraction of the VHE sky, with sufficient sensitivity 
to discover new sources of TeV gamma rays. As this manuscript is being written, 
HAWC has been operating for roughly one year. While it is difficult to predict the 
outcome of HAWC observations, it is clear that the combination of HAWC and 
imaging atmospheric Cherenkov telescopes (VERITAS, H.E.S.S., and MAGIC) will 
make exciting discoveries within our Galaxy and beyond. 


Great strides in neutrino detection technology have engendered the dawning of 
neutrino astronomy. Neutrino detectors have grown in size and sophistication to 
measure these elusive particles over nearly ten orders of magnitude in energy from 
the Sun, from supernova SN1987A and, most recently, from as-yet-unidentified 
cosmological sources. We provide an overview of the technologies used to detect 
neutrinos and the discoveries made by these varied and unusual devices. We also 
provide a short summary of the status quo of neutrino astronomical measurements. 


1. Introduction 


The first detection of solar neutrinos in the Homestake experiment¹ heralded the 
birth of neutrino astronomy. For the next two decades, the Sun remained the only 
detected astrophysical source of neutrinos until the fortuitous discovery of a burst 
of low energy neutrinos from SN1987A. Since then, increasingly large detectors 
have been built, anticipating the detection of neutrinos emitted by known energetic 
astrophysical sources such as gamma-ray bursts and active galactic nuclei, and pos- 
sibly by annihilating or decaying weakly-interacting massive particle (WIMP) dark 
matter. An important milestone on this path was reached with the discovery of high 
energy astrophysical neutrinos by the IceCube Neutrino Observatory in 2013.² 

Neutrino interactions with matter occur principally via the weak force, render- 
ing neutrinos uniquely capable of conveying information from matter-dense regions 
that photons and charged nuclei cannot readily escape, such as stellar cores or 
the environments surrounding many highly energetic astrophysical objects. On the 
other hand, neutrinos’ weak interactions also make them exceedingly difficult to 
detect compared to other astrophysical messengers like optical photons. Modern 
neutrino detectors compensate for this by instrumenting very large fiducial volumes 
to increase the probability of a contained or partially contained interaction, even 
at energies well above the TeV ($10^{12}$ eV) scale. The volumes must be composed of 
a high-transparency material, typically H$_2$O, allowing a fairly sparse and relatively 
inexpensive sensor array to have a suitably high neutrino detection efficiency. 

To perform astronomical measurements, neutrino detectors must also be capable 
of rejecting backgrounds from atmospheric muons and neutrinos produced in cosmic- 
ray collisions with the Earth’s atmosphere. For the largest neutrino detectors, these 
backgrounds outnumber the astrophysical neutrino signal by factors of roughly 10° 
and 10°, respectively (see Figs. 1(a) and 1(b)). Effective background rejection is 
achieved through a combination of a thick natural overburden of rock, water or ice; 
accurate direction and energy reconstructions; and customized vetoing and tagging 
techniques. 

Fig. 1. Intensity of (a) atmospheric muons vs. depth in water,³ showing how increased overburden 
effectively decreases background contamination and (b) energy-weighted flux of atmospheric 
neutrinos vs. energy,⁴ showing the steeply falling neutrino flux with energy for both electron and 
muon neutrinos. (Figure 1(a) used with permission.) 


2. First and Second Generation Astrophysical Neutrino Detectors 


The first generation of astrophysical neutrino detectors (Homestake,¹ Gallex⁵ and 
SAGE⁶) used radiochemical techniques to identify solar electron neutrino ($\nu_e$) interactions, 
counting the number of $\nu_e$-induced radioisotopes of Ar or Ge, effectively 
integrating over long detection times. Although they were unable to reconstruct neutrino 
arrival time and direction, they were able to demonstrate the neutrinos' solar 
provenance by comparing the detected signal with theoretical models of the Sun.⁷ 
The second generation of neutrino detectors used optical Cherenkov radiation⁸,⁹ in 
water and heavy water to reconstruct neutrino direction and energy in real time, and 
had some ability to distinguish among the three neutrino flavors ($\nu_e$, $\nu_\mu$, $\nu_\tau$) on an 
event-to-event basis.¹⁰⁻¹⁴ They measured solar and/or atmospheric neutrinos over 
a neutrino energy range of a few MeV to about 1 GeV. The Kamiokande, Baksan 
and IMB neutrino experiments were operational during SN1987A and detected the 
burst of roughly 20 MeV neutrinos emitted by the supernova. 

The second generation detectors typically consisted of a multi-kton fiducial mass
with thousands of photomultiplier tubes (PMTs; see Fig. 2(a)), each sensitive to
one or more photons emitted as Cherenkov radiation by charged particles produced
in neutrino interactions. Typical neutrino interactions deposit photons in many
PMTs, creating a predictable and measurable pattern from the imprint of the conical
Cherenkov radiation on the sensors, an example of which is shown in Fig. 2(b). Using
the known PMT positions and the detected photon arrival times, the direction,
energy, interaction time and flavor of the event could be determined.

Fig. 2. (a) The Sudbury Neutrino Observatory cavity with the completed spherical array of
inward-looking phototubes, prior to filling the cavity with water and inner sphere with heavy water
and (b) a candidate neutrino event in SNO, showing the circular projection of the Cherenkov cone
on the inner surface of the detector. (Images courtesy of SNO.)


3. Third Generation Astrophysical Neutrino Detectors 


The third generation of neutrino detectors uses either optical or radio Cherenkov
radiation in water and/or ice to achieve gigaton- or teraton-scale fiducial masses 
with real-time energy and direction reconstruction. These detectors focus on mea- 
suring much rarer and considerably more energetic (above roughly 1 TeV) astrophys- 
ical neutrinos. The size scale renders impractical the use of man-made containment 
structures and purification systems, so the devices instead place instrumentation 
in naturally-occurring media with very low absorption and scattering of Cherenkov 
light, such as deep sea water, deep lakes, or glacial ice.!° 7? 

Diagrams of two operating large-scale neutrino detectors are shown in Fig. 3. 
The grand scale of these detectors is required to compensate for the rarity of the 
signal and to help ensure event containment for better direction and energy recon- 
struction. The low scattering and absorption of the medium permits the use of 
low-density instrumentation to minimize costs while maintaining adequate light or 
radio wave collection for event energy and direction reconstruction. Although radio
Cherenkov neutrino telescopes have made great progress,20,22 in what follows we
will focus on detectors that use optical Cherenkov light, since only that method has
detected astrophysical neutrinos thus far.


4. Neutrino Interactions in Matter 


Neutrinos passing through ordinary matter may undergo charged current (CC) or 
neutral current (NC) interactions with the electrons or nucleons in the detector 
fiducial volume. The cross section depends on neutrino energy and flavor, and is 
different for neutrinos and anti-neutrinos. For first and second generation exper- 
iments, the reader is referred to the literature for neutrino cross section details, 
which can be quite complex at neutrino energies below about 10 GeV.23 At higher
energies, the cross section is dominated by the process of deep inelastic scattering
and increases with energy but remains small, as shown in Fig. 4.

In the NC interaction, the neutrino emits a Z^0 boson, depositing a fraction of
its energy in the form of a hadronic shower. The original neutrino emerges from
the interaction, carrying away the balance of its energy as it exits the detector
fiducial volume. Meanwhile, particles in the shower produce detectable light, and
since most of these particles are short-lived or interact strongly with matter, the
resulting signature in large-scale detectors is, to a good approximation, an expanding
sphere of light. In the CC interaction, the neutrino emits a W^±, creating a hadronic
shower and transforming the neutrino into its charged lepton partner (e, μ, or τ).
This results in a richer set of signatures in the detector.

Fig. 3. Schematics of the operating large-scale (a) ANTARES and (b) IceCube!® neutrino detectors.
ANTARES consists of 900 modules attached to 12 lines anchored to the Mediterranean Sea
floor 15 km off the coast of France. IceCube consists of 5160 modules attached to 86 strings frozen
into the ice at the South Pole. Each detector has a maximum depth of about 2500 m. (Figure 3(a)
courtesy of the ANTARES Collaboration.)

Fig. 4. Neutrino cross sections at (a) lower energies from experimental data and (b) higher
energies from theoretical expectations.23 There is more complexity in the cross section at neutrino
energies below about 10 GeV.?° (Figures used with permission.)

In smaller 1-10 kton-scale detectors like SNO and Super-K, the characteris- 
tic pattern of the projection of the Cherenkov light on the inward-facing PMTs 
depends on neutrino energy, flavor and interaction type. Generally speaking, the 
pattern depends on the number and interaction length of the daughter particles 
that emit Cherenkov radiation. For instance, electrons scatter more than muons, so 
the pattern is “fuzzier” for electrons. 

In larger detectors like ANTARES and IceCube, a ν_e CC interaction’s daughter
electron deposits its energy in effectively the same region as the hadronic shower
produced by the W^±, so the signature is an expanding sphere of light. For ν_μ CC
interactions, the daughter μ lepton is relatively long-lived and deposits its energy
along its path as it moves through the detector, and the resulting signature is a
linear deposition of light or a “track.” The track can also be accompanied by a
detectable shower at the initial interaction vertex. For ν_τ CC interactions, which
can only occur above the τ creation threshold (E_ν > 3.5 GeV), the daughter τ is
short-lived and has a wide array of decay modes. Its signatures depend on the initial
ν_τ energy and the granularity of the detector, and can range from an expanding
ball of light, to a track, to a spectacular “double bang”24 in which two showers are
distinguishable from one another and connected by a relatively dim track.
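
The event signatures described in this section can be summarized as a small lookup table. The Python sketch below is illustrative only: the function name, the flavor labels ("e", "mu", "tau") and the signature strings are labels chosen here and are not taken from any experiment's software. Real event classification relies on full reconstruction of the detected light rather than on a table like this.

```python
from typing import Dict, List

# Signatures expected in large, sparsely instrumented detectors, following the
# qualitative description in the text. All labels are illustrative assumptions.
CC_SIGNATURES: Dict[str, List[str]] = {
    "e": ["shower"],                           # electron overlaps the hadronic shower
    "mu": ["track"],                           # long-lived muon leaves a linear deposit
    "tau": ["shower", "track", "double bang"]  # depends on energy and granularity
}


def expected_signatures(flavor: str, interaction: str) -> List[str]:
    """Return the possible light signatures for a neutrino interaction.

    flavor: "e", "mu" or "tau"; interaction: "NC" or "CC".
    """
    if flavor not in CC_SIGNATURES:
        raise ValueError(f"unknown flavor: {flavor!r}")
    if interaction == "NC":
        # Neutral current: only the hadronic shower is visible, for every flavor.
        return ["shower"]
    if interaction == "CC":
        return list(CC_SIGNATURES[flavor])
    raise ValueError(f"unknown interaction type: {interaction!r}")


if __name__ == "__main__":
    for flavor in ("e", "mu", "tau"):
        for interaction in ("NC", "CC"):
            print(flavor, interaction, expected_signatures(flavor, interaction))
```

A muon-neutrino CC track may additionally be accompanied by a shower at the interaction vertex, which this simplified table does not represent.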


5. Optical Cherenkov Detectors 


The charged particles produced in a high energy neutrino interaction can move faster
than the speed of light in the interaction medium, producing optical Cherenkov
radiation8 that can be detected using photomultiplier tubes. The Cherenkov spectrum
is shown in Fig. 5(a).* To determine the detector response to this spectrum, it must
be convolved with the acceptance of the detector elements, such as the glass in the
PMT. The impact of the detector elements for the IceCube detector is shown in
Fig. 5(b).""
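
As a rough numerical illustration of the condition that a charged particle must move faster than light in the medium, the following Python sketch evaluates the standard Cherenkov relations cos θ = 1/(nβ) and the corresponding threshold energy. The refractive index n = 1.33 is an assumed, wavelength-independent value for water, and the function names are chosen here for illustration; neither comes from the text.

```python
import math

N_WATER = 1.33               # assumed refractive index (real media are dispersive)
ELECTRON_MASS_MEV = 0.511    # electron rest mass in MeV
MUON_MASS_MEV = 105.66       # muon rest mass in MeV


def cherenkov_angle_deg(beta: float, n: float) -> float:
    """Cherenkov emission angle in degrees for speed beta = v/c in a medium of index n."""
    if beta * n <= 1.0:
        raise ValueError("below Cherenkov threshold: beta * n must exceed 1")
    return math.degrees(math.acos(1.0 / (beta * n)))


def threshold_kinetic_energy_mev(mass_mev: float, n: float) -> float:
    """Minimum kinetic energy (MeV) at which a particle of the given mass radiates."""
    gamma_threshold = 1.0 / math.sqrt(1.0 - 1.0 / n**2)
    return mass_mev * (gamma_threshold - 1.0)


if __name__ == "__main__":
    print(f"angle for beta = 1: {cherenkov_angle_deg(1.0, N_WATER):.1f} deg")
    print(f"electron threshold: {threshold_kinetic_energy_mev(ELECTRON_MASS_MEV, N_WATER):.3f} MeV")
    print(f"muon threshold:     {threshold_kinetic_energy_mev(MUON_MASS_MEV, N_WATER):.1f} MeV")
```

Under these assumptions a fully relativistic particle radiates at roughly 41 degrees to its direction of travel, which is the origin of the cone whose imprint is seen on the sensors.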

To reconstruct the data these detectors produce, one must model the propagation
of Cherenkov photons through the medium. The optical properties of the
medium are mapped out using calibration sources designed specifically for this purpose.
They can also be studied using acquired data such as those from downward-going
cosmic-ray muons. Experiments with order of 1-10 ktons of purified media,
such as Super-Kamiokande (Super-K) and SNO, have the advantage of being in
full control of their detection medium, and they have used this flexibility to insert
and remove calibration sources as needed to calibrate optical properties, energy
response and directionality. They have used a diverse array of sources, ranging from
isotropized laser light27 to various radioactive sources28,29 to a dedicated electron
accelerator.30

Fig. 5. (a) Cherenkov spectrum (solid line) and the Cherenkov spectrum convolved with the
response of the IceCube detector26 (dashed line). (b) Enlarged view of the convolved spectrum.

* Figure courtesy D. Chirkin, IceCube Collaboration.

The calibrations for experiments with order of Mton-Gton of instrumented vol- 
ume, such as IceCube, rely on inaccessible light sources and position sensors co- 
deployed with the sensor modules as well as downward-going cosmic-ray muons. 
Ocean-based experiments also use sonar to calibrate the position of each module 
in real time as ocean currents flow through and around the detector.31 The light
sources enable calibration of the optical properties of the medium, which can vary
with position and (in bodies of water) with time. They also provide a means for
calibrating the energy response of the detector, especially for events that produce
hadronic and/or electromagnetic showers. Downward-going muons mainly provide
a way to calibrate the directional resolution of the detector32 and can also be used
to calibrate the position and relative efficiency of each module. 

As a concrete example, consider IceCube and its detection medium, the glacial 
ice at the South Pole. The optical properties of the ice have been studied with 
a variety of devices and techniques. During the hot-water drilling of the holes, an 
instrument that mapped out the optical absorption to roughly meter-scale precision 
was used.33 After the modules were deployed, and the melted column of ice in the
hole had refrozen, a group of remotely-controlled LEDs in each module provide
illumination at varying intensity levels and directions. The light is detectable by
neighboring modules. IceCube thus has a system consisting of thousands of emitters
and receivers at known locations, and has used the pattern of detected light to
map out the properties of the ice in three dimensions, as shown in Fig. 6.34 These
properties are then used extensively in event simulation and reconstruction. The
ANTARES ocean-based neutrino detector has used a similar technique to understand
the optical properties of its constituent seawater.35

Fig. 6. Effective scattering length (left) and absorptivity (right) in the South Pole icecap at
depths relevant for IceCube.34 The sensor spacing in large-scale detectors is constrained to be not
much larger than these distances.


6. Cherenkov Light Detection 


As the Cherenkov photons created by the neutrino interaction propagate through 
the medium, the array of sensors is used to determine the time of arrival and inten- 
sity of the small fraction of photons that are detected. In a typical charged-current 
ν_μ interaction at an energy of 1 TeV in IceCube, about 10° photons are created and
about 20 photons are detected. For accurate event reconstruction, the relative arrival 
times must be known to within roughly 5 ns across the km-scale distance occupied 
by the modules; the IceCube and ANTARES detectors achieve slightly better timing 
resolution than this. Regular, detector-wide timing calibrations are performed36 to
meet this requirement. Follow-up observations by other observatories37,38 are made
possible by latching the local time to GPS and transmitting summary data in real
time to partner follow-up observatories (e.g. the Swift39 satellite) and centralized
distribution points such as AMON.40
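
To give a feel for the roughly 5 ns timing requirement, the Python sketch below converts a timing uncertainty into the distance light travels in the medium during that interval. The refractive index of 1.33 is an assumed round value and the function name is chosen here for illustration; neither is taken from the text.

```python
C_M_PER_NS = 0.299792458  # speed of light in vacuum, in metres per nanosecond


def light_path_uncertainty_m(timing_ns: float, refractive_index: float) -> float:
    """Distance (m) light travels in the medium during a timing uncertainty timing_ns."""
    if refractive_index < 1.0:
        raise ValueError("refractive index must be at least 1")
    return timing_ns * C_M_PER_NS / refractive_index


if __name__ == "__main__":
    n_medium = 1.33  # assumed value for water or ice
    for dt in (1.0, 5.0, 10.0):
        print(f"{dt:4.1f} ns  ->  {light_path_uncertainty_m(dt, n_medium):.2f} m")
```

With these assumptions a 5 ns uncertainty corresponds to about one metre of light path, small compared with the kilometre-scale extent of the array.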

At the heart of each sensor is one or more photomultiplier tubes (PMTs) 
which are sensitive chiefly to blue and UV light and are thus well-matched to the 
Cherenkov spectrum shown earlier in Fig. 5. Operating optical Cherenkov neu- 
trino detectors all use relatively large, hand-blown PMTs (ranging from roughly 
20-50 cm in diameter), and for deep water or ice detectors, these PMTs are housed 
in a pressure vessel along with high voltage and data acquisition electronics. An 
example of a sensor housing a large PMT is shown in Fig. 7(a). More recently, the
availability of smaller mass-produced PMTs at favorable pricing has encouraged
the development of modules housing dozens of small (ca. 7 cm) PMTs. The first
such unit was developed by KM3NeT41,42 (see Fig. 7(b)) and provides not only
more photocathode area per unit cost, but also better directionality for individual
photons by virtue of the effective segmentation of its photocathode area over 4π
steradians. Figure 7(b) shows an updated version installed in IceCube.

Fig. 7. (a) A single-PMT light sensor with associated electronics, enclosed by a glass pressure
sphere and shown with mechanical and electrical connectors (figure courtesy IceCube Collaboration)
and (b) a multi-PMT sensor with 31 smaller PMTs, enclosed by a glass pressure sphere
(figure courtesy KM3NeT).

Data acquisition electronics associated with each channel extracts and transmits
for further processing digital information from modules containing signal due to one
or more Cherenkov photons. The arrival time(s) and amplitude(s) of the signals are
determined and, if a predefined trigger condition is met (e.g. more than N modules
with potential signal in a sufficiently short time window), signals are collated from
the entire detector into an “event.” Events are then further processed downstream
by conventional CPUs to determine the type, direction and energy of the candidate
neutrino interaction. For sparsely-instrumented detectors like ANTARES and
IceCube, the directional resolution for CC ν_μ events is typically about 0.1°–1.0°,
leveraging the long muon track in the detector. The showers produced by other neutrino
interactions have considerably worse directional resolutions of about 10°–25°,
depending on neutrino energy. The energy resolution for CC ν_μ events depends on
a variety of factors, such as whether or not the interaction vertex is contained in
the detector, and whether or not the muon itself exits the detector volume. In most
cases the energy assigned to the parent neutrino is quoted as a lower limit. For
neutrino interactions that produce showers, if the showers are well-contained in the
detector fiducial volume the visible energy can be reconstructed within 15% for
E_ν > 10 TeV.43 This is sufficient to distinguish lower-energy atmospheric neutrino
backgrounds from higher-energy cosmological neutrino signals.
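
The trigger condition mentioned above (more than N modules with signal inside a short time window) can be illustrated with a sliding-window multiplicity search. The Python sketch below is a minimal illustration under stated assumptions: the hit representation as (time, module id) pairs, the function name and the merging of overlapping windows are choices made here, and it is not the trigger logic of any particular experiment.

```python
from typing import List, Tuple

Hit = Tuple[float, int]  # (arrival time in ns, module identifier), an assumed format


def find_triggers(hits: List[Hit], n_modules: int, window_ns: float) -> List[Tuple[float, float]]:
    """Return merged (start, end) time ranges in which more than n_modules
    distinct modules recorded a hit within window_ns of each other."""
    ordered = sorted(hits)
    triggers: List[Tuple[float, float]] = []
    start = 0
    for end in range(len(ordered)):
        # Shrink the window from the left until it spans at most window_ns.
        while ordered[end][0] - ordered[start][0] > window_ns:
            start += 1
        modules = {module for _, module in ordered[start:end + 1]}
        if len(modules) > n_modules:
            span = (ordered[start][0], ordered[end][0])
            if triggers and span[0] <= triggers[-1][1]:
                # Overlaps the previous trigger: extend it rather than duplicating.
                triggers[-1] = (triggers[-1][0], max(triggers[-1][1], span[1]))
            else:
                triggers.append(span)
    return triggers


if __name__ == "__main__":
    example_hits = [(0, 1), (10, 2), (20, 3), (30, 4),    # compact cluster of hits
                    (5000, 5), (5010, 6),                   # isolated noise hits
                    (9000, 1), (9005, 2), (9010, 3), (9015, 4), (9020, 5)]
    print(find_triggers(example_hits, n_modules=3, window_ns=100.0))
```

In this toy example the two noise hits near 5000 ns never satisfy the multiplicity requirement, while the two clusters each form one trigger whose hits would then be collated into an event.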

The better pointing of track-like CC ν_μ events makes them the preferred vehicle
for astronomical follow-up efforts. Background from atmospheric ν_μ events is
reduced by requiring sufficiently large energy deposition and no accompanying
atmospheric cosmic-ray muon(s). Although the pointing resolution of neutrino-induced
showers is too large for most follow-up efforts, the arrival time may be used to search
for coincident signals in large field-of-view instruments like the HAWC gamma-ray
observatory.44


7. Summary of Results by Neutrino Detectors 


7.1. The Sun 


In 1956, Frederick Reines and Clyde Cowan observed, for the first time, electron
anti-neutrinos produced by nuclear reactors.45 This led to interest in the use of neutrinos
for astronomy. Contemporaneous with Reines and Cowan’s efforts, Raymond
Davis, using a different detection technique, failed to observe electron neutrinos from
nuclear reactors. This led to an understanding that neutrinos and anti-neutrinos
interact differently with matter and that nuclear reactors produce almost exclusively
electron anti-neutrinos. In close collaboration with theorist and creator of the Standard
Solar Model (SSM), John Bahcall, Davis increased the size of his experiment
and operated it deep underground to shield the detector from cosmic rays.1 Davis
made the initial proposal for solar neutrino detection simultaneously with theoretical
calculations by Bahcall in the same journal.7,46

The neutrino–matter interaction cross section is so small that the detection rate
was one neutrino every 2.3 days. However, it was soon observed that the detection 
rate was a factor of three lower than the SSM prediction. Many in the commu- 
nity were skeptical about the meaning of this discrepancy, preferring explanations 
involving the detector and/or the SSM. Neutrino oscillations, predicted by Bruno
Pontecorvo,47 were revived as an explanation of solar neutrinos.48 Corroboration of
the solar electron neutrino deficit by Kamiokande!® and, in 2002, the observation 
of solar neutrino oscillations by SNO,'? validated Davis and Bahcall’s work. 

In the Kamiokande, Super-K and SNO water Cherenkov detectors, the common
interaction mode for electron neutrinos is electron scattering:


\[ \nu_e + e^- \rightarrow \nu_e + e^-. \tag{1} \]


Electron scattering is also sensitive to muon and tau neutrinos, but with a cross- 
section lower by roughly a factor of eight. This process provides for a very modest 
correlation between the initial neutrino and the outgoing electron direction. Figure 8 
demonstrates that even with this modest correlation, events detected by Super-K 
manifestly come from the direction of the Sun.

Fig. 8. Observed scattered electron neutrino candidates by Super-K with respect to the Sun’s
direction. Events are described by a mixture of solar signal (dark histogram) and background (light
histogram).49 (Figure used with permission.)


174 D. F. Cowen and I. Taboada 


In contrast to H2O-based Kamiokande and Super-K, SNO used heavy water 
(D2O), giving it sensitivity to neutrinos via additional interaction channels. Besides
electron scattering, SNO was able to detect electron neutrinos exclusively via the
charged current interaction:


\[ \nu_e + {}^{2}\mathrm{H} \rightarrow e^- + p + p. \tag{2} \]


In agreement with Homestake, Kamiokande and Super-K, SNO showed a deficit in
channels sensitive to electron neutrinos. In addition, SNO was able to detect all
neutrino flavors via the neutral current channel:


\[ \nu_x + {}^{2}\mathrm{H} \rightarrow p + n + \nu_x. \tag{3} \]


None of the daughter particles produce Cherenkov light directly in the process 
shown in Eq. (3), but neutrons are captured by deuterons in heavy water and
the resulting tritium nucleus de-excites by emitting a 6.25-MeV photon. Compton scattering
of these gamma rays produced electrons that were detectable by SNO. SNO also 
operated in alternative configurations that enhanced its neutron capture efficiency. 

The neutral current channel measurement (with and without neutron capture 
enhancement methods) agreed with the full neutrino flux predicted with the SSM.'?
For solar neutrinos, this demonstrated that a fraction of electron neutrinos produced 
by the Sun were arriving at Earth as muon or tau neutrinos. Coupled with other 
measurements by Super-K using atmospheric neutrinos, this astrophysical measure- 
ment provided compelling evidence for neutrino oscillations. 

Observations of the Sun with neutrinos also validated the SSM and buttressed 
our understanding of main sequence stars. With neutrinos, observations of individual
components of the pp-chain were made, including the first step of proton–proton
fusion, the contribution of the CNO-cycle to the energy budget of the Sun was con- 
strained to 1% or less, and the temperature at the core of the Sun was directly 
measured. 


7.2. Supernova SN1987A 


Core collapse supernovae (CCSN) are copious sources of neutrinos. The energy 
budget for neutrinos emitted in a few seconds is two orders of magnitude larger than 
the electromagnetic emission over a few weeks. Neutrinos play an important role 
in the energy transfer from the collapsing core to the outer envelope that is blown 
away in the explosion. In 1987, supernova SN1987A exploded 51.4 kpc away in the 
Large Magellanic Cloud. The progenitor of SN1987A, a type IIp, was a blue super- 
giant. These supernovae are about an order of magnitude less luminous at optical 
wavelengths than the more common type II with red super-giant progenitors. Type
Ia supernovae are not expected to produce neutrinos detectable at astrophysical
distances. 

Although neutrinos preceded the electromagnetic emission by three hours, 
SN1987A was discovered optically. Three neutrino detectors reported observations.
Kamiokande observed 11 electron anti-neutrinos over 13 s.50 Originally a proton
decay experiment, Kamiokande had been upgraded in 1985 to lower its energy
threshold for sensitivity to solar neutrinos, enabling it to see SN1987A’s neutrinos
ranging in energy from 7.5 to 36 MeV. IMB,51 another detector searching
for proton decay, detected eight electron anti-neutrinos. The Baksan neutrino
observatory,52 optimized to study atmospheric neutrinos, used the neutrino detection
times reported by Kamiokande and IMB to find an additional five neutrinos. In
spite of these rather modest neutrino statistics, SN1987A has been used to constrain
supernova models and study neutrino properties.53

After the detection of neutrinos from SN1987A, every neutrino detector built 
since has sensitivity to supernovae in our galaxy and perhaps as far as the Magellanic 
clouds. The CCSN rate in our galaxy is ~2 per century, but the rate of CCSN that 
are observable optically at Earth is much lower. Larger neutrino detectors are needed 
to be sensitive to supernovae in M31, but such an enhancement would increase the 
detection rate only slightly as the CCSN rate there is about half that of the Milky 
Way. A new supernova observed with neutrinos would likely be detected in the 
galactic plane at a distance ~10 kpc. Combining larger detectors, better methods 
and a closer distance than SN1987A, such an event will probably result in a wealth 
of observations in neutrino energy spectra, flavor composition, light-curves, etc., 
that will allow detailed tests for models of CCSN and the various processes that 
might drive them. Furthermore, many existing detectors are configured to provide 
neutrino supernova alerts in real time. Although individual detectors are unable 
to provide relevant directional information, timing and triangulation by multiple 
detectors via the Supernova Early Warning System (SNEWS)54 will provide a few
hours of warning to optical astronomers. 
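
The quoted galactic rate of about 2 core collapse supernovae per century can be turned into a rough waiting-time estimate. The Python sketch below assumes, purely for illustration, that supernovae occur as a Poisson process with a constant rate; the function name and the chosen observation periods are not from the text.

```python
import math


def prob_at_least_one(rate_per_century: float, years: float) -> float:
    """Probability of one or more events in the given number of years,
    assuming a Poisson process with a constant rate (an illustrative assumption)."""
    if rate_per_century < 0 or years < 0:
        raise ValueError("rate and duration must be non-negative")
    expected_events = rate_per_century * years / 100.0
    return 1.0 - math.exp(-expected_events)


if __name__ == "__main__":
    rate = 2.0  # galactic core collapse supernovae per century, as quoted in the text
    for years in (10, 30, 50):
        print(f"{years:3d} years: P(at least one) = {prob_at_least_one(rate, years):.2f}")
```

Under this assumption, a detector operating for a decade has a chance of somewhat less than one in five of observing a galactic supernova, which is one reason why long detector livetimes and real-time alert systems matter.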


7.3. Star Formation History and the Diffuse Supernova 
Neutrino Background 


The history of all past CCSN is detectable with neutrinos via the diffuse supernova 
neutrino background (DSNB). As CCSN trace star formation rate, the measurement 
of the DSNB provides a measurement of the integrated history of star formation. 
Upper limits on the DSNB have already been incorporated into models for star
formation evolution.55

Super-Kamiokande is currently the largest operating neutrino detector that is 
sensitive to the DSNB. Super-K has not yet observed any DSNB events, but its 
upper limit on the flux (~3.1 ν̄_e cm^-2 s^-1 above a neutrino energy of 17.3 MeV)56
is quite close to various supernova model predictions. The observation of the DSNB
would allow the measurement of the average neutrino temperature in CCSN (at
the scale of 5 MeV) and the average neutrino energy output (at the scale of
10^53 ergs). Super-Kamiokande focuses on the detection of electron anti-neutrinos
via the reaction,


\[ \bar{\nu}_e + p \rightarrow e^+ + n. \tag{4} \]


Current methods allow the detection of the positron. If its accompanying neutron
can also be detected, it would provide an almost background-free detection channel
for the DSNB. Presently, most of the neutrons are captured by hydrogen nuclei
in the water and the resulting deuterium de-excites, emitting a 2.2-MeV gamma ray.
This is below the energy detection threshold for Super-K, but the collaboration has
nevertheless managed to develop neutron tagging with an efficiency of ~18%.56 Tests
are underway57 to add gadolinium salt to the water to dramatically enhance the
neutron tagging efficiency.58 The neutron capture cross section on Gd is significantly
higher than that of water and its gamma-ray decay energy is 8 MeV, well above
Super-K’s threshold. If Gd is indeed added, it may provide us with the first detection
of the DSNB.


7.4. Astrophysical Neutrinos Above 10 TeV 


Astrophysical neutrinos in the 10 TeV–10 PeV energy range have been observed by
IceCube. Electromagnetic counterparts have not yet been found; no source class has 
been unequivocally associated with these neutrinos. Extragalactic origin is usually 
assumed, but has not yet been demonstrated. The quest to determine the origin of 
IceCube’s astrophysical neutrinos is likely to drive the field of neutrino astronomy 
for the foreseeable future. 

The origin of IceCube’s astrophysical neutrinos may be tied to the origin of 
cosmic rays. Discovered by Victor Hess in 1913,59 cosmic rays arrive approximately
isotropically at Earth. At all but the highest energies, directional information for the
cosmic rays is lost due to their propagation in galactic and extragalactic magnetic
fields. Supernova remnants are prime candidates for the origin of cosmic rays, with
only a few percent of the kinetic energy of the ejecta needed to explain cosmic-ray
acceleration. However, supernova remnants are not expected to be able to accelerate
particles beyond a few PeV (10^15 eV). At an energy somewhat higher than this,
cosmic rays are expected to be extragalactic; cosmic rays have been measured with
energies as high as 10^20 eV by the Pierre Auger Observatory.60 Models of cosmic-ray
acceleration predict a spectrum described by a power law dN/dE ~ E^(-2-α), with
α being a small number, and with diffusion in our galaxy softening the spectrum.
The observed cosmic-ray spectrum is dN/dE ~ E^-2.7.

Cosmic rays interacting at or near their production point produce neutrinos 
via proton–proton or proton–photon interactions. These interactions result in the
production of neutral and charged pions, which decay into detectable photons and 
neutrinos, respectively. Furthermore, sources with high opacity can be bright in 
neutrinos but dim in cosmic rays and/or gamma rays. Assuming that the opacity 
of extragalactic sources is 1.0, and normalizing to the ultra high-energy cosmic-ray
spectrum above 10^19 eV, one can set an upper limit on the total neutrino flux
expected from extragalactic (cosmic ray) sources. This bound was calculated by
Waxman and Bahcall61 to be E^2 dN/dE = 10^-8 GeV cm^-2 s^-1 sr^-1 per neutrino
flavor, and it is intriguing that the currently observed flux is at a level similar to
this bound.

A flux at this level requires a cubic-kilometer-scale detector for detection of 
high-energy neutrinos associated with cosmic-ray sources. Early efforts included 
DUMAND,® Baikal!” and AMANDA." Sited at the South Pole, AMANDA demon- 
strated the feasibility of the water Cherenkov technique in ice, and led more or less 
directly to the development and deployment of IceCube as the first kilometer-scale 
detector. 

The successful detection of astrophysical neutrinos by IceCube has been per- 
formed with several methods. Here we describe the “High Energy Starting Event” 
(HESE) discovery method? that required neutrino interactions to happen inside 
of the detector in order to eliminate the large background of cosmic-ray-induced 
muons. At high energies, above roughly 60 TeV, there is a significant excess of events 
with respect to what is expected from atmospheric neutrinos alone. The HESE data 
set is dominated by neutrino-induced cascades (showers), and the highest energy 
observed in this channel is 2.1 PeV. A separate analysis that uses through-going 
muons has observed an event with 2.5 PeV deposited energy, corresponding to a 
likely neutrino energy of 8 PeV (Fig. 9).

All detection methods used to date are consistent with an isotropic distribution 
of the neutrinos. The detections are also consistent with expectations for neutrino 
oscillations from a distant astrophysical source.®* The spectrum, using a combination
of studies by IceCube, is given by


\[ \frac{dN}{dE_\nu} = 2.2 \times 10^{-18} \left( \frac{E_\nu}{100\ \mathrm{TeV}} \right)^{-2.5} \mathrm{GeV^{-1}\,cm^{-2}\,s^{-1}\,sr^{-1}}. \tag{5} \]


Fig. 9. Spectral data for the all-flavor astrophysical neutrino flux from IceCube using a combined
data set of tracks and cascade events.®? The fit band corresponds to a power law. Individual data
points correspond to deconvolved model. (Legend: power law and differential fits for
ν_e + ν_μ + ν_τ; horizontal axis: E_ν [GeV].)

There may be tension between the ν_μ channel and all the other channels. The
spectral index for the former is −2.1, but it is softer for all other methods. This may
be because HESE and other methods can reach lower energies than those possible
with ν_μ. This may indicate, e.g. two populations of sources and a hardening of the
spectrum at ~200 TeV.

IceCube has searched for correlations between its detected neutrinos and mul- 
tiple source classes. Gamma-ray bursts reported by satellites have been shown to 
contribute no more than ~1% of the observed flux in the prompt phase.®° Relaxing 
the time correlation to +20 hours degrades the limit to 12%. IceCube also con- 
strains a correlation with blazars found by Fermi LAT to contribute less than 17% 
of the HESE flux, nearby starburst galaxies to 8%, young supernova remnants (not
interacting with a molecular cloud) to 5%, young pulsar wind nebulae to 3%, and
the galactic plane, including diffuse emission and any type of source, to 14%. There 
is no evidence of directional clustering in astrophysical neutrino data with IceCube. 
This implies that the sources are individually weak, no more than 1% of the total 
flux observed.®’ Limits on the flux of individual point sources are shown in Fig. 10. 
Ultimately, the identification of IceCube’s neutrino sources will most likely require 
the identification of an electromagnetic counterpart. 


Fig. 10. Limits on the flux by individual point sources set by IceCube using seven years of data.®®
(Figure courtesy S. Coenders.) (Legend entries recoverable from the figure: pre-trial discovery
potential, ANTARES sensitivity, 2FHL objects, hot spots.)


8. Conclusion


As neutrino detectors have grown in size and sophistication, they have extended 
their reach from earth-bound sources of neutrinos, to solar and galactic sources, 
and finally to neutrinos from sources at cosmological distances. At each step, cru- 
cial discoveries increased our understanding of astrophysical sources and expanded 
our knowledge of the properties of neutrinos themselves. Future astrophysical neutrino
detectors68–79 aim to garner a sample of high-energy neutrinos large enough
to finally discover one or more sources, ushering in a new era in astronomy and
multimessenger astrophysics, and potentially unmasking the mysterious sources of
the highest energy cosmic rays.


Gravitational waves are in many ways the most accurate witnesses of the most
energetic events in the universe, as they are generated by the bulk motion of the 
involved masses themselves. Laser interferometers are now capable of detecting 
these waves directly and will allow us to explore for the first time the gravitational 
wave sky. This chapter describes the current generation of ground-based observa- 
tories such as Advanced LIGO and Advanced Virgo. It also discusses the future 
network of ground-based observatories as well as the planned space-observatory. 


1. An Introduction to Gravitational Waves


When Einstein presented his theory of general relativity, he might have already 
realized that his theory allowed for wave solutions that solved one of the oldest 
fundamental problems of Newton’s theory of gravity: “action at a distance”, or 
how changes in the gravitational field propagate through space and time. Only a 
few months after his famous presentation at the Prussian Academy of Sciences, he
published his first paper² on gravitational waves. Historically interesting is the fact
that this paper included a critical error, which Einstein corrected in a second paper
published two years later.³,⁴

According to Einstein, sources for gravitational waves are accelerated masses 
with a time-varying moment of inertia (see Ref. 5 for an excellent review): 


\[ I_{ij}(t) = \int \rho(t, \vec{r})\, x_i x_j \, dV . \tag{1} \]
These cause distortions in spacetime at a distance $r \gg \lambda_{GW}$ (far field) from the
source of:

\[ h_{ij} = \frac{2G}{c^4 r}\, \ddot{I}_{ij}(t - r/c) = h_{ji} . \tag{2} \]


Defining the z-axis as the propagation direction, this can be described as a transverse 
traceless tensor: 


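\[ h = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & h_+ & h_\times & 0 \\ 0 & h_\times & -h_+ & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{with } h_+ = h_{xx},\; h_\times = h_{xy} . \tag{3} \]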


The measurable effect of a gravitational wave is a change of the distance $L_0$ between
free-falling objects, here placed along the x-axis:

\[ L = \int_0^{L_0} \sqrt{1 + h_+}\, dx \;\overset{L_0 \ll \lambda_{GW}}{\approx}\; L_0 + \frac{1}{2} h_+ L_0 . \tag{4} \]


An equal but opposite length change appears in the y direction, while the diagonals
are squeezed and stretched proportional to $h_\times$. The left part of Fig. 1 shows the
change in a ring of free-falling test masses as a function of time for the two different
polarizations $h_+$ and $h_\times$ of a monochromatic gravitational wave. It is these length
changes that laser interferometric gravitational wave detectors, such as LIGO, Virgo 
and KAGRA on the ground and LISA in space, detect. 
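
As a rough numerical illustration of Eq. (4), the following minimal sketch converts a strain amplitude into the length change of a single arm. The function names are hypothetical, and the strain of 10^-21 is an assumed illustrative value; the 4 km arm length is the Advanced LIGO arm length quoted later in this chapter.

```python
def arm_length_change(strain, arm_length_m):
    """Length change of one arm for a strain h_+ (Eq. 4): dL = h_+ * L0 / 2."""
    return 0.5 * strain * arm_length_m


def differential_length_change(strain, arm_length_m):
    """x arm stretches by +h*L0/2 while the y arm shrinks by h*L0/2,
    so the difference between the two arms is h * L0."""
    return 2.0 * arm_length_change(strain, arm_length_m)


# Assumed illustrative values: strain 1e-21, 4 km arms.
single_arm = arm_length_change(1e-21, 4000.0)
differential = differential_length_change(1e-21, 4000.0)
print(f"single arm: {single_arm:.2e} m, differential: {differential:.2e} m")
```

For these assumed numbers the single-arm length change is of order 10^-18 m, far smaller than the diameter of a proton, which is why the noise sources discussed below matter so much.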


[Fig. 1 panel labels: t = 0, t = T/4, t = T/2, t = 3T/4 (left); Observer (right).]


Fig. 1. Left: The effect of gravitational waves can be described as a squeezing and stretching of 
distances in spacetime. This is often depicted as the deformation of a circle into an ellipse, back into 
a circle, and again into an ellipse (rotated by 90 degrees) and then back into the original circle. This 
squeezing and stretching can happen in any direction but can be described as a linear combination 
of stretching and squeezing along the coordinate axes and diagonal to them. The first case defines 
the + polarization and the second case the × or cross polarization. Right: A non-relativistic,
non-spinning binary system in a circular orbit allows one to calculate the gravitational wave strain
in the far field. For simplicity, we place the two stars into the x-y plane with their center of
mass at the origin of our coordinate system. The largest amplitudes in both polarizations are then 
detectable along the z-axis, while the emission along the x and y directions only includes one 
polarization. 
While most accelerated masses will have some non-vanishing and time-dependent
mass quadrupole moment, the pre-factor $G/c^4$ in Eq. (2) ensures that only
the largest changes in mass quadrupole moments will be measurable. The strongest
signals are generated by compact binary systems, such as two black holes or neutron 
stars, just before or during their merger. The first-ever directly detected gravita- 
tional waves came from the GW150914 merger of a 36 solar mass black hole with a 
29 solar mass black hole, which happened about 1.3 billion years ago. More detec- 
tions were made during Advanced LIGO’s first science run: GW151226 was a merger 
between a 15 solar mass and an 8 solar mass black hole that also happened 1.3 billion 
years ago.” LVT151012 appears to have been generated by a merger between a 23 
and a 13 solar mass black hole around 3.5 billion years ago,’ although the signal has 
a 2% probability of being instrumental and not cosmological. Figure 2 shows these 
three signals in the frequency domain (left) and the time domain (right). The left 
graph also shows typical sensitivities of the two Advanced LIGO detectors during 
their first observation run O1. The detectors themselves are described in detail in
Ref. 9. It is expected that Advanced LIGO will operate in different modes with 
different sensitivities. The black line represents an approximation of the expected 
sensitivities. 

The detailed calculation of the emitted waveform from a binary system during 
the merger phase requires relativistic numerical simulations and is still an ongoing 
area of research.¹⁰,¹¹ However, in the non-relativistic, non-spinning, circular-orbit
Fig. 2. Left: Advanced LIGO noise curve expressed as a linear spectral noise density and the
three gravitational wave signals detected during O1.⁸ The three black line segments indicate the
three main limiting noise sources of the final Advanced LIGO observatory. The yellow-shaded
area shows the frequency band where signals from supernova explosions are expected to show
up. The individual lines in the spectra are from mechanical resonances in the detector, a few
calibration lines, and a few other nearly monochromatic noise sources. The signal amplitudes have
been scaled by $1/\sqrt{f}$ to visualize the longer integration times at lower frequencies in the noise
spectrum. Right: The graphs show the best fitting simulated signals of the three detected events 
in the LIGO observation band. See electronic edition for a color version of this figure. 
limit, we can calculate the waveform from Eq. (1). If we select a coordinate system
in which the two (point) masses orbit in the x-y plane around their center of mass
(see right part of Fig. 1):

\[ \vec{r}_1 = \begin{pmatrix} r_1 \cos\Omega t \\ r_1 \sin\Omega t \\ 0 \end{pmatrix}, \qquad \vec{r}_2 = \begin{pmatrix} -r_2 \cos\Omega t \\ -r_2 \sin\Omega t \\ 0 \end{pmatrix}, \tag{5} \]

the density will be

\[ \rho(t, \vec{r}) = M_1 \delta(\vec{r} - \vec{r}_1) + M_2 \delta(\vec{r} - \vec{r}_2) \tag{6} \]


and the mass quadrupole moment has the following non-vanishing components: 


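\[ I_{xy}(t) = \frac{1}{2}\left(M_1 r_1^2 + M_2 r_2^2\right) \sin(2\Omega t) = I_{yx}(t), \tag{7} \]

\[ I_{xx}(t) = \int \rho(t, \vec{r})\, x^2\, dV = \frac{1}{2}\left(M_1 r_1^2 + M_2 r_2^2\right)\left(1 + \cos 2\Omega t\right), \tag{8} \]

\[ I_{yy}(t) = \int \rho(t, \vec{r})\, y^2\, dV = \frac{1}{2}\left(M_1 r_1^2 + M_2 r_2^2\right)\left(1 - \cos 2\Omega t\right). \tag{9} \]
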
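Assuming non-relativistic Keplerian orbits with orbital separation $D = r_1 + r_2$ allows the
calculation of the orbital frequency, written here also in terms of the Schwarzschild
radii $R_{1(2)} = 2GM_{1(2)}/c^2$:

\[ \Omega = \sqrt{\frac{G (M_1 + M_2)}{D^3}} = c \sqrt{\frac{R_1 + R_2}{2 D^3}} . \tag{10} \]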


Equation (10) can be used to express the second time derivative as follows: 


\[ \ddot{I}_{xy}(t) = -\frac{2GM_1M_2}{D} \sin 2\Omega t = \ddot{I}_{yx}(t), \tag{11} \]

\[ \ddot{I}_{xx}(t) = -\frac{2GM_1M_2}{D} \cos 2\Omega t = -\ddot{I}_{yy}(t). \]


This can be inserted into Eq. (2) and after expressing the individual masses by their
Schwarzschild radii, $R_{1(2)} = 2GM_{1(2)}/c^2$, we arrive at a very simple expression for
the emitted gravitational waves propagating perpendicular to the orbital plane:

\[ h_\times = -\frac{R_1 R_2}{r D} \sin 2\Omega t, \qquad h_+ = -\frac{R_1 R_2}{r D} \cos 2\Omega t . \tag{12} \]


In general, the orbital plane will not be perpendicular to the line of sight and the
amplitudes are modified by geometric factors.⁵ For the simple case where the line
of sight is rotated by an angle Θ within the x-z plane, the two amplitudes $h_\times$ and $h_+$ change
by cos Θ and 0.5 (1 + cos² Θ), respectively. For example, for an observer placed on
the x-axis (cos Θ = 0), the generated gravitational waves will only be polarized in
the + polarization (parallel to z- and y-axes) and not in the × polarization. Having
multiple observatories to resolve the polarization of the received gravitational wave 
helps to disentangle the orientation of the orbital plane from other parameters 
such as mass and distance. Additional geometrical factors are required to include
the orientation of the observatory relative to the source’s location and polarization
directions of the received waves.¹²
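
To get a feeling for the numbers, the sketch below evaluates Eqs. (10) and (12) together with the inclination factors quoted above. It is only an order-of-magnitude illustration in the non-relativistic limit: the function names are hypothetical, the masses are those quoted for GW150914, and the orbital separation (350 km) and source distance (410 Mpc, roughly 1.3 billion light years) are assumed values chosen for illustration.

```python
import math

G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8        # speed of light, m/s
M_SUN = 1.989e30   # solar mass, kg
MPC = 3.086e22     # megaparsec, m


def schwarzschild_radius(mass_kg):
    """R = 2 G M / c^2."""
    return 2.0 * G * mass_kg / C**2


def orbital_frequency(r1, r2, separation):
    """Eq. (10) in Schwarzschild-radius form: Omega = c sqrt((R1+R2)/(2 D^3))."""
    return C * math.sqrt((r1 + r2) / (2.0 * separation**3))


def strain_amplitude(r1, r2, separation, distance):
    """Amplitude of Eq. (12): R1 R2 / (r D), for a face-on orbit."""
    return r1 * r2 / (distance * separation)


def inclination_factors(theta):
    """Scale factors (plus, cross) for a line of sight tilted by theta."""
    return 0.5 * (1.0 + math.cos(theta) ** 2), math.cos(theta)


# Masses from the text; separation and distance are assumptions.
r1 = schwarzschild_radius(36.0 * M_SUN)
r2 = schwarzschild_radius(29.0 * M_SUN)
separation = 350e3
distance = 410.0 * MPC

omega = orbital_frequency(r1, r2, separation)
f_gw = omega / math.pi            # the wave oscillates at twice the orbital frequency
h0 = strain_amplitude(r1, r2, separation, distance)
plus_edge_on, cross_edge_on = inclination_factors(math.pi / 2.0)
print(f"f_gw = {f_gw:.0f} Hz, h0 = {h0:.1e}")
```

With these assumed inputs the strain comes out at the 10^-21 level at a gravitational wave frequency of order 100 Hz, and an edge-on observer sees only the + polarization at half the face-on amplitude.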


The energy loss or luminosity of a binary source is” 
\[ \frac{dE}{dt} = \frac{G}{5c^5}\, \dddot{I}_{ij}\, \dddot{I}^{ij} = \frac{32\, G^4}{5\, c^5}\, \frac{M_1^2 M_2^2 (M_1 + M_2)}{D^5}, \tag{13} \]


which leads to a shrinkage of the orbit:

\[ \dot{D} = -\frac{64}{5}\, \frac{G^3}{c^5}\, \frac{M_1 M_2 (M_1 + M_2)}{D^3} \]

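and, according to Eq. (12) and Kepler’s law, an increase of the gravitational wave
amplitude as well as the orbital frequency:

\[ \dot{\Omega} = \frac{96}{5}\, \frac{G^{5/3}}{c^5}\, \frac{M_1 M_2}{(M_1 + M_2)^{1/3}}\, \Omega^{11/3} = \frac{96}{5 \cdot 2^{5/3}}\, \frac{R_1 R_2}{(R_1 + R_2)^{1/3}}\, \frac{\Omega^{11/3}}{c^{5/3}} . \tag{14} \]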


The frequency, as well as the changes in amplitude and frequency, allows us to
determine $R_1$, $R_2$ and $D(t)$ from the data. The amplitudes of the gravitational
waves at the different detector locations, together with the differences in the arrival
times, allow the determination of the orientation of the orbital plane with respect to
the detectors. However, it is important to keep in mind that we used non-relativistic
Keplerian approximations throughout this discussion and incorporated the energy
loss due to gravitational wave emission by hand. While the results will be valid in
the non-relativistic part of the inspiral phase, relativistic corrections are necessary
during the later phases when $D$ is no longer much larger than $R_1$ and $R_2$ and when
the relative velocities of the black holes or neutron stars approach $c$. Calculations
during these phases require post-Newtonian corrections and finally fully relativistic
numerical simulations.¹⁰,¹¹
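
The orbital decay described above can be sketched numerically. Writing the shrinkage rate as dD/dt = -k/D^3 and substituting G M = c^2 R/2 gives k = (8/5) c R_1 R_2 (R_1 + R_2), with the closed-form solution D(t) = (D_0^4 - 4kt)^(1/4). The sketch below cross-checks that solution against a simple Runge-Kutta integration. The function names are hypothetical and the Schwarzschild radii and starting separation are assumed values; like the text, it ignores all relativistic corrections.

```python
C = 2.998e8  # speed of light, m/s


def decay_constant(r1, r2):
    """k in dD/dt = -k / D^3, i.e. k = (8/5) c R1 R2 (R1 + R2)."""
    return 1.6 * C * r1 * r2 * (r1 + r2)


def separation_closed_form(t, d0, k):
    """D(t) = (D0^4 - 4 k t)^(1/4), valid until coalescence."""
    return (d0**4 - 4.0 * k * t) ** 0.25


def time_to_coalescence(d0, k):
    """Time at which the Newtonian separation formally reaches zero."""
    return d0**4 / (4.0 * k)


def integrate_separation(d0, k, t_end, steps=2000):
    """Fourth-order Runge-Kutta integration of dD/dt = -k / D^3."""
    def rate(d):
        return -k / d**3

    d, dt = d0, t_end / steps
    for _ in range(steps):
        s1 = rate(d)
        s2 = rate(d + 0.5 * dt * s1)
        s3 = rate(d + 0.5 * dt * s2)
        s4 = rate(d + dt * s3)
        d += dt * (s1 + 2.0 * s2 + 2.0 * s3 + s4) / 6.0
    return d


# Assumed values: Schwarzschild radii of roughly 36 and 29 solar masses,
# starting separation of 1000 km.
k = decay_constant(1.06e5, 8.6e4)
d0 = 1.0e6
tau = time_to_coalescence(d0, k)
numeric = integrate_separation(d0, k, 0.5 * tau)
analytic = separation_closed_form(0.5 * tau, d0, k)
print(f"tau = {tau:.3f} s, D(tau/2) = {numeric / 1e3:.1f} km")
```

For these assumed inputs the binary spends only a fraction of a second between a separation of 1000 km and the formal coalescence, consistent with the short duration of the signals in the LIGO band.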

Following the merger, the newly formed black hole will continue to ring like 
a bell for a short period of time before it settles into a standard Kerr black hole. 
The frequency of the ring-down is directly determined by the quasi-normal modes 
and therefore the size of the final black hole.¹³ Note that it has been pointed out
that the low signal-to-noise ratios of the first detected gravitational waves also allow
other, more exotic explanations,¹⁴ which might be ruled out by future detections.

Other expected sources of detectable gravitational waves in the band of ground- 
based observatories (above a few Hz) include rotating neutron stars. A rotating 
spherically symmetric star will not emit gravitational waves, as its mass quadrupole 
moment is zero. However, if the neutron star can sustain a substantial asymmetry 
in the form of a non-vanishing mass quadrupole moment, it would be to first order 
equivalent to a small binary system formed by two effective masses orbiting around 
each other with a period identical to the spin of the neutron star itself. The best 
chances for the detection of a wave from such a neutron star are probably provided 
by known pulsars, where the gravitational wave frequency is well known. This correlation
allows us to integrate the signal for a very long time and average out any
random noise at that signal frequency. It should be noted that the gravitational radiation
might reduce the mass quadrupole moment over time, and internal processes
in the neutron star might also change $I$ occasionally. These effects lead to phase and
amplitude changes in the gravitational wave signal that could significantly reduce 
our chances for a detection from these sources. Initial LIGO and Virgo both reached 
displacement sensitivities of <10~??m at more than one hundred pulsar frequen- 
cies even before the advanced detectors came online.'>:'© The missing signals limit 
the ellipticity of the nearest pulsars like the Crab and the Vela pulsars to be well
below 10~°. This limit will continue to improve with improved sensitivity and longer 
integration time. 
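
The benefit of a known signal frequency can be illustrated with a toy example: a weak sinusoid buried in much larger white noise is recovered by multiplying the data with a reference at the known frequency and averaging. This is a minimal sketch and not the actual pulsar search pipeline; all names and numbers (sample rate, amplitude, noise level) are assumptions, and it ignores the phase and amplitude drifts mentioned above.

```python
import math
import random


def noisy_sinusoid(amplitude, freq_hz, sample_rate_hz, n_samples, noise_rms, rng):
    """Sinusoid at a known frequency plus white Gaussian noise."""
    return [
        amplitude * math.cos(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        + rng.gauss(0.0, noise_rms)
        for n in range(n_samples)
    ]


def coherent_amplitude(samples, freq_hz, sample_rate_hz):
    """Demodulate at the known frequency and average both quadratures."""
    i_sum = 0.0
    q_sum = 0.0
    for n, x in enumerate(samples):
        phase = 2.0 * math.pi * freq_hz * n / sample_rate_hz
        i_sum += x * math.cos(phase)
        q_sum += x * math.sin(phase)
    return 2.0 * math.hypot(i_sum, q_sum) / len(samples)


rng = random.Random(0)
TRUE_AMPLITUDE = 0.05   # signal is 20 times smaller than the noise rms
data = noisy_sinusoid(TRUE_AMPLITUDE, 50.0, 1000.0, 200_000, 1.0, rng)

short_estimate = coherent_amplitude(data[:1000], 50.0, 1000.0)   # 1 s of data
long_estimate = coherent_amplitude(data, 50.0, 1000.0)           # 200 s of data
print(f"1 s: {short_estimate:.3f}, 200 s: {long_estimate:.3f}, true: {TRUE_AMPLITUDE}")
```

The noise contribution to the estimate falls with the square root of the number of samples, so the long record pins down the amplitude while the short one is dominated by noise.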

The third group of often discussed sources for ground-based observatories are 
supernovae. Once a star runs out of fuel, it can no longer provide the pressure 
to protect itself against gravitational collapse. During the collapse, the inner part 
bounces back and the resulting shock waves blow the outer layers into space. The 
electromagnetic luminosity during this process often outshines the luminosity of the 
host galaxy, and some of the energy will be released in the form of gravitational 
waves.¹⁷,¹⁸ However, it is well known that the gravitational field outside a sphere
only depends on the mass and not on its radius. As a consequence, a spherically
symmetric explosion would not cause any gravitational waves. LIGO- and Virgo-type
detectors depend on a fair amount of asymmetry within the explosion to have
a chance for a detection. The current expectation is that a supernova within a few
Mpc or within our local group of galaxies might be detectable, while the prospects 
for detecting gravitational waves from supernovae further away with current detec- 
tors are not very promising. 

Gravitational wave astronomy just passed the threshold from being conceived 
to being born. The expectation is that ground-based detectors like Advanced LIGO 
and Advanced Virgo will soon measure gravitational waves from binary black hole 
mergers on at least a weekly basis. At some point in the near future, a signal from
a pair of merging neutron stars will be detected, which will allow us to study the
equation of state inside a neutron star.ᵃ We will start to see neutron star mergers
at some rate and will probably also see a signal from the next local supernova,
although their rates are very low and the amplitudes of the generated waves are
too uncertain to make any reliable predictions. It is unclear when we will detect
gravitational waves from pulsars. However, this new observational technique has opened
a new window to the universe that will continue to surprise us, in the same way
that the first gravitational wave signals surprised us. In the following section, I pro- 
vide an overview of the design of ground-based detectors such as Advanced LIGO 


ᵃSince writing this chapter, Advanced LIGO and Advanced Virgo concluded their O2 run, during
which they reported seven additional black hole mergers and one neutron star merger.®! Starting 
in April 2019, observation run O3 approached weekly detections. 
and Advanced Virgo. Section 3 focuses on the emerging network of ground-based
detectors, which will improve the angular resolution to a level at which follow-up
observations with electromagnetic telescopes become very promising.ᵇ This section
also introduces concepts beyond the current generation, such as the Einstein Telescope
and Lungo.¹⁹,²⁰ The fourth section is dedicated to the Laser Interferometer
Space Antenna (LISA), a planned space-based detector that will open another very
signal-rich window in the gravitational wave spectrum.²¹


2. Advanced Ground-Based Observatories 


The history of using laser interferometry to detect gravitational waves dates back
to the early 1960s. Following the 1959 publications about plane wave solutions
in general relativity by Bondi and Pirani²² and the invention of the laser in 1960
by Maiman,²³ Gertsenstein and Pustovoit²⁴ were the first to publish the
idea of using a laser interferometer to detect gravitational waves. Weiss²⁵ was the
first to complete an analysis of the potential of laser interferometry, already taking into
account most of the noise sources that limit or could potentially limit the
observatories. Groups in the US, Germany and the UK started building the first prototypes,
which evolved over the years. These prototypes confirmed many of Weiss’
noise estimates, but also demonstrated how difficult this endeavor would be. In
the mid-1970s, Kip Thorne became involved and convinced Ron Drever to move
from Glasgow to Pasadena to start a group at Caltech. Later, this group joined
forces with Weiss’ MIT group to submit the LIGO proposal in the late 1980s.²⁶
Parallel proposals were submitted by a German-British collaboration, which led to
GEO600,²⁷ while a French-Italian collaboration successfully applied for funding for
the Virgo observatory near Pisa.²⁸ However, the first larger detector online was the
TAMA detector in Tokyo,²⁹ which laid the foundation for the KAGRA observatory
that is currently under construction in Japan.³⁰

Gravitational waves modulate the optical path between objects: 


\[ E = E_0 e^{i(\omega t - kL_0)}\, e^{-i k h L_0 / 2} \approx E_0 e^{i(\omega t - kL_0)} \left(1 - \frac{i}{2} k h L_0\right) . \tag{15} \]


This time-dependent modulation can be described as the generation of sidebands 
that propagate with the original laser field through space: 


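\[ E = E_0 e^{i(\omega t - kL_0)} \left[1 - ik \int \delta l(\Omega) \left(e^{i\Omega t} + e^{-i\Omega t}\right) d\Omega\right] , \tag{16} \]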


where $\delta l$ is the linear spectral density of the induced length change. In analogy
to radio technology, the original field is called the carrier while the new frequency 


ᵇThis has also been surpassed by reality: A neutron star merger was seen by Advanced LIGO and
Advanced Virgo in coincidence with a gamma ray burst seen by Fermi. Too many telescopes to 
list here followed up with EM observations.92 
components are the signal sidebands and encode the signal in the frequency domain.
The goal of all laser interferometers is to detect these signal sidebands. One of the
challenges is that a laser beam itself is far from being a perfect monochromatic
field and has phase and amplitude fluctuations. These noise fluctuations can also
be described by sidebands with complex amplitudes $\epsilon_\pm$ at frequencies $\pm\Omega$:

\[ E_{in} = E_0 e^{i(\omega t - kL_0)} \left[1 + \int \left(\epsilon_+(\Omega) e^{i\Omega t} + \epsilon_-(\Omega) e^{-i\Omega t}\right) d\Omega\right] , \tag{17} \]


which will typically mask the very weak signals. 
Fortunately, a Michelson interferometer of equal arm length operated on the 
dark fringe, 


\[ E_{DP} = r t E_{in} \left(e^{-2ikL_1} - e^{-2ikL_2}\right) \overset{L_1 = L_2}{=} 0 , \tag{18} \]


has the advantage that the entire laser noise is common in both arms, destruc- 
tively interferes in the dark port and is sent back towards the laser. In contrast 
to this, optimally aligned gravitational waves modulate the arm length in opposite 
directions: 

\[ \delta l_1(\Omega) = -\delta l_2(\Omega), \tag{19} \]
and these sidebands interfere constructively and leave the interferometer through 
the dark port. This ability of Michelson interferometers to suppress, at least in first 
order, all input noise and all noise common in both arms while maximizing the 
signal makes ground-based observatories feasible. 
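
The common-mode rejection of the dark port can be checked with a few lines of complex arithmetic. The sketch below evaluates the dark-port field of Eq. (18) for small arm length deviations, with the large common phase 2kL factored out. The function name is hypothetical; the 1064 nm wavelength, the 50/50 beam splitter, and the picometer displacements are assumptions for illustration.

```python
import cmath
import math

WAVELENGTH = 1064e-9               # assumed laser wavelength, m
K = 2.0 * math.pi / WAVELENGTH     # wave number, 1/m


def dark_port_field(e_in, dl1, dl2, r=math.sqrt(0.5), t=math.sqrt(0.5)):
    """Dark-port field for arm length deviations dl1, dl2 (Eq. 18).

    The common phase exp(-2ikL) of the equal macroscopic arm lengths is
    factored out, so only the small deviations enter.
    """
    return r * t * e_in * (cmath.exp(-2j * K * dl1) - cmath.exp(-2j * K * dl2))


DISPLACEMENT = 1e-12  # assumed 1 pm arm length change

# Common motion (both arms change by the same amount): cancels at the dark port.
common = dark_port_field(1.0, DISPLACEMENT, DISPLACEMENT)

# Differential motion (Eq. 19, dl1 = -dl2): adds constructively.
differential = dark_port_field(1.0, DISPLACEMENT, -DISPLACEMENT)

print(f"common: {abs(common):.3e}, differential: {abs(differential):.3e}")
```

The common displacement produces no dark-port field at all, while the differential displacement produces a field proportional to 4k δl times the beam splitter factor r t, which is the first-order signal that the interferometer is built to detect.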


2.1. Optical Layout 


The optical layouts of all advanced interferometers are fairly similar and we will 
restrict the discussion here to the Advanced LIGO interferometer, which is shown 
in Fig. 3. It is significantly more complex than a simple Michelson interferometer. 
Each interferometer arm hosts a 4 km-long arm cavity formed between one end test 
mass (ETM) and one input test mass (ITM). The field inside an optical resonator
(a.k.a. Fabry-Perot) at the input mirror propagating towards the end mirror, $E_{cav}$,
is a superposition of the field already inside the cavity and the newly injected field
$E_{in}$ (see inset in Fig. 3 for locations of the fields):

\[ E_{cav} = i t_{ITM} E_{in} + r_{ITM} r_{ETM} e^{i\phi_{RT}} E_{cav} . \tag{20} \]


The reflected field is a linear combination of the directly reflected field and the field 
leaking out of the cavity: 


\[ E_{refl} = r_{ITM} E_{in} + i t_{ITM} r_{ETM} e^{i\phi_{RT}} E_{cav} . \tag{21} \]


Here $r_{ITM}$, $r_{ETM}$ and $t_{ITM}$ are the amplitude reflectivities and transmissivities of
the test mass mirrors, and $i$ is the imaginary unit; it is customary to use the
Fig. 3. Advanced LIGO design: The pre-stabilized laser (PSL) on the left generates up to 200 W
of single frequency laser light. The field is phase modulated with an electro-optical modulator 
(EOM) before it is injected into the input mode cleaner (IMC), a triangular 33 m-long (round 
trip) optical cavity. The IMC filters the spatial mode of the laser field and acts as an intermediate 
frequency reference for the laser frequency stabilization system. The light is then sent through 
the Faraday isolator (FI), after which up to 125 W of the most pristine laser field is injected into 
the folded power recycling cavity through the power recycling mirror (PRM). The beam splitter 
(BS) splits the light before it is sent into the two 4 km-long arm cavities formed between the 
input test masses (ITM) and end test masses (ETM). The signal recycling cavity with the signal 
recycling mirror (SRM) is the last piece within the main interferometer. PR2 and PR3, together 
with SR2 and SR3, form beam expanding or compressing telescopes to increase the beam radius 
from 2.5 mm to 5.3 cm. The signal field passes through a second FI before the output mode cleaner 
(OMC) filters the spatial mode of the output field prior to the main photo detector (PD). The 
power levels indicate the expected power levels in the power recycling cavity and inside the arm 
cavities during full power operation. The designs of all advanced ground-based detectors include 
most if not all of these elements, but their specific designs might differ significantly. The inset 
shows the locations of the input, cavity-internal, and reflected fields used in Eq. (20) and beyond. 
See electronic edition for a color version of this figure. 


Siegman convention and add a 90° phase shift to the transmission and no phase
change to the reflection coefficients.³¹ The round trip phase change inside the cavity,
$\phi_{RT}$, is given by

\[ \phi_{RT} = N 2\pi + \delta\phi_{RT}, \tag{22} \]

where $\delta\phi_{RT}$ is its deviation from resonance. The length of each arm cavity will
be kept at or near resonance ($\delta\phi_{RT} \ll \pi$). If we further assume lossless mirrors
($r_{ITM}^2 + t_{ITM}^2 = 1$), it is simple algebra to calculate the phase shift in the reflected
field:

\[ \delta\phi_{cav} \approx \frac{4}{T_{ITM}}\, \delta\phi_{RT} \;\overset{T_{ITM} = 1.4\%}{=}\; 285 \cdot \delta\phi_{RT} . \tag{23} \]

In the last equation we used the power transmissivity of the Advanced LIGO input
test masses, $T_{ITM} = t_{ITM}^2 = 1.4\%$. One interpretation of this result is that the
arm cavities increase the effective arm length of the Michelson interferometer by a 
factor of 285. A second equally valid interpretation is that the arm cavities build 
up the amplitude of the carrier by a factor of $2/t_{ITM}$. This is the source field that
is modulated by the gravitational waves to create the signal sidebands according
to Eq. (15). These signal sidebands are then built up coherently inside the arm
cavities, which leads to an additional amplification of $2/t_{ITM}$ as long as the signal
frequency is within the bandwidth of the arm cavities. The disadvantage of the arm 
cavities is that they act as first-order low pass filters for the signal sidebands with 
a corner frequency around 300 Hz. 
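
Equations (20) and (21) can be solved numerically to check the two interpretations above. The sketch below assumes a lossless input mirror with 1.4% power transmissivity and a perfectly reflecting end mirror (an idealization introduced here), and evaluates the carrier build-up and the phase gain in reflection for a small round trip phase deviation. The function names are hypothetical.

```python
import cmath
import math


def cavity_fields(phi_rt, power_trans_itm=0.014, r_etm=1.0, e_in=1.0):
    """Solve Eq. (20) for E_cav and insert it into Eq. (21) for E_refl."""
    t_itm = math.sqrt(power_trans_itm)
    r_itm = math.sqrt(1.0 - power_trans_itm)       # lossless input mirror
    round_trip = r_itm * r_etm * cmath.exp(1j * phi_rt)
    e_cav = 1j * t_itm * e_in / (1.0 - round_trip)
    e_refl = r_itm * e_in + 1j * t_itm * r_etm * cmath.exp(1j * phi_rt) * e_cav
    return e_cav, e_refl


def reflected_phase_gain(dphi_rt=1e-6, power_trans_itm=0.014):
    """Ratio of the reflected phase shift to the round trip phase deviation."""
    _, refl_resonant = cavity_fields(0.0, power_trans_itm)
    _, refl_detuned = cavity_fields(dphi_rt, power_trans_itm)
    # Dividing by the resonant field avoids the branch cut of the phase at pi.
    return cmath.phase(refl_detuned / refl_resonant) / dphi_rt


T_ITM = 0.014
e_cav_resonant, _ = cavity_fields(0.0, T_ITM)
amplitude_buildup = abs(e_cav_resonant)
phase_gain = reflected_phase_gain(power_trans_itm=T_ITM)
print(f"amplitude build-up: {amplitude_buildup:.2f} (2/t = {2.0 / math.sqrt(T_ITM):.2f})")
print(f"phase gain: {phase_gain:.1f} (4/T = {4.0 / T_ITM:.1f})")
```

Both numbers agree with the approximations 2/t_ITM and 4/T_ITM to better than one percent; the small difference comes from the approximation r_ITM ≈ 1 - T_ITM/2 used in the analytic result.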

The substrates and coatings for the test masses were chosen to minimize the 
optical losses as much as possible and, once the Michelson interferometer is tuned 
to a dark fringe, well over 97% of the injected laser power returns towards the 
laser. Seen from the laser, the power recycling mirror together with the Michelson 
interferometer forms the power recycling cavity, which increases the laser power at 
the beam splitter and also inside each arm cavity by another factor of ≈ 40, further
enhancing the carrier field.³² As discussed earlier, the generated signal sidebands
leave the interferometer through the dark port and are not affected by the power 
recycling except for their larger amplitude. 
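
Combining the numbers quoted so far gives a rough estimate of the circulating power. The sketch below multiplies the injected power by the power recycling gain, splits it at the beam splitter, and applies the arm cavity power build-up (2/t_ITM)^2 = 4/T_ITM. It is a back-of-the-envelope estimate under assumed ideal conditions (lossless optics, a perfect 50/50 beam splitter), and the function name is hypothetical.

```python
def circulating_powers(input_power_w, recycling_gain=40.0, power_trans_itm=0.014):
    """Return (power at the beam splitter, power in each arm cavity) in watts."""
    power_at_bs = input_power_w * recycling_gain
    arm_power_gain = 4.0 / power_trans_itm      # square of the amplitude gain 2/t
    power_per_arm = 0.5 * power_at_bs * arm_power_gain
    return power_at_bs, power_per_arm


# 125 W injected, recycling gain of about 40, T_ITM = 1.4% (values from the text).
p_bs, p_arm = circulating_powers(125.0)
print(f"beam splitter: {p_bs / 1e3:.1f} kW, each arm: {p_arm / 1e3:.0f} kW")
```

Under these assumptions about 5 kW reaches the beam splitter and several hundred kilowatts circulate in each arm, which illustrates why thermal effects and radiation pressure become relevant at full power.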

The initial detectors just measured these sidebands, while the advanced detectors
place a signal recycling mirror between the interferometer and the final
detector.ᶜ This mirror, together with the Michelson interferometer, forms the signal
recycling (SR) cavity, which modifies the frequency-dependent optical gain for the
signal sidebands. This SR cavity allows us to either increase the amplitude of the
signal sidebands at a specific frequency at the expense of the bandwidth, or to reduce
the effective reflectivity of the ITMs for the signal sidebands, which increases the
bandwidth of the interferometer at the expense of peak sensitivity.³³,³⁴

This complex interferometer design comes with a series of additional challenges 
which are far less pronounced in a simple Michelson interferometer. One of them 
is related to the transverse beam distribution of the laser field within the interferometer.
In the paraxial wave approximation, the transverse beam distribution of a
laser beam can be described as a linear combination of a set of Hermite-Gaussian
modes. The ideal laser beam can be described as a simple Gaussian distribution.
Although the 200 W laser system is probably the most stable high power laser in the
world,³⁵ it does not emit a perfect Gaussian mode and the electro-optic modulators


ᶜThe GEO600 detector started with signal recycling but without arm cavities.
(EOM), as well as all other components in the beam path, can further reduce the
beam quality, especially when they are exposed to the high power laser field. This
creates the need for the input mode cleaner (IMC), a 33 m-long triangular optical
cavity that filters the spatial mode of the laser beam such that only a well-defined
Gaussian distribution is transmitted through the IMC. This distribution then passes
through a Faraday isolator before it is mode matched to the spatial eigenmode of
the interferometer.³⁶

The complex interferometers in the advanced observatories consist of several 
coupled optical cavities, which all have their own sets of spatial eigenmodes. These 
cavities have to be mode matched to each other to maximize the carrier in the arm 
cavities and the signal in the dark port. Additional requirements on the design are 
to maximize the beam size on the test masses to minimize coating thermal noise 
(see next section), at the same time to minimize diffraction losses, to reduce the 
sensitivity of the entire interferometer to misalignments and figure errors in the 
mirrors, and to allow the generation of sufficiently strong alignment sensing signals. 
These requirements push the design in opposite directions and the final design is a 
trade-off between all of these requirements.³⁷

Although the interferometer at large is designed to minimize asymmetries in the 
arms, some asymmetries are unavoidable. For example, only one of the laser beams 
will pass twice through the substrate of the beam splitter. In addition, the test 
masses will also not be perfectly identical. Other asymmetries, like a small difference 
in the distances between the ITMs and the beam splitter, have been included to 
generate signals which are required to control the position of each mirror and beam 
splitter within the interferometer.³⁸,³⁹ All these asymmetries create light which is
either not at the frequency or not in the spatial mode of the signal sidebands. The
output mode cleaner (OMC) is an optical filter which rejects this additional light
and lets only light in the correct frequency range and spatial mode pass.⁴⁰

A second challenge is the control of the relative positions and alignments of 
all interferometer mirrors with respect to each other.⁴¹,⁴² The length sensing and
control system needs to sense the distances between all mirrors to keep the laser field 
resonant in the arm cavities and in the power recycling cavity. It also has to keep 
the Michelson interferometer at its working point near the dark fringe, and keep 
the signal recycling cavity at a specific tuning. The requirements on the absolute 
stability of the interferometer mirrors depend on the specific mirror but range typi- 
cally from pm for the recycling mirrors to fm for the test masses. Misaligned mirrors 
increase the susceptibility of the interferometer to beam jitter or fluctuations in the 
direction of the injected laser beam. Requirements for the alignment also depend
on the specific mirror and range from a few µrad for the recycling cavity mirrors to
nrad for the test masses.⁴³

The length and alignment sensing systems use a phase modulation/demodu- 
lation technique for most degrees of freedom. The modulation creates additional 
fields with different frequencies (RF-sidebands) that are injected, together with the 
carrier, into the interferometer. These fields create beat signals with the carrier and
other RF-sidebands on single-element and quadrant photodetectors that are placed
at various locations around the interferometer.⁴⁴ The photocurrents are demodulated
with the modulation frequencies the same way a lock-in amplifier demodulates
a modulated signal. Typically, these demodulated signals depend simultaneously
on several degrees of freedom, and the challenge is to design a sensing scheme
that allows the formation of linear combinations between the signals that isolate
each degree of freedom well enough to actively control all of them. This requires
that the phase modulation frequencies, between a few and several tens of MHz, are well
matched to the lengths of the recycling cavities and the distances between the beam
splitter and the input test masses. A detailed description of the Advanced LIGO
interferometer is available in Ref. 45.
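
The idea of forming linear combinations of demodulated signals can be illustrated with a toy two-by-two example. Each demodulated signal is modeled as a linear mix of two degrees of freedom through a sensing matrix, and inverting that matrix yields combinations that isolate each degree of freedom. The matrix entries and displacements below are made-up numbers, not Advanced LIGO values, and the function names are hypothetical.

```python
def invert_2x2(matrix):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("sensing matrix is singular; signals are degenerate")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(matrix, vector):
    """Product of a 2x2 matrix with a length-2 vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Hypothetical sensing matrix: row = demodulated signal, column = degree of freedom.
sensing = [[1.0, 0.2],
           [0.1, 0.5]]

true_motion = [3.0, -2.0]                 # assumed displacements, arbitrary units
signals = mat_vec(sensing, true_motion)   # what the photodetectors would report
recovered = mat_vec(invert_2x2(sensing), signals)
print("signals:", signals, "recovered motion:", recovered)
```

The closer the rows of the sensing matrix are to being parallel, the smaller its determinant and the more noise the inversion amplifies, which is one way to see why the modulation frequencies and cavity lengths have to be chosen carefully.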


2.2. Creating Free-Falling Test Masses 


Gravitational waves change distances between free-falling test masses and, obvi- 
ously, measuring these changes requires free-falling test masses. Truly free-falling 
test masses do not exist but can be approximated on Earth as free above some 
frequency if they are suspended by seismic isolation and suspension systems. On 
Earth, the ground is always moving with varying amplitudes ranging from a few
µm at frequencies below one Hertz to sub-nm at 10 Hz, even at locations as isolated
as the LIGO sites. It is the task of the seismic isolation and suspension system to 
isolate the test masses from these motions, to ensure that the mirrors can be kept 
within a few femtometers of their optimum positions, and to see that their motion 
above 10 Hz meets the scientific requirements for the displacement noise of about 

\[ \frac{\ldots}{f^{2}\, \sqrt{\mathrm{Hz}}} . \tag{24} \]

Advanced LIGO uses multi-stage seismic isolation systems. These systems rely on
accelerometers and position sensors to actively control the position of each stage. In
addition, the different stages are separated by springs which passively suppress the
motion of the upper stage above the resonance frequencies of the springs.⁴⁶ The rms
motion averaged over one second of the last stage of Advanced LIGO’s seismic
isolation system is less than 10 pm or 1/10th of the diameter of a hydrogen atom.
The remaining six to eight orders of magnitude of suppression are provided by
four pendula in series, each providing a mechanical second-order low-pass filter
above the resonance frequency of the previous pendulum. The left part of Fig. 4 
shows the Advanced LIGO suspension system. The uppermost pendulum is sup- 
ported from the last stage of the seismic isolation system by steel wires that are 
attached to cantilever springs for additional vertical isolation. The second stage in 
the quadruple pendulum is also attached to another set of cantilever springs within 
the first stage using more steel wires. The penultimate mass, a 40 kg fused silica 
cylinder, hangs from a steel wire that is attached to a third set of cantilever springs
Fig. 4. The Advanced LIGO⁴⁷ (left) and Virgo⁴⁸ (right) suspension systems are both multi-stage
pendulums with additional vertical springs.


in the second stage. Finally, the test mass itself is welded to fused silica fibers which 
are welded at the other end to the penultimate mass. This last stage is a monolithic 
suspension system with very low mechanical losses to reduce the thermal suspension 
noise (see also next section). A second, nearly identical, suspension chain is located 
behind this chain. Electrostatic and magnetic forces between the main and this 
reaction chain can be applied to control the position and orientation of the test 
mass.⁴⁷
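
The isolation provided by a chain of pendula can be estimated with a simple model in which each stage acts as an ideal, undamped second-order low-pass filter and the stages are treated as independent. This is a deliberate simplification (a real quadruple pendulum has coupled modes and damping), and the 1 Hz resonance frequencies are assumed for illustration only; the function names are hypothetical.

```python
def pendulum_transfer(f_hz, f0_hz):
    """Magnitude of the ground-to-mass transfer of an ideal undamped pendulum."""
    return 1.0 / abs(1.0 - (f_hz / f0_hz) ** 2)


def chain_isolation(f_hz, resonances_hz):
    """Product of the single-stage transfer functions (stages treated as uncoupled)."""
    isolation = 1.0
    for f0 in resonances_hz:
        isolation *= pendulum_transfer(f_hz, f0)
    return isolation


# Four stages with assumed 1 Hz resonances, evaluated at 10 Hz.
single_stage = pendulum_transfer(10.0, 1.0)
four_stages = chain_isolation(10.0, [1.0, 1.0, 1.0, 1.0])
print(f"one stage: {single_stage:.2e}, four stages: {four_stages:.2e}")
```

Each stage suppresses the motion by roughly (f0/f)^2 well above its resonance, so four stages give a suppression of about eight orders of magnitude at 10 Hz in this idealized picture, in line with the range quoted above.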

The Virgo collaboration had already developed its super-attenuator for initial
Virgo and will continue to use it for Advanced Virgo. The super-attenuator is shown
in the right part of Fig. 4. It starts with an inverted pendulum composed of three
6 m-long metal legs that rest at one end in metal tubes. The other end supports the 
top mass of the suspension chain. The top mass and five additional seismic filters 
use steel wires and cantilever springs to form a chain of mechanical filters. The last 
stage, called the marionette, is suspended from the last seismic filter with another 
set of steel wires. The final mirror in initial Virgo was suspended with steel wires 
from the marionette. These steel wires have been replaced by fused silica fibers 
for Advanced Virgo to reduce suspension thermal noise. The marionette includes
magnetic actuators to align the suspended mirror.⁴⁸

Both suspension systems are marvels of mechanical engineering and appear to
be meeting their requirements, and should allow detection of gravitational waves at
frequencies as low as 10 Hz. However, this will only be known when all other
low-frequency noise sources have been minimized and the detectors meet their
requirements. Below 10 Hz, many other noise sources will appear. The most important
of these is Newtonian noise, the noise caused by fluctuations in the local
gravitational field due to many different masses moving around.⁴⁹ For the foreseeable
future, Newtonian noise is expected to limit the current generation of ground-based
detectors to measurements above a few Hz.


2.3. Limiting Noise Sources 


The fundamental noise sources that limit ground-based detectors above about 10 Hz 
are quantum noise at low and high frequencies and coating thermal noise at inter- 
mediate frequencies. The quantum noise is related to the statistical fluctuations of 
even the most perfectly coherent field. These fluctuations cause differential power 
fluctuations in each arm which modulate the radiation pressure on each of the cavity 
mirrors. These mirrors respond with a displacement that depends on the mass m 
of the mirror and the power P in the laser beam: 


\[
\delta l_{\rm RPN} = \frac{1}{m\Omega^2}\,\frac{2\,\delta P}{c} = \frac{2h\nu\sqrt{N}}{mc\,\Omega^2}, \tag{25}
\]
where N is the number of photons which reflect off the mirror each second.ᵇ This
quantum radiation pressure noise is different from the technical radiation pressure 
noise, which is generated by classical laser power fluctuations. The technical fluc- 
tuations are common in the two arms of the interferometer for as long as the beam 
splitter splits the beam 50/50 and for as long as the cavity mirrors are identical in 
mass and reflectivity. 

The quantum noise also changes the phases of the fields in the two arms, and 
these changes are similar to phase changes induced by gravitational waves. This 
noise also depends on the number of received photons and the laser wavelength: 


\[
\delta l_{\rm SN} = \frac{\lambda}{2\pi}\,\delta\phi \propto \frac{\lambda}{\sqrt{N}}. \tag{26}
\]

This noise is commonly known as the shot noise limit. Quantum radiation pressure
noise and quantum phase noise (shot noise) are two quadratures of the same quantum
noise caused by vacuum fluctuations that enter the interferometer through the
dark port.⁵⁰ One improves when the laser power increases, the other when it decreases.
The optimum for a given mass is called the standard quantum limit, although it can
be surpassed by correlating the vacuum fluctuations through a non-linear process.
This non-linear process creates a non-classical form of light called squeezed light or
squeezed vacuum, which decreases the noise in one quadrature at the expense of the
other.⁵¹ Injecting squeezed vacuum is one of the planned upgrades for Advanced
LIGO.

ᵇ The time interval of one second implies that the linear spectral noise density is measured in
m/√Hz. It is customary to call N the rate and √N the uncertainty in this rate but strictly speaking
this is not correct.
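
The trade-off between the two quadratures can be illustrated numerically. The sketch below is a toy model of a single free mirror without arm cavities. It uses the radiation pressure expression of Eq. (25) with hν/c = h/λ, and it assumes a shot noise prefactor of λ/(2π√N), since the text only fixes the scaling with λ and N. The function names and the numerical values (40 kg mirror, 1064 nm light, 100 Hz) are illustrative assumptions.

```python
import math

H = 6.62607015e-34  # Planck constant, J s

def radiation_pressure_noise(n_rate, mass, omega, wavelength):
    """Eq. (25) rewritten with h*nu/c = h/lambda."""
    return 2.0 * H * math.sqrt(n_rate) / (mass * wavelength * omega ** 2)

def shot_noise(n_rate, wavelength):
    """Assumed prefactor lambda/(2 pi); only the lambda/sqrt(N) scaling is from the text."""
    return wavelength / (2.0 * math.pi * math.sqrt(n_rate))

def total_noise(n_rate, mass, omega, wavelength):
    """Uncorrelated contributions add in quadrature."""
    rp = radiation_pressure_noise(n_rate, mass, omega, wavelength)
    sn = shot_noise(n_rate, wavelength)
    return math.hypot(rp, sn)

def optimal_rate(mass, omega, wavelength):
    """Photon rate at which both contributions are equal, which minimizes the sum."""
    return mass * wavelength ** 2 * omega ** 2 / (4.0 * math.pi * H)

mass, wavelength = 40.0, 1064e-9      # illustrative values
omega = 2.0 * math.pi * 100.0         # 100 Hz signal frequency
n_opt = optimal_rate(mass, omega, wavelength)
limit = total_noise(n_opt, mass, omega, wavelength)
print(f"N_opt = {n_opt:.2e} photons/s, noise at optimum = {limit:.2e} m/sqrt(Hz)")
```

Raising or lowering the photon rate from `n_opt` increases the total, because one term grows as √N while the other falls as 1/√N. In this simplified picture, that minimum is the standard quantum limit for the chosen mass and frequency.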

It is truly remarkable that gravitational wave observatories measure length 
changes that are eight to ten orders of magnitude smaller than the diameter of an 
atom while the atoms which form the test masses move somewhat randomly around 
their equilibrium positions due to their finite temperature. This seeming dilemma 
is best explained by looking at the eigenmodes or internal degrees of freedom of 
each mechanical system. In the linear regime, each internal degree of freedom can 
be described as a simple harmonic oscillator, and any mechanically excited motion 
will cause a restoring force inside the material. If the excited degree of freedom 
couples to any of the other degrees of freedom, this restoring force will be slightly 
out of phase with the excitation and some of the energy in the harmonic motion 
will dissipate into other degrees of freedom. The same process also works in the 
other direction, and heat from the thermal bath couples through these dissipative 
mechanisms to the first degree of freedom. Equilibrium is reached when all degrees 
of freedom share the same average energy of k_BT/2. The fluctuation-dissipation
theorem puts these thoughts into a mathematical form and allows the calculation
of the power spectrum of the thermal noise in each degree of freedom:⁵²

\[
\delta l_{\rm th}^2(\Omega) = \frac{4k_BT}{m\Omega}\,\frac{\Omega_0^2\,\phi(\Omega)}{\left(\Omega_0^2-\Omega^2\right)^2+\Omega_0^4\,\phi^2(\Omega)}, \tag{27}
\]

where m is the effective mass and Ω₀ the (angular) resonance frequency of this degree
of freedom. φ(Ω) is the frequency-dependent phase shift between restoring force
and excitation. This phase is also known as the loss angle and is the inverse of the
mechanical Q factor of that degree of freedom. Materials with very low mechanical 
losses concentrate the thermal noise into a very narrow frequency band around 
their resonance frequency and have low thermal noise everywhere else, including 
the LIGO measurement band. 
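
The claim that low losses concentrate the thermal noise around the resonance can be checked with a short calculation. The sketch below assumes the standard structural-damping form of the fluctuation-dissipation result for a single harmonic oscillator with a frequency-independent loss angle. The function name `thermal_psd` and all numerical values (a 40 kg mass with a 0.6 Hz mode, evaluated at 100 Hz) are illustrative assumptions.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def thermal_psd(omega, mass, omega0, loss_angle, temperature=300.0):
    """Thermal displacement power spectrum (m^2/Hz) of one degree of freedom.

    Assumes structural damping with a constant loss angle phi = 1/Q.
    """
    numerator = 4.0 * K_B * temperature * omega0 ** 2 * loss_angle
    denominator = mass * omega * ((omega0 ** 2 - omega ** 2) ** 2
                                  + omega0 ** 4 * loss_angle ** 2)
    return numerator / denominator

mass = 40.0                          # kg, illustrative
omega0 = 2.0 * math.pi * 0.6         # hypothetical 0.6 Hz resonance
omega_band = 2.0 * math.pi * 100.0   # a frequency inside the measurement band

for phi in (1e-6, 1e-8):
    on_res = math.sqrt(thermal_psd(omega0, mass, omega0, phi))
    in_band = math.sqrt(thermal_psd(omega_band, mass, omega0, phi))
    print(f"phi = {phi:.0e}: on resonance {on_res:.2e}, at 100 Hz {in_band:.2e} m/sqrt(Hz)")
```

Far from the resonance the power spectrum is proportional to the loss angle, while on resonance it is inversely proportional to it. Reducing the loss therefore lowers the noise in the measurement band at the cost of a taller, narrower resonance peak.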

Thermal noise exists in many parts of a mechanical system as complex as LIGO. 
However, the critical parts are the fiber suspensions, the mirror substrates and 
their dielectric coatings. The fiber suspensions and mirror substrates are all made 
from fused silica and are welded together to reduce friction at the interfaces below 
the critical level. The limiting thermal noise source in the mid-frequency range of 
ground-based detectors like Advanced LIGO and Advanced Virgo is coating thermal
noise.⁵³ Each end test mass is coated with more than 30 alternating layers of
titania-doped tantala and fused silica to optimize the reflectivity of each mirror. 
The mechanical properties of these ion-beam coatings are fairly well measured and 
are in good agreement with the measured coating noise. Still, the community has 
not yet identified dielectric coatings with low enough mechanical and optical losses
such that coating thermal noise would finally become irrelevant. Both Advanced 
LIGO and Virgo reduce the coating thermal noise by increasing the beam size on 
the mirror surfaces as much as possible to average better over the random motion 
of the atoms which form the coatings. However, this approach is limited by the size 
of the mirrors and the beamsplitter, as well as by requirements on the geometric 
stability of the arm cavities®° 


2.4. Data Analysis 


Ground-based gravitational wave observatories produce a constant stream of strain 
data, h(t), which is analyzed in real time using several different data analysis
techniques and tools. The first gravitational-wave signal, GW150914, was discovered by
Coherent WaveBurst, a software tool that uses wavelets to calculate the spectral power in
specific frequency bands as a function of time.°° Correlated changes in the spectral
power distribution in the two LIGO detector signals above some threshold create a
candidate event, which is then analyzed off-line in more detail. Figure 5 shows the
time-frequency plot for the two Advanced LIGO detectors for GW150914.° Coherent
WaveBurst makes very few assumptions about the form of the signal and its main 
purpose is to search for unmodeled or badly modeled sources such as gravitational 
waves from supernovae, but it is also sensitive to signals from mergers. 

Another type of online search algorithm, which focuses on mergers, uses 
matched filters in the frequency domain.°” This highly efficient algorithm parti- 
tions the strain data in overlapping blocks, calculates the Fourier transform of each 
block, and compares it to templates of model waveforms. These waveforms form 
a grid in the source parameter space that is dense enough to detect all mergers 
which emit with high enough amplitude in the frequency band of the advanced 
detectors. Coincident triggers of similar signals within a 10 ms time window create
an event; 10 ms is the maximum difference in the time of arrival between the
3000 km separated sites. GW151226 was discovered this way.’ One purpose of these
algorithms is to identify and localize short-lived signal candidates as soon as possible
to alert electromagnetic telescopes for potential follow-up observations within
minutes of the detection. Once an event has been identified, follow-up studies zoom
into the data around the event, calculate the significance of the signal, and provide
improved estimates of the source parameters and location. False alarm rates are
calculated by looking for false coincidences in time-shifted detector data streams
outside of the possible 10 ms window.

Fig. 5. The time-frequency plots of the first gravitational-wave signal, GW150914, detected by the
LIGO Hanford (left) and Livingston (right) observatories.°° These time-frequency plots show the
characteristic increase in amplitude and frequency of a signal generated by two merging black
holes.
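
The frequency-domain matched filter described above can be sketched in a few lines. This is a minimal toy under stated assumptions, not the search pipeline itself: the noise is white Gaussian noise, the template is an invented chirp whose amplitude and frequency both increase, and a single data block is filtered. All names (`fft`, `matched_filter`, `toy_chirp`) and parameter values are hypothetical.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(spec):
    """Inverse FFT via the conjugation identity."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]

def matched_filter(block, template):
    """Correlate a data block with a zero-padded template in the frequency domain."""
    padded = list(template) + [0.0] * (len(block) - len(template))
    d, t = fft(block), fft(padded)
    return [v.real for v in ifft([a * b.conjugate() for a, b in zip(d, t)])]

def toy_chirp(m):
    """Invented template: frequency rises from 0.02 to 0.22 cycles per sample."""
    return [(i / m) * math.sin(2 * math.pi * (0.02 * i + 0.1 * i * i / m))
            for i in range(m)]

rng = random.Random(0)
n, m, offset, sigma = 1024, 128, 300, 0.3
template = toy_chirp(m)
block = [rng.gauss(0.0, sigma) for _ in range(n)]   # white noise
for i, v in enumerate(template):                    # inject the signal
    block[offset + i] += v

out = matched_filter(block, template)
peak = max(range(n), key=lambda k: out[k])
snr = out[peak] / (sigma * math.sqrt(sum(v * v for v in template)))
print(f"filter output peaks at sample {peak} with SNR of about {snr:.1f}")
```

A real search whitens the data with the measured noise spectrum, loops over a bank of templates and overlapping blocks, and requires coincident triggers in both detectors. None of that is modeled here.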

Gravitational wave signals from known pulsars are expected to be highly correlated
with the pulsar signal itself. This correlation allows us to integrate the signal over
different observational periods, continuously improving the sensitivity.1° However,
a generic all-sky search for long-lasting gravitational waves from rotating neutron 
stars has to cope with a huge set of parameters to incorporate the modulation of 
the signal due to motions of the source in a potential binary system, as well as 
motions of the detectors due to the Earth’s rotation and orbit around the Sun.
The computational challenges are significant, but have lower priority than merger
searches as no urgent optical follow-up is required.°® Similar to the SETI@home
project, the Einstein@Home project encourages volunteers to donate computer time
to help with this analysis.°®

The last search we describe here is the search for stochastic gravitational waves. 
We distinguish between two types of stochastic gravitational waves. The first one 
is the relic radiation left over from the Big Bang. This is similar in nature to 
the microwave background, except that the gravitational waves would have been 
generated roughly a Planck time after the Big Bang, while the microwave back- 
ground was created about 300,000 years later. The second potential source for 
a gravitational wave background is the incoherent sum of all individual sources 
described above. The data analysis for these searches is based on cross correlating 
the data streams of the observatories, taking into account the reduction of coherence 
between separated observatories. This was one of the motivations for having two 
interferometers installed at the Hanford observatory during initial LIGO. These two 
co-located interferometers should have been ideal for this type of search; however, 
it was never possible to rule out instrumental correlations as both interferometers 
shared the same vacuum system and were connected to the same power supplies and 
electronic system. So far, most stochastic searches have used the data from one of the
two Hanford interferometers and correlated it with the Livingston data. The 10 ms
separation between the two LIGO sites limits potential correlations to frequencies
below 50 Hz. The cross correlations between the Livingston and each of the Hanford
interferometers provided an upper limit that excluded a few Big Bang and string
theory models.®° Over time, the advanced detectors will significantly improve on
these limits. The travel time of gravitational waves between Virgo and each LIGO
detector can be up to 25 ms depending on the propagation direction. This only
allows stochastic signals at frequencies below 20 Hz to be correlated in the data
streams. Initial LIGO’s sensitivity in this frequency range was not good enough to
warrant any searches.
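
The two frequency limits quoted above are consistent with a simple rule of thumb, stated here as an assumption rather than as the authors' derivation: the correlation between two sites degrades once the light-travel time τ between them exceeds about half a signal period. With τ = d/c for a separation d, this gives

\[
f_{\max} \approx \frac{1}{2\tau}, \qquad
\tau = \frac{3000\,{\rm km}}{c} \approx 10\,{\rm ms} \;\Rightarrow\; f_{\max} \approx 50\,{\rm Hz}, \qquad
\tau \approx 25\,{\rm ms} \;\Rightarrow\; f_{\max} \approx 20\,{\rm Hz}.
\]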


3. Future Network of Ground-Based Observatories 


The two Advanced LIGO observatories were only the beginning of a future world- 
wide network of ground-based observatories. Virgo joined the two Advanced LIGO 
observatories during their second observational run. The Japanese cryogenic detec- 
tor KAGRA will be the first underground detector and will join late in this decade 
or maybe very early in the next.°° India just signed an agreement to build a site 
for a third Advanced LIGO observatory.®! Data sharing agreements between the 
LIGO (LSC) and Virgo Scientific Collaborations have existed since the first science 
runs of the initial LIGO/Virgo detectors. KAGRA is expected to join this data 
sharing agreement and LIGO-India will be an LSC observatory. These observato- 
ries are not competing, but are collaborating with each other. Such a collaboration 
is necessary because the simultaneous detection of the same signal by different 
detectors increases the confidence in each signal, improves parameter estimation for 
each source and avoids dead time if some of the detectors are offline because of 
maintenance or for other reasons. The most cited advantage of a detector network
is the vast improvement in localizing the source with a large-baseline detector
network.⁶²,⁶³ These additional observatories will significantly increase the chances for
electromagnetic follow-up observations, which will also improve the science return
of these observatories.

The amplitude of gravitational waves scales with 1/r and Advanced LIGO is
expected to improve in sensitivity by another factor of three before the end of this
decade. Because the detection range then grows by the same factor and the probed
volume scales with the cube of the range, this alone would increase the probed volume
by a factor of 3³ = 27, such that weekly detections of gravitational waves from merging
black holes might even be a conservative estimate of the future detection rate.
Furthermore, the LIGO Scientific Collaboration is working towards a few improvements
beyond the final Advanced LIGO sensitivity. The installation of frequency-dependent
squeezed light in the output port is virtually guaranteed. Further improvements will
likely include heavier test masses and better optical coatings. All these improvements
might increase the range by another factor of three within the currently existing
facilities.⁶⁴

However, binary systems involving 1000 solar mass black holes will emit most
of their energy well below 10 Hz, a frequency range which will likely require new
facilities with seismic isolation and suspension systems with resonance frequencies
well below 1 Hz, as well as ways to mitigate and/or minimize changes in
the local gravitational field. The most mature concept for the next generation of
ground-based gravitational-wave observatories is the European Einstein Telescope,!®
an underground observatory with three 10 km-long arms in a triangular
configuration. The ET concept includes several parallel interferometers that can be
tuned to specific peak frequencies using the signal recycling cavity tuning technique.
In the US, the 40 km-long Lungo concept”? gained some traction, but the realization
of any of these concepts is probably many years away. Concepts to measure
gravitational waves from even more massive black holes in the sub-Hz range with
ground-based observatories are immature at best; this frequency range will likely
require a space-based observatory.


4. LISA 


The first ideas for a space-based gravitational-wave observatory emerged in the 
mid to late 1970s and the first concepts were published in the 1980s in the US 
under the name LAGOS.®%:® In the early 1990s, the LISA name appeared in a 
proposal to ESA®’ while a second group in the US proposed a slightly different mis- 
sion concept, SAGITTARIUS, to NASA.®° The agencies later joined forces, merged 
both concepts, and started the first agency-led studies for the space-based observa- 
tory under the name Laser Interferometer Space Antenna (LISA). Following more 
than ten years of studies, the LISA design matured²¹ but the project fell victim
to budgetary realities at both agencies; ESA needed to start a large mission and
NASA did not have the funds to commit to LISA. Following the cancellation of
LISA in 2011, a gravitational-wave based science theme was selected by ESA for
their third “large” Cosmic Vision mission (L3), with a nominal launch year of
2034. The gravitational observatory advisory team (GOAT), an ESA committee 
with US/NASA participants, reviewed the available technologies and mission con- 
cepts for L3 and concluded that a LISA-like mission concept is the most mature 
and promising approach for L3.°° Furthermore, the LISA Pathfinder mission was 
launched in December of 2015 to test the gravitational reference sensor technol- 
ogy for LISA-like missions. Pathfinder’s results surpassed even the most optimistic 
expectations.”? At the time of writing, both agencies are negotiating the terms of 
their expected partnership on L3 based on the conclusions of the GOAT, the enor- 
mous success of the LISA Pathfinder mission, and the scientific pressure following 
the Advanced LIGO detection. Consequently, it is well justified to assume that L3 
will be a LISA-like mission and this section focuses on the design and provides an 
overview of the technology of LISA, which is described in Ref. 21 in detail.ᶜ

The characteristic strain sensitivity and the different signals LISA is expected 
to detect are shown in Fig. 6. The best known signals are inspirals and mergers of 
massive black holes, which are believed to be in the center of nearly every galaxy. 
Most signals will come from compact galactic binaries formed mostly between 
white dwarfs but also involving neutron stars and stellar mass galactic black 
holes. In fact, the number of signals will be so large that they form a stochastic 
background which is often called confusion noise in the context of LISA. Signals
from extreme mass-ratio inspirals and highly redshifted 10⁴ solar mass black hole
mergers have the lowest expected signal-to-noise ratios of the standard signals.
LISA’s ability to detect them will critically depend on the final mission design and
performance. Not shown are signals from 10 to 100 solar mass binary systems at
Gpc-type distances which will turn a few years later into signals for ground-based
detectors.⁷¹

Fig. 6. The characteristic strain amplitude and the expected signals of a LISA-like space-based
gravitational wave observatory. Mergers between massive black holes out to redshifts of a few will
be detected with exceptional visibility. Many galactic binaries will be resolved. Signals from some
extreme mass-ratio inspirals will be visible. Not shown are the <100 solar mass binaries around
100 mHz which will be visible in LISA before they become visible in the Advanced LIGO band a
few years later.⁷¹ Figure adapted from S. Babak, as used in Ref. 69. See electronic edition for a
color version of this figure.

ᶜ Since writing this chapter, LISA as a project has seen tremendous programmatic progress. It is
currently (2020) in Phase A with ongoing industrial studies aiming for mission confirmation in
2024.


4.1. LISA Overview 


LISA consists of three spacecraft in a triangular constellation. Although LISA will
measure gravitational waves between 10 µHz and 100 mHz, it is designed to have
the best sensitivity between 1 and 50 mHz to enable the detection of extreme mass
ratio inspirals and 10⁴ solar mass type binaries at high redshifts. The upper frequency
sets the limit for the optimum distance between the spacecraft (L ≈ λ_GW/2).
Longer arms would reduce the sensitivity at all frequencies above c/2L and would
start to limit the lifetime of the mission. On the other hand, shorter arms would
reduce the differences in the velocity vectors of the three spacecraft, which would
reduce the amount of fuel required for the propulsion module of each spacecraft.
Shorter arms also decrease the differences in the gravitational forces on the
spacecraft. This reduces changes in the distances and opening angles of the
triangle, which reduces, for example, requirements on the range of the alignment
actuators in the interferometry. These cost-saving measures might reduce the nominal
distance between the spacecraft to 2.5 Gm. The loss in science return of even
shorter arms would greatly outweigh the potential savings.® 7
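
The relation between arm length and frequency band is simple arithmetic. The sketch below evaluates L ≈ λ_GW/2 = c/(2f) for a 100 mHz upper frequency and the frequency c/2L for a 2.5 Gm arm. The function names are illustrative.

```python
C = 299_792_458.0  # speed of light, m/s

def optimum_arm_length(f_upper_hz):
    """Arm length L = lambda_GW / 2 = c / (2 f) for the highest frequency of interest."""
    return C / (2.0 * f_upper_hz)

def half_wave_frequency(arm_length_m):
    """Frequency c / (2 L) above which a longer arm starts to cost sensitivity."""
    return C / (2.0 * arm_length_m)

print(f"L for 100 mHz: {optimum_arm_length(0.1) / 1e9:.2f} Gm")
print(f"c/2L for 2.5 Gm: {half_wave_frequency(2.5e9) * 1e3:.0f} mHz")
```

An upper frequency of 100 mHz corresponds to an arm of about 1.5 Gm, and a 2.5 Gm arm corresponds to c/2L of about 60 mHz, so the arm lengths discussed in the text sit within a factor of two of this simple criterion.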

The long signal periods significantly increase the susceptibility of the observato- 
ries to slowly changing environmental parameters such as temperature, quasi-static 
electric and magnetic fields, and gas pressure. The requirement on the temperature 
stability is one of the reasons for the selection of heliocentric orbits in which none 
of the spacecraft has to pass through Earth’s shadow during science operations. The
LISA mission was designed not to require any station-keeping maneuvers during 
the lifetime of the mission. This design sets a lower limit for the proximity to Earth 
of the spacecraft, while the range of the deep-space network sets an upper limit. All 
these concerns and considerations influenced the selection of the original LISA orbits 
shown in Fig. 7. Each spacecraft will be placed into a slightly elliptical heliocentric 
orbit trailing Earth by 20°. These orbits still appear to be close to optimum. They 
allow a mission lifetime of up to ten years without station-keeping and provide a 
stable thermal environment. 

Similar to ground-based observatories, LISA will be limited at lower frequencies
by residual accelerations of the test masses and at higher frequencies by
sensing or interferometer noise. The requirements for both frequency ranges can
be derived from the required strain sensitivity, the expected spectrum of the galactic
gravitational-wave background radiation, and the nominal distance between the
spacecraft. The expected galactic gravitational-wave background radiation generated
by hundreds of thousands of white dwarf binaries will dominate the signal
below a few mHz and its amplitude is expected to increase towards lower frequencies
down to ~0.1 mHz (see Fig. 6). The upper frequency of the expected white-dwarf
spectrum, as well as the expected EMRI signals, guide the selection of the corner
frequency at which the acceleration noise fades below shot noise. A typical value
for this corner frequency is 3 mHz. At higher frequencies, the arm length starts to
become comparable to the wavelengths of the gravitational waves and each arm
is partly stretched and squeezed. For an otherwise optimally aligned gravitational
wave, the response of the instrument could be described with a sinc-function, which
has zero response at all frequencies that are non-zero multiples of the instrumental
free spectral range of c/2L. However, LISA is still sensitive to non-optimally
aligned gravitational waves at these frequencies, albeit with lower sensitivity. The
instrumental response is typically modeled with an f-sloped loss of strain sensitivity
above a corner frequency of around 12 mHz.²⁴

Fig. 7. The LISA mission places three identical spacecraft into an Earth-like heliocentric orbit such
that these spacecraft form a near-equilateral triangle. The constellation will trail Earth by about
20°. Free-falling test masses inside each spacecraft are the end points of three interferometers that
measure differential distance changes between these test masses. Credit: JPL/NASA.

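These considerations can be summarized in requirements for the strain sensitivity
of LISA of about:

\[
\tilde{h}(f) \approx \frac{10^{-20}}{\sqrt{\rm Hz}}\,\sqrt{1+\left(\frac{3\,{\rm mHz}}{f}\right)^{4}+\left(\frac{f}{12\,{\rm mHz}}\right)^{2}},
\]

requirements for the acceleration noise of a single test mass of:

\[
\delta\tilde{a}(f) < 3\times10^{-15}\,\frac{{\rm m/s^2}}{\sqrt{\rm Hz}}, \qquad f < 3\,{\rm mHz},
\]

and requirements for the sensing or interferometer noise of:

\[
\delta\tilde{x}(f) < 18\times10^{-12}\,\frac{\rm m}{\sqrt{\rm Hz}}, \qquad f > 3\,{\rm mHz},
\]

with the additional requirement that δã(f) should not limit the measurement above
3 mHz and δx̃(f) not below 3 mHz.ᶠ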
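
The 3 mHz corner frequency can be cross-checked by converting the acceleration requirement into an equivalent displacement noise, δx = δa/(2πf)², and asking where it equals the sensing noise. The sketch below assumes requirement values of 3×10⁻¹⁵ m/s²/√Hz and 18 pm/√Hz. The parameter `n_contrib`, which lumps the contributions of several test masses into one factor, is a hypothetical simplification and not part of the stated requirements.

```python
import math

def displacement_from_acceleration(accel_asd, f_hz):
    """Displacement noise (m/sqrt(Hz)) produced by white acceleration noise at f_hz."""
    return accel_asd / (2.0 * math.pi * f_hz) ** 2

def crossover_frequency(accel_asd, sensing_asd, n_contrib=1.0):
    """Frequency at which n_contrib times the acceleration-induced
    displacement noise equals the sensing noise."""
    return math.sqrt(n_contrib * accel_asd / sensing_asd) / (2.0 * math.pi)

ACCEL = 3e-15     # m/s^2/sqrt(Hz), assumed requirement value
SENSING = 18e-12  # m/sqrt(Hz), assumed requirement value

for k in (1.0, 2.0):
    f_c = crossover_frequency(ACCEL, SENSING, k)
    print(f"n_contrib = {k}: crossover at {f_c * 1e3:.2f} mHz")
```

Both choices of `n_contrib` put the crossover at a few mHz, in line with the typical 3 mHz corner frequency given above.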

LISA requires two key technologies, the gravitational reference sensor (GRS) to 
create free-falling test masses in space, and the interferometry measurement system 
to monitor changes in the distance between these test masses. The test masses are 
the end points of the three interferometer arms or six interferometric links of the 
interferometer. Each spacecraft will host two identical payloads, one for each arm, 
which are linked via an optical fiber but are otherwise independent. A conceptual 
design of each payload is shown in Fig. 8. The test mass on the right is freely falling
inside the housing. The optical bench in the center routes the three laser fields
to the test mass, the telescope, the backlink fiber and the various photodetectors
(Far, TM, Ref).²¹ The telescope expands the beam and directs it towards the far
spacecraft and it receives the beam from the far spacecraft. Not shown are the laser
systems, the phasemeters and the µN-thrusters.

Fig. 8. A sketch of one of two payloads inside each of the LISA spacecraft. The test mass (TM)
floats inside the housing. Each housing includes two electrodes on each of the six inner surfaces to
sense the position and orientation of the TM with respect to the housing. A laser interferometer
measures the distance between the TM and the optical bench (OB). The optical bench also sends
light to the beam-expanding telescope formed by the secondary and primary mirror. The same
telescope receives the field from the far spacecraft. The received field and the local field form the
Far interferometer. The backlink connects the two local OBs via an optical fiber, through which
the two local laser beams are exchanged. Three beat signals are taken on each bench: a reference
signal (Ref), the TM signal (TM) and the signal that includes the far laser field (Far). A second
identical payload points to the third spacecraft. See electronic edition for a color version of this
figure.

ᶠ All of these corner frequencies have been and will probably continue to be adjusted as the mission
design matures.


4.2. Gravitational Reference Sensor 


The required performance of the gravitational reference sensor is several orders 
of magnitude better than the acceleration noise of earlier gravitational reference 
sensors such as the ones used in GOCE or GRACE. This huge step in performance 
was a major concern at both agencies, and ESA and NASA decided that a tech- 
nology demonstration of the GRS in a dedicated space mission would be necessary 
before LISA can become reality. The ESA-led LISA Pathfinder mission had the 
goal of demonstrating within a factor of ten the LISA requirements at frequencies 
above 1 mHz.’° After the first two months of science operation, LISA Pathfinder
exceeded these expectations by a wide margin. The measured acceleration noise is
within a factor of 1.5 of the LISA requirements below 10 mHz, and is well below
the LISA requirements above 10 mHz. Below ~0.5 mHz the noise increases by
another factor of two, but is still well below the LISA Pathfinder requirements."4:%
Consequently, the LISA GRS will be virtually identical to the LISA Pathfinder
GRS, with only minimal changes in peripheral components such as electronics or
the UV-light sources that are used to discharge the test masses.”°

ᵍ This is another area in which reality surpassed the status at the time of writing this chapter.
Following another commissioning period, the LPF GRS surpassed the LISA requirements at all
frequencies.⁹²

In the LPF/LISA GRS each test mass is a free-falling cube that floats inside 
a cubic housing. Several factors played a key role in the selection of the test 
mass material. It should have a high density to reduce the displacement due to 
non-gravitational forces. Cosmic rays will charge the test mass and these charges 
have to easily migrate to the surface where a discharge system can minimize the 
electrostatic charge on the test mass. The discharge system requires a metal so
that all charges migrate immediately to the surface. The material should have near 
zero magnetic susceptibility to minimize the coupling to magnetic fields. The test 
mass has to have smooth surfaces to reduce patch field effects (the accumulation 
of charges near tips or edges in the surface) and to provide a high quality optical 
surface for the interferometry. A specific gold-platinum alloy with a thin gold coating 
fulfills all requirements; this was used in LPF, and will likely be used in LISA. 

Gold-coated sapphire electrodes inside a molybdenum housing measure the
motion of the spacecraft with respect to the test mass. This capacitive sensing
scheme has a sensitivity of about 1 nm/√Hz. The electrodes are also used to
apply electrostatic forces and torques to the test mass. In addition, µN-thrusters
steer the spacecraft around the test masses. Both actuations are necessary to keep
each of the two test masses within a few nm and about 100 nrad of their nominal
position and orientation with respect to its housing. The only degree of freedom in
which each test mass is truly free in the LISA measurement band is the displacement
along the optical axis of the interferometer arm.


4.3. Interferometry Measurement System 


The interferometry measurement system (IMS) has the task of monitoring changes in
the distances between the test masses with about 10 pm/√Hz sensitivity per one-way
measurement. One of the dominant noise sources will be the fundamental phase
fluctuations in the field received from the far spacecraft. The minimum photon rate
to meet the requirement is

\[
N_{\min} \approx \left(\frac{1}{\delta\tilde{x}}\,\frac{\lambda}{2\pi}\right)^{2}, \tag{28}
\]

which is equivalent to about 100 pW of received power. The received power, P_rec,
depends on the laser power, P_out, the diameter, D = 30 cm, of the emitting and
receiving telescopes, and their distance, L = 2.5 Gm:

\[
P_{\rm rec} \approx \frac{1}{2}\,\frac{D^4}{\lambda^2 L^2}\,P_{\rm out} \approx 5.7\times10^{-10}\,P_{\rm out}. \tag{29}
\]

Power levels at the output of the emitting telescope for LISA are expected to be
between one and two W to provide some margin. Note that the LISA telescopes are
fast telescopes and that their efficiency is fairly sensitive to the distance between
the secondary and the primary mirror.®°
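
The link budget can be reproduced with a few lines of arithmetic. The sketch below assumes a laser wavelength of 1064 nm, which is not stated in this section, a received fraction of ½·D⁴/(λ²L²), and a shot-noise-limited photon rate of (λ/(2π·δx))². The function names are illustrative and the prefactors should be read as assumptions.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 299_792_458.0    # speed of light, m/s
WAVELENGTH = 1064e-9 # m, assumed laser wavelength

def received_fraction(diameter_m, distance_m, wavelength_m=WAVELENGTH):
    """Fraction of the emitted power collected by the far telescope (assumed prefactor 1/2)."""
    return 0.5 * diameter_m ** 4 / (wavelength_m ** 2 * distance_m ** 2)

def min_photon_rate(sensitivity_m_rthz, wavelength_m=WAVELENGTH):
    """Photon rate (1/s) at which shot noise equals the displacement sensitivity."""
    return (wavelength_m / (2.0 * math.pi * sensitivity_m_rthz)) ** 2

def rate_to_power(rate_per_s, wavelength_m=WAVELENGTH):
    """Optical power (W) carried by a given photon rate."""
    return rate_per_s * H * C / wavelength_m

fraction = received_fraction(0.30, 2.5e9)
p_min = rate_to_power(min_photon_rate(10e-12))
print(f"P_rec / P_out = {fraction:.2e}")
print(f"minimum received power = {p_min * 1e12:.0f} pW")
print(f"received power for 1 W emitted = {fraction * 1e12:.0f} pW")
```

Under these assumptions the received fraction agrees with the 5.7×10⁻¹⁰ of Eq. (29), and the minimum power comes out at several tens of pW, the same order as the roughly 100 pW quoted in the text. An emitted power of 1 to 2 W then leaves a margin of several times that level.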

The IMS uses a form of heterodyne interferometry in which the phase evolution
of a laser beat signal in the 20 MHz range is measured with a phasemeter. The
phasemeter is realized on a digital signal processing board. The photodetector signal
is digitized at a rate of 50 MS/s and then multiplied in a field-programmable gate
array (FPGA) with a signal generated by a numerically controlled oscillator (NCO).
The product is filtered and down-sampled and then used to adjust the frequency
of the NCO to track the phase of the input signal. Changes in the phase of the
beat signal can then be reconstructed from the NCO control signal. Multiple phase
measurements are then combined to extract the gravitational waves."®
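
The tracking loop described above is a digital phase-locked loop. The sketch below is a minimal software model of it, not the flight phasemeter design: the loop bandwidth, damping, gains, and the function name `track_beat_note` are hypothetical, and the proportional-integral loop filter alone is relied upon to suppress the sum-frequency term that a real design would remove with dedicated filters before down-sampling.

```python
import math

def track_beat_note(f_in_hz, f_nco0_hz=20e6, fs_hz=50e6, n_samples=20000,
                    loop_bw_hz=50e3, damping=0.707):
    """Digital PLL sketch: returns the beat frequency (Hz) inferred from the NCO control signal."""
    two_pi = 2.0 * math.pi
    w_n = two_pi * loop_bw_hz / fs_hz      # loop natural frequency, rad/sample
    k_det = 0.5                            # mixer gain for a unit-amplitude input
    k_p = 2.0 * damping * w_n / k_det      # proportional gain
    k_i = w_n ** 2 / k_det                 # integral gain
    step0 = two_pi * f_nco0_hz / fs_hz     # nominal NCO phase step, rad/sample
    dphi_in = two_pi * f_in_hz / fs_hz     # true input phase step, rad/sample
    theta_in, theta_nco, integ = 0.3, 0.0, 0.0
    tail = []
    for n in range(n_samples):
        sample = math.cos(theta_in)            # digitized photodetector signal
        err = -sample * math.sin(theta_nco)    # multiply with the NCO quadrature
        integ += k_i * err                     # loop filter, integral path
        step = step0 + k_p * err + integ       # NCO control signal
        if n >= n_samples // 2:                # average only after lock
            tail.append(step)
        theta_nco = (theta_nco + step) % two_pi
        theta_in = (theta_in + dphi_in) % two_pi
    return sum(tail) / len(tail) * fs_hz / two_pi

estimate = track_beat_note(20.001e6)
print(f"tracked beat frequency: {estimate / 1e6:.6f} MHz")
```

Once the loop has locked, the average NCO phase step equals the phase step of the input, which is why the phase evolution of the beat note can be read off the NCO control signal instead of the raw 50 MS/s data.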

Each interferometer arm is split into three parts: A local interferometer mea- 
sures the test mass-to-optical bench motion on each spacecraft, while the long 
baseline interferometer measures the motion between the 2.5-Gm-separated optical 
benches. The signals from two local interferometers and one long baseline interfer- 
ometer are then combined to calculate the motion between each pair of test masses. 
The local interferometers are simple heterodyne interferometers in which the main 
laser beam and the beam through the back-link fiber are both split into two. One 
of these beams reflects off the test mass before the beams are pairwise combined 
to form the reference (Ref) and the test mass (TM) beat signals (see Fig. 8). The 
phases of these signals depend on the optical path lengths between the various beam 
splitters as well as the differential laser frequency noise. However, the differential
laser frequency noise is common to both signals, so the phase difference between the
two beat signals depends only on the optical path length difference, which includes
the test mass to optical bench distance.
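
The common-mode rejection in the local interferometer can be illustrated with a toy calculation. This is a sketch under stated assumptions: a wavelength of 1064 nm, reflection at normal incidence (so a test mass displacement x changes the optical path by 2x), and all static path lengths set to zero. The variable names are hypothetical.

```python
import numpy as np

WAVELENGTH = 1064e-9  # ASSUMED laser wavelength (m)
rng = np.random.default_rng(0)
n_samples = 1000

# Differential laser phase noise (rad): a large random walk, common to both beats.
laser_phase = np.cumsum(rng.normal(0.0, 1.0, n_samples))
# Toy test mass motion relative to the optical bench: 1 nm amplitude (made up).
tm_motion = 1e-9 * np.sin(2.0 * np.pi * np.arange(n_samples) / 200.0)

# Reference beat sees only the laser noise; the test mass beat also sees the
# path change 2 * tm_motion (ASSUMED normal-incidence reflection).
phi_ref = laser_phase
phi_tm = laser_phase + (2.0 * np.pi / WAVELENGTH) * 2.0 * tm_motion

# The phase difference removes the common laser noise and leaves the motion.
recovered = (phi_tm - phi_ref) * WAVELENGTH / (4.0 * np.pi)
print(f"largest recovery error: {np.max(np.abs(recovered - tm_motion)):.1e} m")
```

The laser phase noise in this toy is many radians while the signal is about 0.01 rad, yet the difference of the two phases recovers the motion to numerical precision because the noise enters both beat signals identically.
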

The long-baseline interferometer beat signal (Far) of each arm will also be 
dominated by laser frequency noise. Time-delay interferometry takes advantage of 
the fact that the same laser frequency noise is measured multiple times on this OB, 
as well as the other OB on this spacecraft and the OB on the far spacecraft. A 
linear combination of time-shifted phasemeter data from these beat signals cancels 
the laser frequency noise but maintains the gravitational-wave signal.⁷⁷⁻⁷⁹ The effectiveness
of the cancellation of the laser frequency noise depends on the knowledge of
the required time shifts, or equivalently the knowledge of the length of each arm. For LISA, it is
assumed that the laser frequency noise should not contribute more than 1 pm/√Hz
to the length noise budget. A ranging system based on a pseudo-random-noise laser
phase modulation/demodulation scheme has been developed to measure the arm
length with a precision of 1 m.⁸⁰,⁸¹ This leads to fairly modest requirementsᵈ on
the laser frequency noise in the LISA band of:

[Equation: requirement on the laser frequency noise; the expression is not recoverable from the source text.]

ᵈ As of May 2019, the ranging requirement and the residual laser frequency noise requirement have
both been tightened by another factor of ~10 to provide margin against potential unknown issues
in this common mode noise rejection scheme.

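
As a rough consistency check on the scale of such a requirement (an order-of-magnitude sketch under assumptions, not the author's derivation): if the arm length used for the time shifts is in error by $\Delta L$, laser frequency noise $\delta\nu$ leaks into the length measurement as $\delta l \approx \Delta L\,\delta\nu/\nu$. Assuming a laser wavelength of 1064 nm, which is not stated in this section, so that $\nu = c/\lambda \approx 2.8 \times 10^{14}$ Hz, the values $\Delta L = 1$ m and $\delta l = 1$ pm/√Hz from the text give

$$\delta\nu \approx \nu\,\frac{\delta l}{\Delta L} \approx 2.8 \times 10^{14}\,\mathrm{Hz} \times \frac{10^{-12}\,\mathrm{m}/\sqrt{\mathrm{Hz}}}{1\,\mathrm{m}} \approx 280\,\frac{\mathrm{Hz}}{\sqrt{\mathrm{Hz}}}.$$
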
A second method determines the correct time shifts by minimizing the rms noise
in a signal-poor band of the LISA spectrum.⁷⁹ However, after the LIGO detection,
it is no longer obvious whether there is any signal-poor frequency range in the
LISA spectrum. Laser frequency stabilization systems that reach the required level
of stability, including methods which can easily be integrated into the LISA
heterodyne interferometry, have been demonstrated in many laboratories around the
world.⁸²
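
The cancellation principle can be illustrated with a toy model. The sketch below is a deliberately simplified unequal-arm Michelson with a single laser and integer round-trip delays, not the full LISA combination of one-way links between six optical benches. The delays, noise model, signal, and function names are all made up for illustration.

```python
import numpy as np


def delay(x, k):
    """Delay a time series by k samples, padding the start with zeros."""
    out = np.zeros_like(x)
    out[k:] = x[:len(x) - k]
    return out


def tdi_x(y1, y2, d1, d2):
    """Michelson-like combination of two round-trip measurements.

    d1 and d2 are the ASSUMED round-trip delays (in samples) of arms 1 and 2.
    """
    return (y1 - delay(y1, d2)) - (y2 - delay(y2, d1))


rng = np.random.default_rng(1)
n_samples = 4000
laser = np.cumsum(rng.normal(size=n_samples))  # laser phase noise (random walk)
T1, T2 = 160, 170                              # hypothetical round-trip delays
signal = 1e-3 * np.sin(2.0 * np.pi * np.arange(n_samples) / 500.0)  # toy signal

# Each arm compares the returning (delayed) laser phase with the local one.
y1 = delay(laser, T1) - laser + signal
y2 = delay(laser, T2) - laser

x_good = tdi_x(y1, y2, T1, T2)       # correct time shifts
x_bad = tdi_x(y1, y2, T1 + 1, T2)    # arm 1 delay wrong by one sample
expected = signal - delay(signal, T2)
valid = slice(T1 + T2 + 2, None)     # skip the zero-padded start

print(f"residual, correct delays: {np.std(x_good[valid] - expected[valid]):.1e}")
print(f"residual, wrong delay:    {np.std(x_bad[valid] - expected[valid]):.1e}")
```

With the correct delays the laser noise cancels to numerical precision and only a delayed difference of the signal remains. With a one-sample error in one delay, the residual noise is far larger than the toy signal, which is why the time shifts, and hence the arm lengths, have to be known accurately.
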

The IMS is sensitive to several noise sources that have to be addressed in the 
final mission: 


• Clock noise: The phasemeter measures the phase evolution of each laser beat
signal with respect to the on-board ultra-stable oscillator (USO, a.k.a. clock) on
each spacecraft. However, space-qualified USOs are not stable enough and their
timing noise would limit the sensitivity. The LISA team plans to modulate the
phase of each laser field with a GHz signal that can be directly traced to the
USO. Beat signals between these GHz sidebands allow the differential timing
noise to be measured and subtracted from the data.⁸³

• Timing jitter or phase fidelity: The IMS depends on very accurate phase
measurements of laser beat signals with respect to the on-board USO. Any timing
jitter between the clock and the analog-to-digital converters that digitize the
beat signals will be indistinguishable from length changes. Any added phase noise
between the USO and the phase modulation sidebands applied to transfer clock
signals will limit the ability to subtract differential clock noise from the signals.⁹⁰

• Optical path length changes between optical components: The laser fields have to
be routed across the optical bench and through the telescopes. These distances
are defined by optical components that are mechanically attached to some spacer
material like the optical bench material. Temperature changes could change the
distances between these components. The payload of LISA has been designed to
minimize temperature fluctuations and will be built from materials with
sufficiently low coefficients of thermal expansion.⁸⁴⁻⁸⁶

• Fiber backlink: The two local laser fields on the two local optical benches are
exchanged via a single optical fiber. The suitability of this exchange depends on
the near perfect symmetry of the optical path length noise within the fiber for
both laser fields propagating in opposite directions through the fiber.⁸⁷

• Scattered light: The measurement principle is based on the idea that identical
copies of wave trains propagate through the interferometer and are combined
and measured at various locations inside the interferometer. Scattered light is
light that scatters out of the laser beams, reflects from some surface, and then
reenters the laser beam and changes the phase of one of these nominally identical
copies. The noise caused by scattered light is proportional to its amplitude, but
also to the motion of the scattering surfaces; a constant phase change would not
be a problem. Known sources of significant amounts of scattered light are the
telescope⁸⁸,⁸⁹ and the backlink fiber.⁸⁷


Each of these noise sources is expected to contribute no more than 1 pm/√Hz in
effective length noise to the observatory. Prototype experiments have shown that 
technical solutions that meet the requirements exist for most of the technologies; all 
others are within factors of a few of the LISA requirements. A summary of the status 
of several key technologies for the interferometer measurement system is available 
in Ref. 90. The Pathfinder results and also results from various LISA testbeds show 
that pm-interferometry in the LISA band is now state-of-the-art and no longer a 
dream of tomorrow. 
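
To see what a per-source allocation of 1 pm/√Hz implies for the total, the contributions can be combined in a simple budget. The sketch below assumes that the sources are uncorrelated, so that their amplitude spectral densities add in quadrature. That assumption, and the choice to count laser frequency noise alongside the five sources listed above, are illustrative and not taken from the text.

```python
import math


def quadrature_sum(contributions):
    """Combine uncorrelated amplitude spectral densities (same units)."""
    return math.sqrt(sum(c**2 for c in contributions))


PER_SOURCE_LIMIT = 1.0  # pm/sqrt(Hz), the allocation quoted in the text
sources = [
    "laser frequency noise",
    "clock noise",
    "timing jitter",
    "optical path length changes",
    "fiber backlink",
    "scattered light",
]

# Worst case: every source sits exactly at its allocation.
worst_case = quadrature_sum([PER_SOURCE_LIMIT] * len(sources))
print(f"{len(sources)} sources at {PER_SOURCE_LIMIT} pm/sqrt(Hz) each "
      f"-> {worst_case:.2f} pm/sqrt(Hz) in total")
```

Under this assumption the total grows only with the square root of the number of sources, so six contributions at their limits sum to about 2.4 pm/√Hz rather than 6 pm/√Hz.
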


5. Summary 


Gravitational waves entered astronomy with a bang. The first three detections 
(assuming that the candidate event LVT151012 is in fact a detection) originated 
from mergers between three pairs of black holes over a billion light years away, in 
stark contrast to the stellar mass black holes observed exclusively in our galaxy. 
Furthermore, some of the merging black holes and all of the final black holes are 
larger than the roughly solar-mass black holes observed before. It is expected that 
Advanced LIGO, together with its partners Virgo, KAGRA, and eventually LIGO- 
India, will observe gravitational waves from similar and even heavier black hole 
mergers on a weekly basis. These detectors will also observe gravitational waves
from a neutron star merger before this decade is over. Supernova explosions are 
another likely source, although the range is probably limited to supernovae in our 
local group. Future plans like ET or Lungo will further improve the sensitivity and 
allow very sensitive studies of merger waveforms as well as the detection of signals 
at redshifts of order unity. The LISA observatory will open another window to the 
universe and will observe gravitational waves from sources as heavy as 10⁶ to 10⁸
solar masses at redshifts of a few, from 10⁴ solar mass black holes at redshifts of
order ten and from binaries in the ten to hundred solar mass range a few years
before they merge and emit in the frequency range of ground-based observatories. 
The combination of all these observations, ground and space, will allow us to mea- 
sure the mass distribution of black holes over seven orders of magnitude in size and 
throughout most of the universe. LISA will also measure gravitational waves from 
hundreds of thousands of galactic binaries and will search for a stochastic back- 
ground of gravitational radiation left over from the big bang. The era of listening 
to the universe has begun. 




References 


1. A. Einstein, Preussische Akad. Wissen. Sitzungsberichte 1, 315 (1915). 

2. A. Einstein, in Sitzungsberichte der Königlich Preussischen Akademie der
Wissenschaften, Berlin, Germany (1916), p. 688.

3. A. Einstein, in Sitzungsberichte der Königlich Preussischen Akademie der
Wissenschaften, Berlin, Germany (1918), p. 154.

4. W. Steinicke, Astron. Nachr. 326 (2005); Short Contributions AG Köln, Germany
(2005).

5. K. Riles, Particle Nucl. Phys. 68, 1 (2013); arXiv:1209.0667. 

6. B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. 
Rev. Lett. 116, 061102 (2016). 

7. B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. 
Rev. Lett. 116, 241103 (2016). 

8. B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. 
Rev. X 6, 041015 (2016). 

9. P. Fritschel et al. (LIGO Scientific Collaboration), arXiv:1411.4547. 

10. A. Taracchini, A. Buonanno, Y. Pan et al., Phys. Rev. D 89, 061502(R) (2014). 

11. S. Husa, S. Khan, M. Hannam et al., arXiv:1508.07250v2. 

12. P. Jaranowski, A. Królak and B. F. Schutz, Phys. Rev. D 58, 063001 (1998).

13. E. Berti, V. Cardoso and A. O. Starinets, Class. Quantum. Grav. 26, 163001 (2009). 

14. V. Cardoso, E. Franzin and P. Pani, Phys. Rev. Lett. 116, 171101 (2016). 

15. B. Abbott (LIGO Scientific Collaboration), Astrophys. J. Lett. 683, L45 (2008). 

16. J. Aasi et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys. Rev. D 
91, 022004 (2015). 

17. C. D. Ott, Class. Quantum Grav. 26, 063001 (2009). 

18. K. Kotake, arXiv:1110.5107. 

19. M. Punturo et al., Class. Quantum Grav. 27, 194002 (2010). 

20. S. Dwyer, D. Sigg, S. W. Ballmer et al., Phys. Rev. D 91, 082001 (2015). 

21. K. Danzmann et al. (LISA International Science Team), LISA Assessment Study 
Report (Yellow Book), ESA/SRE(2011)3 (2011). 

22. H. Bondi, F. A. E. Pirani and I. Robinson, Proc. Roy. Soc. A 251, 519 (1959). 

23. T. H. Maiman, Nature 187, 493 (1960). 

24. M. E. Gertsenshtein and V. I. Pustovoit, J. Exp. Theor. Phys. 43, 605 (1962). 

25. R. Weiss, Electromagnetically coupled broadband gravitational wave antenna, Quart. 
Prog. Rep. Res. Lab. Electro. (MIT) 105, 84 (1972). 

26. R. E. Vogt, R. W. P. Drever, K. S. Thorne et al., Proposal to the NSF for initial LIGO, 
https://dcc.ligo.org/LIGO-M890001/public (1993).

27. H. Lück and H. Grote, GEO600 Advanced Gravitational Wave Detector (Cambridge
University Press, 2012), p. 155. 

28. F. Acernese et al., Class. Quantum Grav. 32, 024001 (2015); arXiv:1408.3978. 

29. M. Ando et al., Phys. Rev. Lett. 86, 3950 (2001). 

30. Y. Aso et al., Phys. Rev. D 88, 043007 (2013). 

31. A. E. Siegman, Lasers, University Science Books (1986).

32. R. W. P. Drever et al., in P. Meystre and M. O. Scully (eds.), Proc. NATO Advanced
Study Institute on Quantum Optics and Experimental General Relativity, Plenum
Press, New York (1983), p. 503.

33. B. J. Meers, Phys. Rev. D 38, 2317 (1988). 

34. J. Mizuno et al., Phys. Lett. A 175, 273 (1993). 

35. P. Kwee et al., Optics Express 20, 10617 (2012). 


36. C. Mueller et al., The advanced LIGO input optics, Rev. Sci. Instrum. 87, 014502 (2016).

37. M. A. Arain and G. Mueller, Optics Express 16, 10018 (2008).

38. K. Strain et al., Appl. Optics 42, 1244 (2003).

39. G. Müller, T. Delker, D. B. Tanner et al., Appl. Optics 42(7), 1257 (2003).

40. T. Fricke et al., Class. Quantum Grav. 29, 065005 (2012).

41. L. Barsotti, M. Evans and P. Fritschel, Class. Quantum Grav. 27, 084026 (2010).

42. A. Staley et al., Class. Quantum Grav. 31, 245010 (2014).

43. G. Mueller, Optics Express 13, 7118 (2005).

44. N. Mavalvala, D. Sigg and D. Shoemaker, Appl. Optics 37, 7743 (1998).

45. D. V. Martynov et al., Phys. Rev. D 93, 112004 (2016).

46. F. Matichard et al., Class. Quantum Grav. 32, 185003 (2015).

47. S. Aston et al., Class. Quantum Grav. 29, 235004 (2012).

48. F. Acernese et al., Optics Lasers Eng. 45, 478 (2007).

49. J. Harms, Living Rev. Relativity 18, 3 (2015).

50. C. M. Caves, Phys. Rev. D 23, 1963 (1981).

51. J. Abadie et al. (LIGO Scientific Collaboration), Nature Phys. 7, 962 (2011).

52. H. B. Callen and R. F. Greene, Phys. Rev. 86, 702 (1952).

53. G. Harry, Appl. Optics 45, 1569 (2006).

54. S. Gras, H. Yu, W. Yam et al., arXiv:1609.05595.

55. B. P. Abbott et al. (LIGO Scientific Collaboration and the Virgo Collaboration), Class.
Quantum Grav. 33, 134001 (2016).

56. S. Klimenko, I. Yakushin, A. Mercer et al., Class. Quantum Grav. 25, 114029 (2008).

57. C. Messick et al., arXiv:1604.04324.

58. K. Riles (LIGO Scientific Collaboration), in J. van Leeuwen (ed.), Neutron Stars and
Pulsars: Challenges and Opportunities After 80 Years, Proceedings IAU Symposium
Vol. 291 (2012).

59. B. Abbott et al. (LIGO Scientific Collaboration), Phys. Rev. D 80, 042003 (2013).

60. B. Abbott et al. (LIGO Scientific Collaboration and The Virgo Collaboration), Nature
460, 990 (2009).

61. LIGO-India, http://gw-indigo.org/tiki-index.php?page=LIGO-India

62. H. Tagoshi, C. K. Mishra, A. Pai et al., Phys. Rev. D 90, 024053 (2014);
arXiv:1403.6915v2.

63. Q. Chu, E. J. Howell, A. Rowlinson et al., arXiv:1509.06876.

64. R. Adhikari, K. Arai, S. Ballmer et al., Report of the 3rd Generation LIGO Detector
Strawman Workshop, https://dcc.ligo.org/LIGO-T1200031.

65. J. E. Faller and P. L. Bender, NBS Special Publ. 617, 689 (1984).

66. J. E. Faller et al., in Proc. Colloquium on Kilometric Optical Arrays in Space, ESA
report SP-226 (1985).

67. K. Danzmann, A. Rüdiger, R. Schilling et al., Proposal for a Laser-Interferometric
Gravitational Wave Detector in Space, MPQ-Reports 177 (1993).

68. R. W. Hellings, SAGITTARIUS: A Space Gravitational Wave Mission, NASA Technical
Report, Document ID: 20060038149.

69. M. Perryman et al., The ESA–L3 Gravitational Wave Mission, Gravitational Observatory
Advisory Team Final Report (2016).

70. M. Armano et al., Phys. Rev. Lett. 116, 231101 (2016).

71. A. Sesana, Phys. Rev. Lett. 116, 231102 (2016).

72. Gravitational-Wave Mission Concept Study Final Report,
http://pcos.gsfc.nasa.gov/physpag/GW_Study_Rev3_Aug2012-Final.pdf

73. M. Armano et al., J. Phys.: Conf. Ser. 610, 012005 (2015).

74. T. Ziegler, P. Bergner, G. Hechenblaikner et al., arXiv:1207.0394v2.

75. T. Olatunde, R. Shelley, A. Chilton et al., J. Phys.: Conf. Ser. 610, 012034 (2015).

76. D. Shaddock, B. Ware, P. G. Halverson et al., AIP Conf. Proc. 873, 654 (2006);
doi: 10.1063/1.2405113.

77. D. A. Shaddock, M. Tinto, F. B. Estabrook et al., Phys. Rev. D 68, 061303(R) (2003).

78. G. de Vine, B. Ware, K. McKenzie et al., Phys. Rev. Lett. 104, 211103 (2010).

79. S. J. Mitryk, G. Mueller and J. Sanjuan, Phys. Rev. D 86, 122006 (2012).

80. J. J. Esteban, I. Bykov, A. F. Garcia Marin et al., J. Phys.: Conf. Ser. 154, 012025
(2009).

81. A. Sutton, K. McKenzie, B. Ware et al., Optics Express 18, 20759 (2010).

82. J. Eichholz, D. B. Tanner and G. Mueller, Phys. Rev. D 92, 022004 (2015).

83. D. Sweeney and G. Mueller, Optics Express 20, 25603 (2012).

84. J. Sanjuan, D. Korytov, G. Mueller et al., Rev. Sci. Instrum. 83, 116107 (2012);
doi: 10.1063/1.4767247.

85. J. Sanjuán, A. Preston, D. Korytov et al., Rev. Sci. Instrum. 82, 124501 (2011).

86. D. I. Robertson, E. D. Fitzsimons et al., Class. Quantum Grav. 30, 085006 (2013).

87. R. Fleddermann, F. Steier, J. Bogenstahl et al., J. Phys.: Conf. Ser. 154, 012022
(2009).

88. A. Spector and G. Mueller, Class. Quantum Grav. 29, 205005 (2012).

89. J. Livas and S. Sankar, J. Phys.: Conf. Ser. 610, 012029 (2015).

90. S. Barke, Inter-Spacecraft Frequency Distribution for Future Gravitational Wave
Observatories, Dissertation, Universität Hannover (2015).

91. B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration),
arXiv:1811.12907v2 (2019).

92. B. P. Abbott et al. (LIGO Scientific Collaboration and Virgo Collaboration), Phys.
Rev. Lett. 119, 161101 (2017).

93. M. Armano et al., Phys. Rev. Lett. 120, 061101 (2018).



